P01 - Predictor - Function
The Expert Knowledge
Additionally, the basic operators use further knowledge about correlations, both at a coarse granularity (process level) and at a fine granularity (level of individual characteristic values). These correlations are represented mathematically as adjacency matrices, which describe the correlations relevant to the scaling function and to the transfer function. To allow central access, the matrices are also stored in the database.
The matrices were introduced because the available data volume, which describes individual interpolation points, is much smaller than originally assumed. Furthermore, the dimensionality to be considered turned out to be significantly higher than assumed, a consequence of the concrete procedures by which the characteristic values are determined in the descriptor-determining processes. The individual dimensions are therefore sparsely populated with data, which is particularly problematic for data-driven techniques (the so-called sparse data problem). Owing to these two circumstances, it is not possible to reduce the high-dimensional space of characteristic values in a data-driven way or to choose a starting point. This concerns both the extent of the interpolation points with respect to different alloy systems and the variations on the micro level in high throughput.
In addition to being stored, each adjacency matrix carries a version number, which keeps the correlations adaptable. Adaptations can be triggered, for example, by new, scientifically relevant results from the respective subprojects or by algorithmic correlation analyses whose results are fed back into the matrices as soon as a sufficient database exists. The initial state of the matrices was determined through extensive expert interviews and first coded in ternary logic as a floating-point number, which can be refined directly: 0 → "no correlation assumed", 0.5 → "correlation assumed possible", and 1.0 → "correlation assumed". A visualization of these matrices is shown in Figure 1.
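The coding scheme and the version number can be illustrated with a minimal Python sketch. The class name AdjacencyMatrix, the method refined, the labels "A" to "C", the matrix entries, and the symmetry of the correlations are all assumptions made for illustration; the project's actual storage format is not described here.

```python
from dataclasses import dataclass

# Ternary coding from the text, stored as floats so that values can be refined later.
NO_CORRELATION = 0.0        # "no correlation assumed"
POSSIBLE_CORRELATION = 0.5  # "correlation assumed possible"
ASSUMED_CORRELATION = 1.0   # "correlation assumed"


@dataclass(frozen=True)
class AdjacencyMatrix:
    """Hypothetical container; labels index both rows and columns."""
    labels: tuple
    values: tuple   # tuple of row tuples, values[i][j] in [0.0, 1.0]
    version: int = 1

    def get(self, a, b):
        return self.values[self.labels.index(a)][self.labels.index(b)]

    def refined(self, a, b, value):
        """Return a new matrix with one entry changed and the version incremented.

        Assumption: correlations are symmetric, so (a, b) and (b, a) both change.
        """
        if not 0.0 <= value <= 1.0:
            raise ValueError("correlation values must lie in [0.0, 1.0]")
        i, j = self.labels.index(a), self.labels.index(b)
        rows = [list(row) for row in self.values]
        rows[i][j] = rows[j][i] = value
        return AdjacencyMatrix(self.labels, tuple(map(tuple, rows)), self.version + 1)


# Initial state as it might come out of expert interviews (made-up entries).
initial = AdjacencyMatrix(
    labels=("A", "B", "C"),
    values=(
        (ASSUMED_CORRELATION, POSSIBLE_CORRELATION, NO_CORRELATION),
        (POSSIBLE_CORRELATION, ASSUMED_CORRELATION, NO_CORRELATION),
        (NO_CORRELATION, NO_CORRELATION, ASSUMED_CORRELATION),
    ),
)

# A later analysis refines the "possible" entry to an intermediate value.
revised = initial.refined("A", "B", 0.8)
```

Because the sketch returns a new object for every refinement, older versions remain available for comparison, which is one possible way to keep changes to the expert knowledge traceable.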
Thanks to newly developed basic operators, the combination of database, process specification, and adjacency matrices makes it possible to query the database, for example, for the specific material characteristics of a standardized tensile test that correlate on the micro level with a specific characteristic value of a falling ball test from subproject U04 (mechanical treatment). This functionality forms an important basis for the predictor function and for the algorithmic implementation of the hypothesis system described below.
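A query of this kind can be sketched in Python as a filter over the micro-level correlation entries. The function name, the dictionary layout, the characteristic names, and the correlation values below are hypothetical placeholders and do not reflect the project's basic operators or its actual expert knowledge.

```python
def correlated_characteristics(matrix, target, candidates, threshold=0.5):
    """Return (name, weight) pairs whose coded correlation with target
    is at least threshold, strongest first. Missing entries count as 0.0."""
    hits = [(name, matrix.get((name, target), 0.0)) for name in candidates]
    hits = [(name, weight) for name, weight in hits if weight >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Made-up micro-level entries between tensile-test characteristics and one
# characteristic value of the falling ball test.
micro = {
    ("tensile.char_1", "falling_ball.char_x"): 1.0,
    ("tensile.char_2", "falling_ball.char_x"): 0.5,
    ("tensile.char_3", "falling_ball.char_x"): 0.0,
}
tensile = ["tensile.char_1", "tensile.char_2", "tensile.char_3"]

possible = correlated_characteristics(micro, "falling_ball.char_x", tensile)
assumed = correlated_characteristics(micro, "falling_ball.char_x", tensile, threshold=1.0)
```

With the default threshold, the sketch returns every characteristic coded as at least "correlation assumed possible"; raising the threshold to 1.0 restricts the result to correlations the experts assumed outright.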
The Hypothesis System
A hypothesis system has been developed in combination with a domain-specific language (DSL) that allows hypotheses about properties of the existing database to be formulated and evaluated. A hypothesis consists of assumptions on the one hand and assertions on the other.
The assumptions define the validity range of a hypothesis. This makes it possible, for example, to apply the assertions that follow in the hypothesis only to a specific specimen geometry or heat treatment. Beyond that, more complex properties can be described that refer to information about current correlations (from the adjacency matrices); modeling techniques from the field of model checking are used for this purpose.
Figure 2 shows the web-based hypothesis editor for formulating new hypotheses, which are evaluated in the back end. Figure 3 shows the language constructs that the DSL supports for describing multi-level assumptions and assertions that are logically connected. The nomenclatures introduced for sample and process descriptions and for the description of a single characteristic value are also supported. The common arithmetic operators are available, including in comparisons with constants. In addition, more complex characteristics can be calculated on the data series, for example the Pearson correlation coefficient (PCC), and compared with one another.
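For reference, the PCC of two data series x and y of equal length is the covariance of the series divided by the product of their standard deviations. A minimal Python sketch follows; the function name pearson, the sample series, and the constant 0.9 are illustrative assumptions and are not taken from the DSL.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long data series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("PCC is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up data series for three characteristic values.
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [2.1, 3.9, 6.2, 7.8]
series_c = [4.0, 3.0, 2.0, 1.0]

# Comparison with a constant, and comparison of two coefficients with each other.
strongly_correlated = pearson(series_a, series_b) > 0.9
a_b_stronger_than_a_c = pearson(series_a, series_b) > pearson(series_a, series_c)
```

The last two lines mirror the two kinds of comparison mentioned above: a computed characteristic against a constant, and two computed characteristics against each other.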
The formulated hypotheses can be validated or falsified by the developed system. In the case of falsification, the corresponding counterexamples are shown, i.e. data artifacts for which the stated assertions do not hold and which therefore refute the hypothesis. A validation implies that the formulated hypothesis holds for the entire database under these assumptions.
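The evaluation logic can be sketched in Python as a search for counterexamples: a record refutes a hypothesis if it satisfies the assumptions but violates the assertions. The function evaluate, the record fields, and the values below are hypothetical and only illustrate the principle, not the developed back end or its DSL.

```python
def evaluate(records, assumption, assertion):
    """Return (is_valid, counterexamples) for a hypothesis over all records.

    A counterexample lies inside the validity range (the assumption holds)
    but violates the assertion. Records outside the range are ignored.
    """
    counterexamples = [r for r in records if assumption(r) and not assertion(r)]
    return (not counterexamples, counterexamples)


# Made-up data artifacts; field names and numbers are illustrative only.
records = [
    {"sample": "S1", "heat_treatment": "quenched", "hardness": 620},
    {"sample": "S2", "heat_treatment": "quenched", "hardness": 480},
    {"sample": "S3", "heat_treatment": "annealed", "hardness": 300},
]


def quenched(record):
    return record["heat_treatment"] == "quenched"


# Hypothesis 1: every quenched sample has a hardness above 500 (refuted by S2).
valid_1, refuting_1 = evaluate(records, quenched, lambda r: r["hardness"] > 500)

# Hypothesis 2: every quenched sample has a hardness above 400 (holds).
valid_2, refuting_2 = evaluate(records, quenched, lambda r: r["hardness"] > 400)
```

In this sketch, a hypothesis whose assumptions match no record at all is reported as valid, since no counterexample exists; whether the developed system treats such vacuous cases the same way is not stated here.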