
Machine Learning Application for Prediction of Sapphire Crystals Defects

2020-05-14

Yulia Vladimirovna Klunnikova, Maxim Vladimirovich Anikeev, Alexey Vladimirovich Filimonov, Ravi Kumar

Abstract: We investigate the impact of different numbers of positive and negative examples on machine learning for the prediction of sapphire crystal defects. We obtain models of the influence of crystal growth parameters on sapphire crystal growth. For example, these models allow predicting the defects that occur due to local overcooling of the crucible walls in the thermal node, which leads to accelerated crystal growth. We also develop prediction models for the crystal weight, for the formation of blocks, cracks, and bubbles, and for the total defect characteristics. The models were trained on all data sets and later tested for generalization on testing sets that did not overlap the training set. During training and testing, we determine the recall and precision of prediction and analyze the correlation among the features. The results show that the precision of the neural network method for predicting defects formed by local overcooling of the crucible reached 0.94.

1. Introduction

Machine learning methods are becoming increasingly popular for accelerating the design of new materials by predicting their properties. Minimizing the various defects in the crystal structure is extremely important for improving and developing modern technologies of artificial sapphire crystal growth.

Sapphire monocrystals find wide application in microelectronics, optics, and electronic equipment engineering. They can be used as substrates for integrated circuits with high resistance to radiation and heat combined with low power consumption. The defects formed in a crystal are one of the key factors affecting the properties of the substrates cut from it and their applicability in microelectronics and optoelectronics.

Theoretical and experimental studies of crystal growth by the Kyropoulos method have been carried out [1]-[9]. The Kyropoulos method is used for the industrial production of sapphire crystals with diameters of 300 mm and more. In the process proposed by S. Kyropoulos, the seed is placed in a water-cooled crystal holder and brought into contact with the melt in the crucible. The crystal grows on the seed in the form of a hemisphere and deepens into the melt. When the growing crystal reaches the surface of the crucible, the crystal holder is lifted by several millimeters, and this process is repeated until growth is complete.

The quality of a sapphire monocrystal is defined by its crystallographic orientation, the density and types of point defects, the density of single dislocations, the lengths of block boundaries and their misorientations, the level of residual stresses, heterogeneity, and chemical purity. To predict the resulting crystal quality, the following are taken into account: the features of the crystal growing equipment, the total heating time, the degree of overheating, the seeding time, the crystallization speed, the heater power output, the power decay speed, the voltage and its decay ratio, the crystal growth time, and the output of valid crystals (cracks, blocks, bubbles, residual stresses, etc.) [10]. Nowadays there are a number of methods for studying the properties and structural defects of crystals (X-ray and optical methods, atomic force microscopy, and others). However, experiments alone do not make it possible to evaluate how the growth parameters affect crystal properties. To achieve the aims of this research, it is necessary to estimate the correlations among numerous components and to find their optimal combinations for optimizing industrial crystal growth performance.
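Before the growth parameters listed above can be passed to a classifier, they have to be collected into one record per growth run. The following Python sketch shows one possible layout; the class name `GrowthRun`, the field names, the three binary defect labels, and the helper `to_example` are hypothetical, equipment design features are omitted, and nothing here is taken from the authors' actual data set.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical names of the technological parameters used as model inputs.
FEATURE_NAMES = (
    "total_heating_time",
    "overheating_degree",
    "seeding_time",
    "crystallization_speed",
    "heater_power",
    "power_decay_speed",
    "voltage",
    "voltage_decay_ratio",
    "growth_time",
)


@dataclass
class GrowthRun:
    """One Kyropoulos growth run (hypothetical schema; units are not specified)."""
    total_heating_time: float
    overheating_degree: float
    seeding_time: float
    crystallization_speed: float
    heater_power: float
    power_decay_speed: float
    voltage: float
    voltage_decay_ratio: float
    growth_time: float
    # Defect labels: 1 if the defect was observed in the grown crystal, else 0.
    has_cracks: int = 0
    has_blocks: int = 0
    has_bubbles: int = 0


def to_example(run: GrowthRun, label: str) -> tuple[list[float], int]:
    """Return (feature vector, target label) for one run and one defect type."""
    return [getattr(run, name) for name in FEATURE_NAMES], getattr(run, label)
```

Keeping one label field per defect type makes it straightforward to train a separate model for cracks, blocks, or bubbles from the same records.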

2. Machine Learning for Materials Prediction

Combining experimental studies of sapphire crystal growth by the Kyropoulos method (see Fig. 1) with machine learning methods makes it possible to find how the defect level depends on time and can reduce the number of unsuccessful growth attempts. The quantity and variety of defects in crystals are described by stochastic functions that depend on various technological parameters, such as the design features of the furnace, the temperature in different areas of the furnace and its variation over time, etc.

Fig. 1. Sapphire crystals obtained (a) by the Kyropoulos method and (b) enlarged sapphire crystals.

Prediction of crystal defects is an important and fundamental problem in materials science. Machine learning for materials prediction has been under investigation for several years, but only a few works describe the use of neural networks for predicting crystal properties [11]-[14]. Data mining tools allow predicting future trends and behavior, which supports decision making. The aim of this contribution is to incorporate these findings into an overall prediction approach for the computational investigation of possible defect formation during sapphire crystal growth.

Many studies in defect prediction use techniques that originated in statistics and machine learning [15]. Such techniques include logistic regression, support vector machines (SVM), classification trees, neural networks, the naive Bayes algorithm, the k-nearest neighbours (kNN) algorithm, the CN2 induction algorithm, the adaptive boosting algorithm (AdaBoost), the random forest algorithm, and the stochastic gradient descent (SGD) algorithm [16].
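As a concrete illustration of one of the listed techniques, here is a minimal k-nearest neighbours classifier in plain Python. It is a generic textbook sketch, not the implementation used in Orange Canvas; the function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    # Count labels among the k closest points and return the most frequent one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy example: two well-separated groups of 2-D points.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_x, train_y, (0.5, 0.5)))  # 0
print(knn_predict(train_x, train_y, (5.5, 5.5)))  # 1
```

Because kNN relies on Euclidean distances, features measured on very different scales (for example, times and power values) would normally be standardized before use.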

Developing and applying data mining algorithms requires powerful software tools. Many open-source data mining tools are available, such as the Waikato Environment for Knowledge Analysis (WEKA), Tanagra, the Konstanz Information Miner (KNIME), and Orange Canvas. In this paper, we use the Orange Canvas framework for defect prediction in sapphire crystals. Orange Canvas is a comprehensive, component-based framework for machine learning and data mining. It provides a platform for experiment selection, predictive modeling, and recommendation systems, and includes components for data preprocessing, feature scoring and filtering, simulation, model evaluation, and exploration. It is intended both for experienced users and for machine learning researchers who want to develop and test their own algorithms in an easy-to-use visual programming environment, and it allows easy prototyping of new algorithms and experimental procedures. For exploratory data analysis, it provides a visual programming framework with an emphasis on interaction and creative combinations of visual components [16]-[25].

We investigate the impact of different numbers of positive and negative examples on machine learning for the prediction of sapphire crystal defects, and obtain models of the influence of crystal growth parameters on sapphire crystal growth. The training data are sent to the appropriate classifier (see Fig. 2), and the classifier results are then compared with the real experiment. The main goal of the research is to assess whether the proposed approach can be used for defect prediction.
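Two requirements from this workflow, that testing data never overlap the training data and that the numbers of positive and negative examples can be varied, can be sketched as follows. These are hypothetical helpers written for illustration, assuming binary labels (1 = defect, 0 = no defect); they are not the sampling procedure of Orange Canvas or of the authors.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split example indices into disjoint train/test lists, keeping each class's share."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])  # held out, never seen during training
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def subsample(indices, labels, n_pos, n_neg, seed=0):
    """Pick n_pos positive and n_neg negative indices from `indices` (labels are 1/0)."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    return sorted(rng.sample(pos, n_pos) + rng.sample(neg, n_neg))
```

Subsampling only the training indices with different `n_pos` and `n_neg` values, while keeping the test indices fixed, isolates the effect of the class counts on the trained model.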

Fig. 2. Illustration of the machine learning workflow.

3. Sapphire Crystals Defects Prediction Results

We present the receiver operating characteristic (ROC) analysis and a comparison of the calculation parameters of Orange Canvas (see Fig. 3 and Table 1, where AUC is the area under the ROC curve, CA is the classification accuracy, and F1 is the F-score). These models allow predicting, for example, the defects formed as a result of local overcooling of the crucible walls in the thermal node, which leads to accelerated crystal growth. We also obtained prediction models for the crystal weight, for the formation of blocks, cracks, and bubbles, and for the total defect characteristics. The models were trained on all data sets and later used for generalization on a different data set that did not include the data used at the training stage. During training and testing, we determine the recall and precision of prediction and analyze the correlation among the features.

The results show that the precision of the neural network for defects formed as a result of local overcooling of the crucible was 0.94, while the precision of the SVM and naive Bayes algorithms was 0.857 and 0.801, respectively. The neural network recognizes the current situation as a known state and reproduces its response as accurately as possible. Experimental studies of sapphire crystal growth by the Kyropoulos method describe how the defect level depends on time, and neural networks, as a machine learning instrument, make it possible to derive new dependencies from these data and predict the quality of the resulting crystal. The experimental data set has to be extended to refine the models and improve the recall and precision of defect prediction.
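The reported quantities follow standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, CA is the fraction of correctly classified examples, and AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch for binary labels follows; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred, positive=1):
    """Precision, recall, F1, and classification accuracy (CA) for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    ca = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "F1": f1, "CA": ca}


def roc_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC count is quadratic in the number of examples, which is acceptable for small experimental data sets. Orange Canvas may report these scores with a different averaging over classes, so values computed this way need not match Table 1 exactly.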

Data mining is often concerned with developing predictive models. To apply predictive models in practice, they have to be integrated into decision support systems. The comparison of the calculation parameters of Orange Canvas can be applied to the development of a universal expert system for defect prediction during sapphire crystal growth. The analysis allows experts to find hidden information in the data and to improve the efficiency of prediction. The generalized structure of the expert system for defect prediction is presented in Fig. 4. These investigations allow us to improve the expert system for defect prediction in sapphire crystals. We demonstrate the robustness and predictive power of our method by performing defect determination. The designed software is a universal tool for studying the influence of crystal growth parameters on the quality of sapphire crystals and can be widely used to estimate and predict defects in growing crystals.

Fig. 3. ROC analysis of calculation parameters in Orange Canvas.

Table 1: Comparison between calculation parameters of Orange Canvas.

Fig. 4. Generalized structure of the expert system knowledge base.

The class diagram of the expert system for sapphire defect prediction is shown in Fig. 5. The main classes in this diagram are:

- CCriteria: a class that represents a feature by which an object of the subject area can be characterized. It stores the name and the weight of the attribute; the weight is needed to evaluate the attribute's effect on the result of the calculation. The class also holds the set of possible values of the criterion as a collection of CFeature objects.

- CFeature: a class that represents one particular value from the set of permissible values related to a specific attribute. The class allows the user to set the name of the value and its numerical rating, defined in the range from 0 to 1.

- CCriteriaCollection: a class that facilitates work with the set of features available in the developed expert system. It simplifies searching for, selecting, adding, and deleting criteria, and includes internal checks of feature correctness.

- CPattern: a class representing a solution variant that is presented to the user of the expert system. The class includes a field describing the solution the user receives after the search. Another attribute characterizing a solution is the range of values, defined by the expert, that distinguishes it from the other solutions.

- CPatternCollection: a class that represents the set of solutions specified by an expert. It keeps the solutions sorted in ascending order, which speeds up and simplifies the search procedure, and allows testing the set of solutions for correctness.

- CExpertSystemContainer: a class that encapsulates all static information about the features and solutions provided by experts.

- CTuple: a class that contains the user's selection of particular attribute values. As a rule, the accuracy of the solution increases as the user sets more features. An object of this class is passed to a CChooser instance to find a solution.

- CChooser: a class designed to search for a solution. First, it computes from the sample passed to it a coefficient scaled to the interval [0, 1]. This coefficient is then passed to the CPatternCollection object, which determines the interval into which the number falls and returns the corresponding solution.

- CSerializer: a class that serializes and deserializes all the data stored in the CExpertSystemContainer. It is needed to save the developed expert system to a file on disk and to load it back.

Fig. 5. Class diagram of the expert system for sapphire defect prediction.
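The interplay of the classes listed above can be sketched in Python. The sketch keeps the class names from Fig. 5 but is otherwise an assumption: the source does not say how CChooser computes its coefficient, so a weight-averaged mean of the selected feature ratings is used here (it stays in [0, 1] because every rating does); solutions are encoded as closed intervals [low, high]; JSON is an assumed serialization format; and CCriteriaCollection is reduced to a plain dictionary of criteria keyed by name.

```python
from __future__ import annotations

import bisect
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CFeature:
    """One permissible value of a criterion with a numerical rating in [0, 1]."""
    name: str
    rating: float


@dataclass
class CCriteria:
    """A feature of the subject area: its name, weight, and permissible values."""
    name: str
    weight: float
    values: list[CFeature] = field(default_factory=list)

    def rating_of(self, value_name: str) -> float:
        for value in self.values:
            if value.name == value_name:
                return value.rating
        raise KeyError(f"{self.name!r} has no value {value_name!r}")


@dataclass
class CPattern:
    """A solution shown to the user; applies to coefficients in [low, high] (assumed encoding)."""
    description: str
    low: float
    high: float


@dataclass
class CPatternCollection:
    """Expert-defined solutions, kept sorted in ascending order of their lower bound."""
    patterns: list[CPattern] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.patterns.sort(key=lambda p: p.low)

    def find(self, coefficient: float) -> CPattern:
        # Binary search for the last interval that starts at or below the coefficient.
        i = bisect.bisect_right([p.low for p in self.patterns], coefficient) - 1
        if i < 0 or coefficient > self.patterns[i].high:
            raise ValueError(f"no solution covers coefficient {coefficient}")
        return self.patterns[i]


@dataclass
class CExpertSystemContainer:
    """All static expert knowledge: criteria keyed by name, plus the solutions."""
    criteria: dict[str, CCriteria]
    patterns: CPatternCollection


@dataclass
class CTuple:
    """The user's selection: criterion name -> chosen value name."""
    choices: dict[str, str]


class CChooser:
    """Turns a CTuple into a coefficient in [0, 1] and looks up the matching solution."""

    def __init__(self, container: CExpertSystemContainer) -> None:
        self.container = container

    def choose(self, sample: CTuple) -> CPattern:
        weighted, total_weight = 0.0, 0.0
        for criterion_name, value_name in sample.choices.items():
            criterion = self.container.criteria[criterion_name]
            weighted += criterion.weight * criterion.rating_of(value_name)
            total_weight += criterion.weight
        if total_weight <= 0:
            raise ValueError("selection carries no weight")
        # Assumed rule: weighted mean of ratings, which stays within [0, 1].
        return self.container.patterns.find(weighted / total_weight)


class CSerializer:
    """Round-trips a container through JSON text; write it to a file to persist it."""

    @staticmethod
    def dumps(container: CExpertSystemContainer) -> str:
        return json.dumps(asdict(container))

    @staticmethod
    def loads(text: str) -> CExpertSystemContainer:
        data = json.loads(text)
        criteria = {
            name: CCriteria(c["name"], c["weight"], [CFeature(**v) for v in c["values"]])
            for name, c in data["criteria"].items()
        }
        patterns = CPatternCollection([CPattern(**p) for p in data["patterns"]["patterns"]])
        return CExpertSystemContainer(criteria, patterns)
```

Keeping the solutions sorted by their lower bound is what lets `find` use a binary search instead of scanning every interval, matching the role of ascending order described for CPatternCollection.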

The algorithms for data collection and analysis are designed to cover the following tasks: analysis of the initial technological data; statistical data processing; modeling of the influence of technological parameters on crystal quality; prediction of crystal quality from the initial data; decision making; analysis of the reasons for possible deviations; and model correction based on newly discovered data.
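For the statistical data processing step, and for the analysis of correlation among features mentioned in Section 3, one simple option is the Pearson correlation coefficient. The sketch below assumes Pearson correlation (the source does not name a specific measure) and uses hypothetical function names; it flags feature pairs whose absolute correlation reaches a chosen threshold.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences (NaN if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def strongly_correlated(columns, threshold=0.9):
    """List feature pairs whose absolute Pearson correlation is at least `threshold`."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]
```

Here `columns` maps a feature name to its values across growth runs. Flagged pairs carry largely overlapping information, so they are natural candidates for closer inspection before a model is trained.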

Selected user interface elements of the expert system are shown in Fig. 6.

Fig. 6. Expert system software interface.

4. Conclusions

Various machine learning methods can find optimal conditions for sapphire monocrystal production, with differing efficiency.

This research compares data mining techniques (logistic regression, SVM, classification trees, neural networks, the naive Bayes algorithm, the kNN algorithm, the CN2 induction algorithm, AdaBoost, the random forest algorithm, and the SGD algorithm) for sapphire crystal defect prediction. We analyzed the fundamental interplay between the availability of materials data and the predictive capability of machine learning models, and studied the various data mining techniques available for predicting crystal defects in order to find the best prediction methods.

The proposed models allow predicting crystal quality. We obtained an automatic procedure and a machine learning method for fast crystal quality prediction. The industrial application of such methods will increase the level of automation in producing crystals with a predefined combination of properties that may be important for particular applications in microelectronics and nanoelectronics. Solving these scientific and engineering problems requires bringing the use of information technologies in crystal production to a new level.

To make data mining techniques applicable in daily practice, they have to be integrated into decision support systems. For this purpose, we propose a scheme in which the predictive modules are developed separately with the Orange Canvas data mining tool, while dedicated software is developed for the decision support system. Based on our study, we propose software for analyzing the quality of the resulting crystals, which allows optimizing the crystal growth process.

We expect to expand the experimental data set in the future, which will create new opportunities for prediction and improve its accuracy. We plan to recognize crystal images from the furnace chamber and to forecast the influence of growth conditions on crystal quality.