
A machine learning approach for accelerated design of magnesium alloys. Part A: Alloy data and property space

2023-12-27

Journal of Magnesium and Alloys, 2023, Issue 10

M. Ghorni, M. Boley, P.N.H. Nkshim, N. Birilis

a Department of Materials Science and Engineering, Monash University, VIC, 3800, Australia

b Faculty of Information Technology, Monash University, VIC, 3800, Australia

c College of Engineering and Computer Science, Australian National University, ACT, 2601, Australia

d Faculty of Engineering, Science and the Built Environment, Deakin University, VIC, 3125, Australia

Abstract Typically, magnesium alloys have been designed using a so-called hill-climbing approach, with rather incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, but more importantly it does not harness all the data that exists in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data that exists in the field of Mg-alloy design to date. In this approach, a consolidated alloy database that incorporates 916 datapoints was first developed from the literature and experimental work. To analyse the characteristics of the database, alloying and thermomechanical processing effects on mechanical properties were explored via composition-process-property matrices. An unsupervised machine learning (ML) method of clustering was also implemented, using unlabelled data, with the aim of revealing potentially useful information for an alloy representation space of low dimensionality. In addition, the alloy database was correlated to thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database, but it also provides, for the first time, data insights that enable future accelerated digital Mg-alloy design.

Keywords: Magnesium; Alloy design; Mg-alloy database; Data analysis; Data visualisation; Unsupervised machine learning.

1. Introduction

Magnesium (Mg) alloys continue to draw a great deal of research attention due to their favourable properties, which include low density and high specific strength [1]. These properties make them critical to emerging applications that range from transportation to electronic devices and implant materials [2]. An increasing number of research works on Mg alloy design have emerged in the past 15 years, focusing on the improvement of mechanical properties. The Mg alloys studied were processed using different methods including die casting, sand casting, twin-roll casting, and additive manufacturing [3], along with solid-state processing including extrusion, rolling, and equal channel angular extrusion/pressing (ECAE/ECAP) [4]. In the development of these new Mg alloys, criteria such as the specific alloying elements used, appropriate melt processing, and thermomechanical or heat treatment routes [5] were all considered key factors that significantly affect the mechanical properties of the alloys.

Historically, the discovery of new alloys has proceeded through iterative trial-and-error processes, including intensive (often expensive and time-consuming) laboratory studies. Research in metals physics and alloy design has continually progressed over the past 70 years through the development of thermodynamic and kinetic theory [6–8], computation [9–11], and simulations such as density functional theory and molecular dynamics [12]. Building on theoretical and computational materials science, incremental advances have been made in alloy design, particularly in the determination of structure-property relationships [13]. For instance, the theoretical basis of solid-state physics and chemistry (formed by ab initio calculations of quantum theory) has been used to explain the relationship between electronic structure and some alloy properties [14]. Other examples include the use of the Hume–Rothery rules to describe the solubility tendency of metals and its effect on mechanical properties, the Hall–Petch equation for the relationship between grain size and yield strength [15], and the Johnson–Cook type flow stress model [16], which expresses the flow stress as a function of strain, strain rate and temperature [17,18].

Along with incremental developments in alloy design, questions have been raised about empirical approaches that do not harness all the data that exists in the field [19]. The complex and multi-dimensional relationships between the behaviour and characteristics of materials (i.e., where alloy compositions may include as many as ten alloying elements, with complex multi-step processing conditions) have made further progress difficult to achieve with traditional design methods, making it rational to propose approaches that employ data science and machine learning (ML). These new approaches are based on an understanding of the rich data available. Favourable outcomes from modern data science decision making in other fields have also encouraged materials scientists to employ materials informatics methods as novel solutions for extracting knowledge from materials data [15,20-31].

A data-driven learning paradigm is generally classified into two categories, supervised and unsupervised ML [32]. While the performance of a predictive supervised ML model depends on the complexity and dimensionality of features/descriptors in the training data, an unsupervised ML technique, and in particular clustering, has the advantage of revealing hidden patterns within complex and high-dimensional data [33]. Unsupervised algorithms do not reference any corresponding target properties and focus on extracting knowledge from unlabelled data using multiple interrelated feature analysis for the purposes of clustering and dimension reduction [34–36]. In a study in which a combination of clustering and dimension reduction techniques was used, intrinsic patterns were extracted from highly complex molecular materials data sets. These patterns categorised the chemicals and nanostructures within the clustered data [35]. In another study, the K-means clustering method was used to find similarities in 2D metal carbides and nitrides with a high-dimensional feature vector that included both elemental features and DFT-calculated features [37]. Principal component analysis (PCA), an unsupervised ML method of dimension reduction, was utilised to gain visual insight into twisted bilayer graphene data [38]. This data included a very complex relationship between the twist angle and simulated Raman spectra. In that work, a new smaller representative feature space was introduced that described the characteristics of the data. A number of dimension reduction and clustering methods have been examined on molecular simulation data to analyse and explain the data using a small set of collective variables [39]. In ML studies of nanoparticles, the advantages of using unsupervised learning methods including t-SNE (t-distributed stochastic neighbour embedding) and SOMs (self-organising maps) were reviewed. These methods were used to visualise a highly non-linear multi-structure/multi-property relationship in silver and platinum nanoparticle data and to identify the underlying patterns [40].

In addition, the ever-growing volume of digital data as an asset has emerged as the driving force behind the scientific paradigm of data science [12,41-44]. Data is the foundation of data science and ML, since comprehensive understanding, insights, and knowledge can be extracted from data. Materials data is notionally complex in nature, and it is an attractive candidate for analysis via data-driven methods to obtain insights that cannot be achieved using empirical approaches for new materials discovery. The most well-known publicly accessible materials ‘big data’ infrastructures to date include: Materials Project (for inorganic compounds, molecules, and nano-porous materials) [45], NOMAD (for energy calculations) [46], Citrination (for chemical compounds) [47], AFLOW (for compounds and inorganic crystal structures) [48], NIMS (for microstructures of materials) [49], along with phase and compound libraries that exist in CALPHAD databases [50], etc. [51–55]. Most of the existing materials databases have focused on inorganic compounds, structures, and physical (as opposed to mechanical) properties. A few recent works have explored the mechanical properties of metallic materials and have made composition-processing-property datasets of multi-principal element alloys [56–58] and aluminium alloys [59] available. However, no studies to date have introduced an open-source and comprehensive database for magnesium alloys.

Thus, in the present work, the aim is to propose a new approach using data science that provides a detailed understanding of the complex data that exists in the field of Mg-alloy design. A consolidated Mg-alloy dataset is introduced, containing chemical composition, thermomechanical processing route, and mechanical property values. The database incorporates 916 unique entries obtained from the literature and from a body of experimental work. The corresponding thermodynamically stable secondary phases for each of the alloys are also calculated (using PANDAT® software) and added to the database. Data science methods of descriptive statistical analysis, pairwise correlation analysis, and data visualisation are employed to analyse the characteristics of the data (such as distributions, trends, and patterns).

To provide an insightful understanding of the complex, high-dimensional data, an unsupervised ML technique is implemented that reveals hidden relationships and defines low-dimensional representation spaces accordingly. The absence of any unsupervised ML study of Mg-alloy data to date indicates a gap in existing knowledge. To address this gap, and to provide new insights into the field of Mg-alloy design, unsupervised ML is adopted, combining the t-SNE dimension-reduction algorithm with a clustering algorithm (BIRCH: balanced iterative reducing and clustering using hierarchies). The resulting clusters of instances with similar trends can suggest smaller descriptive feature sets that reliably capture important characteristics of the data [33,39]. These smaller feature sets can then be examined further to define the most descriptive representation spaces.

In the present study (Part A) of this two-part body of work, the nature of the data concerning Mg-alloys, the property space of Mg-alloys, the relationships within such data, the utility of unsupervised ML, and the prospect for analysis via data science tools to understand the data and recognise unknown patterns are reported. In the follow-up study (Part B), the database presented herein is explored to define an appropriate description of Mg-alloys as model input for effective supervised ML, allowing users to explore and discover future Mg-alloy candidates. Supervised ML uses labelled data, including the target property data, to train a model for the purposes of classification or regression prediction.

2. Methods

2.1. Development of a consolidated database

In order to develop a database for magnesium alloys, data was collected and compiled from as many sources as possible. These sources include (i) the archival literature (searchable via Google Scholar), (ii) patents (searchable via Google Patents), (iii) web sources and databases (MatWeb [60], CES Edupack [61]), and (iv) unpublished data from a body of experimental works conducted by Monash University. The manual collection and compilation of data was carried out with the technical assistance recognised in the acknowledgements section. Examples of key publications used as sources include [2,3,62-76]. The alloy database includes three key categories, namely: the chemical composition of each alloy; the production route and processing treatment of each alloy; and the mechanical properties of each alloy, restricted to the yield strength (YS), ultimate tensile strength (UTS), and elongation/ductility.

2.2. Data pre-processing

The Python data management tools Pandas, NumPy and Scikit-learn [77] were employed to build a tabular data structure (data-frame) for data presentation, large data handling and pre-processing. There were (inevitable) inconsistencies in the collected data, arising from the variety of formats used by disparate data sources (such as the units employed across studies). Initially, chemical compositions were all converted to weight percentage, and yield and tensile strength values were expressed in megapascals (MPa). Next, data was rescaled using the min-max normalisation and minimum shift technique [78], defined as:

$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where $X$ is an individual feature value, $X_{\min}$ and $X_{\max}$ are the minimum and maximum of the feature $X$ over the entire dataset, respectively, and $X_{\mathrm{norm}}$ is the normalised and rescaled value of feature $X$. This normalises and translates each feature individually, such that features have a default range of (0, 1) [77] while preserving the shape of the original distribution. Subsequently, the data-frame was investigated to find missing or duplicate values along with semantic errors within the compiled data. For instance, in some published works, yield strength values higher than the ultimate strength values were reported. In some other examples, inaccuracies in the sum of constituent elements of an alloy were observed. It was concluded that this issue arose because the average values of elements were replaced with the ranges of elements in some Mg-alloy groups. These processes resulted in the exclusion of some data. The collated data includes numerical values for composition and a discrete categorical label for thermomechanical processing route. To represent the categorical data in the form of a meaningful numeric vector, a conversion tool was utilised, namely the one-hot-encoding method [79]. This produces binary vectors of zeros and ones and was used to transform the labels and to avoid misinterpreting discrete data.
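The pre-processing steps described above can be sketched in a few lines of Pandas. This is a minimal illustration under stated assumptions, not the authors' code: the column names, the toy values and the 0.5 wt.% tolerance on the composition sum are all hypothetical.

```python
import pandas as pd

# Toy records standing in for database rows; column names and values are hypothetical.
df = pd.DataFrame({
    "Al": [9.0, 0.0, 3.0, 6.0],            # wt.%
    "Zn": [1.0, 6.0, 1.0, 0.5],            # wt.%
    "Mg": [90.0, 94.0, 96.0, 93.5],        # wt.% (balance)
    "YS": [160.0, 250.0, 310.0, 130.0],    # MPa
    "UTS": [240.0, 320.0, 290.0, 220.0],   # MPa
    "route": ["HPDC", "extrusion", "wrought", "HPDC"],
})

# Semantic checks: the composition should sum to ~100 wt.% (assumed tolerance)
# and the yield strength must not exceed the ultimate tensile strength.
composition_ok = (df[["Al", "Zn", "Mg"]].sum(axis=1) - 100.0).abs() < 0.5
strength_ok = df["YS"] <= df["UTS"]
clean = df[composition_ok & strength_ok].drop_duplicates().reset_index(drop=True)


def min_max(column: pd.Series) -> pd.Series:
    """Rescale one feature to [0, 1] using (X - X_min) / (X_max - X_min)."""
    span = column.max() - column.min()
    # A constant column has zero span; map it to 0 rather than divide by zero.
    return (column - column.min()) / span if span > 0 else column * 0.0


numeric = ["Al", "Zn", "YS", "UTS"]
scaled = clean[numeric].apply(min_max)

# One-hot encode the categorical processing route into binary columns.
encoded = pd.get_dummies(clean["route"], prefix="route", dtype=int)
features = pd.concat([scaled, encoded], axis=1)
print(features)
```

In this toy data the third record is excluded because its yield strength exceeds its UTS. Scikit-learn's `MinMaxScaler` and `OneHotEncoder` provide equivalent transformations where a fitted, reusable transformer is preferred.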

2.3. Data visualisation and analysis

An effective way to process and interpret large data sets is via representations based on data visualisation. The Python visualisation libraries Matplotlib, Plotly and Seaborn were employed in this study to summarise data in a visual format. The data visualisation plots also allowed outliers to be identified by inspection, and therefore allowed manual validation of the dataset prior to further interpretation. High quality (large) data sets and tools for pattern recognition have not previously been applied in depth to the analysis of Mg-alloy data. Frequency distribution histograms, scatterplot matrices (known as SPLOM), correlation heatmap plots, univariate analysis and descriptive statistics [80] were utilised to identify the attributes and main characteristics of the dataset within the exploratory data analysis. Heatmaps are graphical representations of the linear correlation between any two variables. In this work, to calculate these linear correlation coefficients between the constituent elements of magnesium alloys, the Pearson correlation coefficient (PCC) was used, which is defined as follows [81]:

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$

where $x_i$ and $y_i$ are the values of the two variables for the $i$-th alloy, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of alloys.
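As an illustration of the correlation analysis, the sketch below computes the PCC directly from its standard definition and compares it with the pairwise matrix returned by Pandas. The compositions are invented for the example and the helper name `pearson` is hypothetical.

```python
import numpy as np
import pandas as pd


def pearson(x, y) -> float:
    """Pearson correlation coefficient from its textbook definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical alloy compositions in wt.% (not taken from the database).
comp = pd.DataFrame({
    "Al": [9.0, 6.0, 3.0, 0.0, 0.0],
    "Zn": [1.0, 0.5, 1.0, 6.0, 2.0],
    "Mn": [0.2, 0.3, 0.3, 0.0, 0.1],
})

# Pairwise PCC between all element columns; this matrix is what a heatmap displays.
corr_matrix = comp.corr(method="pearson")
print(corr_matrix.round(2))
```

The resulting matrix can be passed to a plotting routine such as `seaborn.heatmap` to obtain a correlation heatmap of the kind described above.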

2.4. Unsupervised machine learning

The so-called “clustering” technique is a major area of interest and a well-established technique within the field of unsupervised ML [32,83,84], and was adopted in the current work to analyse unlabelled data. Clustering algorithms are capable of grouping data based on natural similarities and differences in the data, if they exist, without considering the target variables (which in the present study are mechanical properties). Due to the high dimensionality of the data (i.e. many input variables relative to the number of output variables), the non-linear dimensionality reduction algorithm t-SNE [77] was utilised. The t-SNE method infers underlying hidden patterns in the data and projects them into a low-dimensional space to reduce computational complexity and aid visualisation. The algorithm has tuneable parameters with complex effects on the resulting map of distinctive clusters. To identify and separate clusters based on the similarities of the instances, and to extract useful information, the BIRCH algorithm [77] was combined with the t-SNE algorithm. Clusters with common characteristics may be recognised and visually represented by colour-coded maps.
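A minimal sketch of the t-SNE plus BIRCH combination, using the Scikit-learn implementations, is given below. The synthetic feature vectors, the perplexity, the number of clusters and the random seed are assumptions chosen for illustration; the parameter values used in the study are not stated in this section.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for normalised alloy features: three groups of
# 8-dimensional vectors scattered tightly around random centres.
centres = rng.uniform(0.0, 1.0, size=(3, 8))
X = np.vstack([c + 0.02 * rng.standard_normal((30, 8)) for c in centres])

# Step 1: non-linear projection of the feature space onto two dimensions.
# The perplexity is an assumed value; it strongly affects the resulting map.
embedding = TSNE(
    n_components=2, perplexity=15.0, init="pca", random_state=0
).fit_transform(X)

# Step 2: group the projected points with BIRCH (assumed three clusters).
labels = Birch(n_clusters=3).fit_predict(embedding)

print(embedding.shape, np.unique(labels))
```

Each row of `embedding` can then be plotted and coloured by its entry in `labels` to obtain a colour-coded cluster map. Because t-SNE does not preserve global distances, cluster assignments should be checked for stability across several perplexity values and seeds.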

Table 1 Summary of the six categories of production route and processing treatments for the Mg-alloys in the database compiled for this work.

2.5. Thermodynamic computations

To bridge a physics-related representation to the input variables that affect the final mechanical properties of an alloy, thermodynamic calculations of phase analysis were employed herein. For this purpose, PANDAT™ was utilised to calculate the equilibrium weight fraction of stable second phases for each alloy (by fraction and type). Using the thermodynamically calculated (i.e., CALPHAD) alloy phase constitution, any correlation with the mechanical properties could also be explored.

3. Results and discussion

3.1. Database

The compiled and formatted database is publicly accessible through the following link: https://github.com/katrina-coder/Magnesium-alloys-database. Included in the database are chemical composition, thermomechanical processing conditions and mechanical property values of magnesium alloys.

The alloys explored in this study incorporate one or more of 30 alloying elements: Zn, Al, Mn, Zr, Nd, Ce, La, Y, Cu, Si, Gd, Ca, Pr, Ni, Fe, Li, Be, Sn, Th, Ag, Sb, Er, Dy, Yb, Bi, Sr, Ga, Sc, Tb, and Ho. In addition, the production routes and processing treatments for the alloys comprise a range of casting or thermomechanical processes (including heat treatments). In order to classify the alloys in a rational manner, these different production routes were encoded by a human expert into one of six (6) mutually exclusive categories. The six categories of processing designation that capture the alloys in the database are summarised in Table 1.

3.2. Analysis of the database

3.2.1. Statistical analysis

A summary of the processed data-frame of 916 Mg alloys is shown in Tables 2 and 3. The elements in the alloy compositions explored, along with their range (wt.%), mean, standard deviation, and abundance in the database, are summarised in Table 2.

Amongst the thirty alloying elements that have been utilised in the Mg alloys in the database, zinc (Zn), aluminium (Al), and manganese (Mn) are the most frequently used. Both Al and Zn were reported as key alloying elements for Mg as early as Griesheim-Elektron in 1909 [85]. Owing to some solubility in Mg, the rare earth (RE) metals have recently drawn significant attention in the field of Mg alloys; neodymium (Nd), cerium (Ce), lanthanum (La) and yttrium (Y) are the most common RE elements in Mg alloys. By contrast, approximately one third of the alloying elements are infrequently employed in Mg-alloys. For instance, terbium (Tb) and holmium (Ho) are present in only two magnesium alloys. It was also observed that gadolinium (Gd), with a maximum amount of 25.6 wt.%, had the highest single alloying content that has been added to a magnesium alloy. Although most of the magnesium alloys do not include Gd, this element exists in 12% of the alloys, with mean and median values of 9.38 wt.% and 10 wt.%, respectively. Apart from iron (Fe), which principally originates as an impurity, antimony (Sb) and Ho have the lowest concentrations of 1 wt.% and 1.4 wt.% respectively, along with beryllium (Be), which is uniquely and sparingly added (0.0008 wt.%) due to its toxicity.

Table 2 Statistical data (minimum and maximum (wt.%), mean, standard deviation and abundance) of existing elements in the Mg-alloy data across data points that contain the element. The elements are sorted based on their abundance.
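Per-element statistics of this kind, computed only over the data points that contain the element, can be sketched as follows. The example assumes that an absent element is stored as zero in the composition columns, which may not match the published file; the values are invented.

```python
import pandas as pd

# Hypothetical compositions in wt.%; zero is assumed to mean "element absent".
comp = pd.DataFrame({
    "Zn": [1.0, 0.0, 6.0, 0.5, 0.0],
    "Al": [9.0, 3.0, 0.0, 0.0, 0.0],
    "Gd": [0.0, 0.0, 0.0, 8.5, 10.0],
})

# Mask absent entries so that the statistics ignore them (NaN is skipped).
present = comp.where(comp > 0)

summary = pd.DataFrame({
    "min": present.min(),
    "max": present.max(),
    "mean": present.mean(),
    "std": present.std(),
    # Share of alloys that contain the element, in percent.
    "abundance_pct": 100.0 * present.notna().mean(),
}).sort_values("abundance_pct", ascending=False)

print(summary.round(2))
```

Masking rather than averaging over all rows matters here: including the zeros would pull the mean Gd content in this toy data from 9.25 wt.% down to 3.7 wt.%.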

Six groups of production and thermomechanical processing conditions in the dataset were defined for the study herein, as global categories for Mg-alloys. These conditions are listed in Table 1 and, for ease of identification in plots, follow the naming convention of “Sand cast”, “HPDC”, “Cast+HT”, “extrusion”, “wrought” and “ECAP”, noting that such short-hand descriptions include broader sub-classes as per Table 1. For the database herein, the processing method of extrusion had the largest share of data, accounting for 41% of the whole population in the dataset (in part arising from a comprehensive review of the method by Zeng et al. [75]). “HPDC” and “Cast+HT” are the next two most common conditions (24% and 20% of the database, respectively).

In Table 3, the lower and upper bounds, mean, and standard deviation of the three mechanical properties of primary interest, namely ultimate tensile strength, yield strength and ductility, are provided.

The collated data covers a wide range of yield strength values, from 21 MPa to 610 MPa. The lower and upper bounds of ultimate tensile strength in the dataset are 52 and 710 MPa. It is noted that the lack of high strength and simultaneously ductile Mg alloys is an issue that has dominated the field of Mg-alloy research and development for many years. Although several valuable attempts have recently been made to improve the strength of Mg alloys, only 5% of all alloys herein have an ultimate tensile strength greater than 500 MPa. A wide range of ductility has also been identified in the database, with values ranging from 0.23% to 65.2%. Recent advancements to improve the ductility of Mg alloys are limited to eleven alloys that present elongation values higher than 40%.

Fig. 1. (left) Ultimate tensile strength versus ductility, and (right) ultimate tensile strength versus yield strength, for the 916 Mg alloys in the database explored herein. Univariate distributions of the mechanical property variables are added to the plot margins.

Fig. 2. The relationship between the most important alloying elements and ultimate tensile strength (MPa) as well as ductility (%) for all 916 Mg alloys in the database.

To investigate the overall behaviour of magnesium alloys from the viewpoint of their mechanical properties, pairwise scatter plots of ductility, ultimate and yield strength for all data points are shown in Fig. 1.

Univariate distributions of mechanical property variables are provided along the plot margins. In addition, a summary of the individual constituents with the greatest effect on the ultimate tensile strength and ductility of the alloys under the six different processing conditions is presented in Fig. 2 using a scatter plot matrix (SPLOM).

3.3. Descriptive interpretations from data visualisation

In order to describe the inferences from the data visualisation of the database, some of the effects of alloying on mechanical properties are discussed.

There are several alloys with a relatively high Gd content (8–12.6 wt.%) that present high UTS values of ∼560–600 MPa. For instance, with 8.5 wt.% Gd and processing by extrusion, the UTS and YS reach 600 and 575 MPa, respectively, in the case of the alloy Mg-8.5Gd-2.3Y-1.8Ag-0.4Zr. Generally, most Gd-containing Mg-alloys have comparatively high values of UTS and YS, albeit confined to the lower range of elongation values. A unique example is, however, evident in Fig. 1, where Mg-8.3Gd-4.2Y-1.4Zn-1.1Mn has a rather high UTS (538 MPa) and an above-average ductility (13.1%). The median and minimum UTS in this class are 404 and 190 MPa respectively, while the minimum and median values of ductility are 0.4% and 6.15% respectively. The Gd-containing Mg alloy with the lowest UTS happens to be a binary Mg-Gd alloy with only 0.22 wt.% Gd, despite the maximum solid solubility of Gd in Mg being reported as 23.49 wt.% [86]. The combination of 2% Gd with 0.5% Mn in extruded samples led to some high ductility alloys with ∼50% elongation. Although the relationship between different sets of alloying elements and thermomechanical processing with the final mechanical characteristics of alloys is significantly complex, Gd is a conspicuous candidate for yielding both high strength and ductility.

In addition to Gd, it may be observed that Al and Y were also present in comparatively high percentages in several of the Mg-alloys explored, again as a result of there being some appreciable level of solid solubility of Al and Y in Mg [86]. Aluminium, the second most frequent alloying element in the database (around 43% of the whole population), is one of the constituents of the alloy with a high UTS of 710 MPa (Mg-20Al-15Ca). In addition to the pairing of Al and calcium (Ca), the combination of Al and gallium (Ga) in Mg-alloys was also seen to lead to relatively high UTS values of ∼600 MPa. There are only five alloys in the database that contain Ga, and all of them are comparatively strong, with UTS values in the range of 595–646 MPa. However, their ductility is concomitantly low, in the range of 1.2–3.5%. All five alloys containing Ga also contain Al and Zn; thus, future alloys developed to isolate the influence of Ga are a worthy area of exploration.

Although some Al-containing high-strength Mg alloys were observed in the database, the average UTS for Mg-Al alloys is ∼260 MPa, which is slightly below the mean UTS of the database (275 MPa). Given the high abundance of Mg-Al alloys in the database, however, approximately fifty Mg-Al alloys have a UTS > 400 MPa. In the lower-strength Mg-Al alloys (UTS < 150 MPa), Al is principally alloyed in combination with Zn, Mn, lithium (Li), and Nd. Although there are very few Mg-Al alloys with ductility higher than 30%, the maximum elongation for Al-containing Mg alloys of 46% was obtained by co-alloying with Li.

Around 40% of the alloys with Y in their chemical composition have comparatively high UTS values (> 400 MPa). Similar to the effect of Al upon Mg alloys, the mechanical properties of Mg-Y alloys were also significantly influenced by other constituent elements, with the minimum and maximum UTS of Mg-Y alloys being 75 and 650 MPa respectively. The maximum solid solubility of Al and Y in Mg is around 12.5 wt.%. When both Y and Li were present, a high elongation of 64% was obtained. Overall, only a small number of Y-containing alloys (around 5%) have ductility values higher than 30%.

In exploring the effect of Zn on mechanical properties, it is noted that 66% of alloys in the database contain Zn, and the maximum Zn content explored in Mg alloys was 14.3 wt.%. The median Zn concentration in Mg-Zn alloys is 1 wt.%, with a third quartile value of 2 wt.%. This indicates that most of the Zn-containing Mg alloys have a relatively low concentration of Zn, associated with the low solid solubility of Zn in Mg (6.21 wt.%) and also the fact that Zn increases Mg-alloy density significantly. The number of alloys with Gd content > 10 wt.% was three times more than the number of Zn-containing alloys in the same range. In binary Mg-Zn alloys the maximum yield and ultimate tensile strength values were 275 and 318 MPa respectively, while the ductility had a median value of 20.5%. Zinc was present in the composition of both high strength Mg-alloys with a UTS of 650 MPa and a highly ductile Mg-alloy with an elongation of 48.8%, with Zn being a vital component in the presence of other elements. An example of a high strength alloy in this class contains 2.5% Zn and 6.8% Y, while an example of a ductile sample is Mg-2.1Nd-0.2Zn-0.5Zr. Around 86 of the 615 zinc-containing alloys have ultimate tensile strengths greater than 400 MPa, while only 22 alloys in this group have elongation values higher than 30%.

Manganese was present in 394 Mg alloys in the database and used as an alloying element in the range of 0.004–2 wt.%, whilst Mn has a maximum solid solubility in Mg of ∼2.22 wt.%. The presence of Mn in a range of alloys with diverse compositions means that a wide range of strength and ductility was reported for alloys containing Mn, with UTS values ranging from 92 to 564 MPa and elongation values from 0.23% to 50%. As previously mentioned, an elongation of 50% was reported for a Mg alloy with Gd and Mn, while the highest tensile strength of 564 MPa in Mn-containing Mg-alloys resulted from the combination of 12.6% Gd, 1.3% Y, 0.9% Zn and 0.5% Mn.

Zirconium (Zr) was the next most common element in Mg alloys (present in 235 of the 916 entries). A maximum solubility of 2.69 wt.% Zr is possible in Mg, and the presence of Zr was reported in the range of 0.01–3.0 wt.%. Approximately a quarter of the alloys containing Zr had a UTS > 400 MPa, with only 5 out of 235 Zr-containing alloys having a ductility greater than 30%.

The remaining alloying elements in Mg alloys were present in less than 15% of all database entries. Of these less common elements, Ca was present in twenty alloys with UTS values > 400 MPa and in five alloys with elongation values > 30%. The alloy Mg-1Zn-0.5Ca has a maximum ductility of 44%. Terbium, Ho, dysprosium (Dy), erbium (Er), and scandium (Sc), all with a rather high solid solubility in Mg, similar to Gd (i.e., in the range of ∼20 wt.%), may possibly play roles in the development of high strength values. However, very few studies have focused on these elements, presumably owing to their cost and relative scarcity. Additionally, other similar elements such as thulium (Tm) and lutetium (Lu) have not been reported in the context of their effect on the mechanical properties of Mg alloys.

Amongst the rare earth elements, Nd, Ce and La have similar effects on tensile strength, albeit with some nuance in their individual and combined effects [76]. Amongst alloys that include these elements, few exceed a UTS of 500 MPa. Around 25% of all database entries contain Nd in the range of 0.01–8 wt.%, while 20% of alloys contain Ce in the range of 0.01–3.92 wt.%. Neodymium in combination with Zn and Zr can lead to a comparatively high ductility (e.g. 48.8% for Mg-2.1Nd-0.2Zn-0.5Zr). In the case of Mg-alloys with Ce, a somewhat lower maximum ductility of 33% was observed in Mg-8Al-0.5Zn-1.3Ce. Around 19% of the whole population of Mg alloys contain La in the range of 0.01–6 wt.%. Eight alloys with high concentrations of La had an appreciable UTS between 400 and 500 MPa. While alloying with Nd and Ce seems to produce alloys with desirable ductilities as discussed above, alloying with La resulted in a maximum observed elongation of only 23% (compared with 49% and 33% when Nd and Ce were used as alloying elements respectively).

From Fig. 2, it may be observed that Mg-alloys containing appreciable Li are generally the most ductile. The alloys with the three highest ductility values all contain Li and were processed under wrought conditions (purple points). The highest elongation of 65.2% is achieved in a binary Mg-Li alloy with 8.5 wt.% Li, while the binary Mg-Li alloy with 5.5 wt.% Li has the third highest elongation value (52.3%). In the Li-containing Mg alloy with the second highest ductility (64% elongation), 1 wt.% Y was added to Mg-8.5Li. In general, Mg alloys that contain Li have an average ductility of 25% elongation, almost three times greater than the average ductility of all Mg alloys in the database. This may be rationalised by the fact that Li additions induce a dual-phase (HCP+BCC) or single BCC phase structure. This phase change (of the alloy matrix) is unique to Li for alloys in the database. However, whilst Li is effective in producing ductile magnesium alloys, the overall strength of these alloys is rather low. The maximum UTS of alloys in this group is 280 MPa. This is offset by the lower density of these alloys when considering strength-to-weight ratios. Magnesium-Li alloys show exceptional promise, as also highlighted in recent studies [87], although their production is amongst the most complex of all the alloys in the database, owing to the reactivity of Li.

Apart from the Mg-Li alloys, the highest elongation observed in Li-free Mg-alloys was 50%, in the Mg-2Gd-0.5Mn alloy. The present analysis highlights that the effect of each individual alloying element alone on mechanical properties is different from the synergistic effect of that constituent in the presence of other elements.

Silver (Ag) was present in only nineteen alloys; however, five of these alloys display a UTS>400 MPa. The maximum UTS of 600 MPa reported for this group corresponds to an alloy previously mentioned as containing Gd. These Ag-containing alloys present maximum and median elongations of 25.5% and 4%, respectively.

In exploring the effects of ytterbium (Yb), although there are only a small number of Mg-Yb alloys in the database, it becomes evident that all of them have elongation values around 20%, with the exception of one sample with a lower ductility of 5.5%. Although Yb-containing Mg alloys have appreciable ductility values, their UTS values are moderate at ∼300 MPa. An average elongation of ∼20% and an average UTS of ∼337 MPa were reported for Yb-containing Mg alloys.

In terms of thermomechanical processing conditions,the individual effects of alloy production and processing route upon the UTS and ductility are shown in Fig.3.

Thermomechanical processing involving extrusion yields the largest proportion of alloys with high values of UTS. The second and third most effective processing methods appear to be the ‘Cast+HT’ and wrought conditions. Generally, the highest values of both yield and ultimate tensile strengths resulted from the three processing routes of extrusion, ‘Cast+HT’ and wrought. This reveals that the combination of thermal and mechanical processing is critical in alloy strengthening. The highest elongation values in the database were obtained from wrought and extrusion processing conditions. The majority of comparatively low strength and low ductility Mg alloys were produced by HPDC (which can also be ascertained from the green bars in the histogram of Fig.1). Conversely, the medium and high values of strength with retained ductility resulted from extrusion (which can also be ascertained from the orange bars in the histogram of Fig.1). Alloys processed using ECAP notionally revealed a low mean tensile strength of 194 MPa and medium ductility values of 12–32%. However, one ECAP-processed alloy, containing 16.4 wt.% Gd, reached a UTS of 418 MPa.

Based on the analysis of the database presented herein, it appears that there is still a need for further research and development in the area of Mg-alloys with high strength and high ductility. Although some general trends have been observed, the complex relationships between different alloying elements and their (thermomechanical) processing make it necessary to use advanced modelling techniques to fully understand the synergistic effects of these factors. One trend that was readily observed, and anticipated from the broader literature, is the inverse correlation between UTS and ductility: as UTS increases, ductility tends to decrease. However, there are a few samples that exhibit appreciable values of both strength and ductility, suggesting that there is still room for improvement in alloy design. To address this knowledge gap, new combinations of alloying elements, together with suitable processing techniques, may be guided by advanced modelling and simulation tools before being tested experimentally.

3.3.1.Pairwise correlation analysis

Fig.3.Effect of individual thermomechanical processing conditions on the ultimate tensile strength (MPa) and ductility (%) of all 916 alloys in the dataset.

To define an informative low-dimensional representation space of Mg alloys,a bivariate correlation analysis of composition was conducted via the Pearson correlation coefficient(PCC) heatmap [81].This allows the identification of a linear correlation between pairs of continuous variables.In the case of a strong correlation between two variables (defined by a |PCC|>0.9),the dimensionality of the representation space can be reduced by filtering out one of the variables in the pair.Despite the dimensionality reduction in the data,the remaining variables can still be informative enough to describe the relationships between the variables.The correlation heatmap of thirty-one elements in the dataset (inclusive of Mg) is shown in Fig.4.

Aside from the diagonal line that represents the correlation of each variable with itself,there are only four dark green squares that indicate high positive correlations.Nickel(Ni) and Dy have a high correlation coefficient of 0.8,and the pair of Be and Fe are strongly correlated with a coefficient of 0.89.These PCC values are still slightly lower than the above-mentioned criterion of 0.9.Consequently,none of these features are eligible to be filtered out.However,considering that the upper bounds of Be and Fe in the dataset are 0.0008 wt.% and 0.012 wt.% respectively (Table 2),it could be argued that there is a choice whether to keep them or remove one of these features from the pair of elements.The presence of Fe is notionally from contamination (and not deliberately added),whilst the presence of Be has been confined to ppm (parts per million) additions owing to the complexity in handling Be.Consequently,either Be or Fe (or both)may be filtered out of the database;however,it is noted that the decision to retain these elements depends on the objective of the analysis.If the purpose is to ascertain the effects of all of the elements on the properties of the alloy,regardless of their origin or the handling complexity,Be and Fe should indeed be retained in the representation space.Similarly,Ni and Dy were kept in the dataset particularly since both have been added to Mg deliberately in the respective studies.As a result,the Mg-alloy dataset reveals that it is (and remains) a dataset of high-dimensionality after bivariate correlation analysis.This analysis confirms that more advanced methods are needed to rationalise the non-linear relationships that mechanistically result in the properties of Mg alloys.
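The filtering rule described above (drop one variable from any pair with |PCC|>0.9) can be illustrated with a minimal sketch. The function names and the toy compositions below are hypothetical and are not taken from the database; the sketch only shows the mechanics of the criterion, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: PCC is undefined, treated here as 0
    return cov / (sx * sy)


def redundant_features(columns, threshold=0.9):
    """Return the second member of every pair whose |PCC| exceeds threshold."""
    drop = set()
    for a, b in combinations(columns, 2):
        if a in drop or b in drop:
            continue  # one member of this pair has already been filtered out
        if abs(pearson(columns[a], columns[b])) > threshold:
            drop.add(b)
    return drop


# Hypothetical toy compositions in wt.% (illustrative only, not database values).
toy = {
    "Al": [3.0, 6.0, 9.0, 0.0, 1.0],
    "Zn": [1.0, 0.5, 0.7, 6.0, 2.0],
    "X": [0.30, 0.60, 0.90, 0.00, 0.10],  # made-up element that tracks Al
}
dropped = redundant_features(toy)
print(dropped)
```

In this toy case only the made-up element "X" is removed, because it is proportional to Al; with a threshold of 0.9, a pair such as Be and Fe (PCC of 0.89) would be retained, as in the text.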

3.3.2.Exploring data patterns using unsupervised machine learning

Due to the complexity of the data in the Mg-alloy database, namely thirty alloying elements in addition to Mg and six thermomechanical conditions (37 input variables), an unsupervised ML dimension reduction algorithm was adopted to visually explore patterns of data on a map. Using the non-linear t-SNE algorithm, the 37-dimensional dataset was mapped to a 2-dimensional space with ‘t-SNE component 1’ and ‘t-SNE component 2’ variables [77]. The resulting map reveals seven well-separated clusters after tuning the hyperparameters of the algorithm through interactive exploration. This is shown in Fig.5.

The distinguishable clusters are indicative of discovered groups of data based on both similarities and differences. To establish the boundaries between the clusters, BIRCH clustering analysis was applied to the transformed two-dimensional space. Colour coded clusters (Fig.5) are identifiable based on common characteristics of the instances. To further examine the similarities inside the individual clusters, minimum and maximum values of all thirty-seven variables within the seven clusters are summarised in Table S1. Elements or processing conditions that are absent from a cluster are marked in red (Table S1), while elements in which a cluster is enriched are marked in light green. For example, cluster 1 has a high zinc content of up to 14.3 wt.%.
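The two-step procedure described above (t-SNE embedding followed by BIRCH clustering of the embedded points) can be sketched with scikit-learn as follows. The synthetic data, the perplexity, the random seed and the number of clusters are assumptions chosen for illustration; the hyperparameters actually used in this work are not stated here.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for the alloy table (NOT the real database):
# three groups of 40 "alloys", each described by 37 input variables.
centres = rng.uniform(0.0, 10.0, size=(3, 37))
X = np.vstack([c + rng.normal(0.0, 0.3, size=(40, 37)) for c in centres])

# Step 1: non-linear mapping of the 37-dimensional data to two components.
# perplexity=30 and init="pca" are assumed values, not those of the study.
embedding = TSNE(
    n_components=2, perplexity=30.0, init="pca", random_state=0
).fit_transform(X)

# Step 2: BIRCH clustering applied to the transformed two-dimensional space.
labels = Birch(n_clusters=3).fit_predict(embedding)

print(embedding.shape, sorted(set(labels.tolist())))
```

Because t-SNE is stochastic and sensitive to perplexity, the number and shape of the resulting groups should be checked over several settings, which is consistent with the interactive tuning mentioned in the text.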

The data points that make up Fig.5 are interpretable. The clusters denoted as 4, 6 and 7 are representative of notionally Al-free alloys, which are one of the important groups of Mg alloys. As summarised in Table 2, Al is the second most frequent alloying element in the Mg-alloy database with a range of 0.004–20 wt.%. Accordingly, cluster 6, with a maximum amount of 0.1 wt.% Al, and cluster 4, which has only two Al-containing alloys, are considered notionally Al-free along with cluster 7. Although these three clusters share some properties, including a lack of Ca, Sn and Si in the composition of the alloys, there are still differences that separate them into distinct groups. Alloys within cluster 7 do not contain any Ce or La, while cluster 6 and cluster 4 contain high and medium amounts of these two RE elements, respectively. Cluster 4 has the highest amount of Y (19 wt.%), whereas clusters 6 and 7 have the highest amounts of Ag (5.99%) and Gd (25.6 wt.%), respectively. Cluster 7 is the only cluster without any HPDC-processed alloys, and cluster 6 is the only cluster without Mn-containing and wrought alloys.

Fig.4.Pairwise Pearson correlation map of the 31 elements in the Mg-alloy database.The dark green and brown squares are indicative of strongly positive and negative correlations between a pair of elements in the compositional makeup of the database.

Amongst the remaining Al-containing Mg-alloy clusters, clusters 3 and 5 have the highest number of alloys with Al in their composition. These two clusters also have the highest amounts of Ca and La in the alloy compositions, and are the only clusters that are free of Zr and Gd and have only marginal amounts of Y. Although cluster 3 has three Y-containing alloys, with the highest concentration of Y at 1 wt.%, it may be considered essentially Y-free given the abundance and upper alloying range of this element in the whole database (i.e. 160 alloys containing Y up to a maximum concentration of 19 wt.%). Overall, clusters 3 and 5 are sparse in alloys containing RE elements, with these being limited to just 3 of the 12 REs in the database (La, Ce, and Nd). Cluster 3 contains the maximum amount of Li (14 wt.%), whilst cluster 5 does not include any Li-containing alloys. In contrast, cluster 3 is Sn-free, while cluster 5 has the second highest upper bound of Sn. Cluster 5 also has several unique characteristics arising from its inclusion of alloys that contain Ga, Sb and Bi. Moreover, cluster 5 has the highest amount of the following three elements: Al (20 wt.%), Ca (15 wt.%) and Si (13 wt.%).

Fig.5.Seven distinct clusters distinguished on the basis of similarities in the data found using the t-SNE dimension reduction algorithm in combination with BIRCH clustering analysis.

The final two clusters, clusters 1 and 2, have rather unique characteristics. Cluster 1 contains alloys from all the processing conditions except sand casting. Cluster 2 includes the highest concentrations of Mn (2 wt.%) and Cu (4 wt.%); however, none of the alloys in cluster 2 have been processed under ECAP conditions. The alloys in cluster 2 do not contain any Li or Sc, while cluster 1 has the second-richest Li-containing (7.86 wt.%) and Al-containing (9 wt.%) alloys, along with the richest alloys in the following elements: Sc (15.2 wt.%), Tb (19.7 wt.%), Dy (12 wt.%), Ni (11 wt.%), Sn (9.56 wt.%) and Zn (14.3 wt.%).

Although clustering analysis is an unsupervised machine learning process and does not include property values,the corresponding ultimate tensile strength distributions of the distinct clusters were also investigated.As is evident from Fig.S1,clusters 4 and 7,which are Al-free and share the most characteristics,have the UTS ranging from 170 MPa to 650 MPa.The mean UTS values are 316 MPa for cluster 4 and 402 MPa for cluster 7.These two clusters have high Y in common.The third Al-free cluster (cluster 6) has the lowest mean UTS of 182 MPa.The clusters 4,6 and 7 are all Al-free Mg alloys,however,there are some differences between these clusters.Cluster 5 has the most unique characteristics,as described above,and has the widest range of tensile strengths (92–710 MPa) and a mean UTS value of 289 MPa.The remaining clusters (clusters 1,2 and 3) have similar UTS distributions of between 100 and 500 MPa.These algorithmic and computational discriminations between alloy composition and processing reveal that machine learning (as further explored in Part B of this study) has a significant role to play in rationalising the behaviour of Mg alloys.
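The per-cluster UTS ranges and means quoted above amount to a grouped summary of a property that was not used for clustering. A minimal sketch of such a summary is given below; the cluster labels and UTS values are hypothetical placeholders, not values from the database or Fig.S1.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cluster label, UTS in MPa) records, for illustration only.
records = [
    ("A", 100.0), ("A", 200.0), ("A", 300.0),
    ("B", 250.0), ("B", 350.0),
]


def uts_summary(rows):
    """Group UTS values by cluster and report min, mean and max per cluster."""
    grouped = defaultdict(list)
    for cluster, uts in rows:
        grouped[cluster].append(uts)
    return {
        cluster: {"min": min(vals), "mean": mean(vals), "max": max(vals)}
        for cluster, vals in sorted(grouped.items())
    }


summary = uts_summary(records)
for cluster, stats in summary.items():
    print(cluster, stats)
```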

3.4.Correlating thermodynamic data of Mg alloys with the database

In order to provide some insight into a physics-informed understanding of structure-property relationships in Mg alloys,the connection between mechanical properties of alloys and results from thermodynamic calculations of stable second phases (precipitates) at equilibrium was studied.The correlation between the presence of different precipitates and mechanical properties is shown in Fig.6.

Fig.6.The correlation of UTS (top),ductility (middle) and YS (bottom)with the weight fraction of thermodynamically stable secondary phases.

The results in Fig.6 include over 68 unique phases that can occur (according to thermodynamic calculations using PANDAT™) for the Mg alloys in the alloy database herein. The distinct groups of stable phases in equilibrium within alloys are listed in Table S2. As can be anticipated, the correlations revealed are very similar for the case of UTS and YS. As a result, one can focus on the analysis of the UTS and ductility. It is observed that, generally, high strength alloys (with a UTS>500 MPa) have the following precipitate phases estimated as being present from equilibrium calculations: Al2Ca, Al12Mg17, AlMgZn, REMg5, MgZn, Mg2M (Si, Sn), RE5Mg41, REMg12, RE3Al11, REMg12, CaMg2, REAl3, RE5Mg24, RE8Mg70Zn6, Mg28Y7Zn65, and Al4Ca. These precipitates also appear to be correlated with low ductility (ductility<10%) alloys. This observation supports, to an extent, the view that the presence of specific secondary phases may correlate with the strength and ductility of Mg-alloys. In contrast, in highly ductile (ductility>40%) alloys with low strength (UTS<200 MPa), RE5Mg24, AlLi, Al2Sc, CaMg2, MgZn2, Ca2Mg5Zn5, Mn (BCC), RE5Mg41 and Mg28Y7Zn65 are the main phases calculated to be present.
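The screening described above, in which phases are collected for alloys that satisfy a property threshold, can be sketched as a simple filter. The records below are hypothetical placeholders (they are neither database entries nor PANDAT output); only the thresholds (UTS>500 MPa; ductility>40% with UTS<200 MPa) are taken from the text.

```python
# Hypothetical alloy records: UTS (MPa), ductility (%) and calculated
# equilibrium phase weight fractions. Values are illustrative only.
alloys = [
    {"uts": 520.0, "ductility": 6.0, "phases": {"Al12Mg17": 0.08, "Al2Ca": 0.02}},
    {"uts": 180.0, "ductility": 45.0, "phases": {"AlLi": 0.05}},
    {"uts": 300.0, "ductility": 15.0, "phases": {"MgZn": 0.03, "Al12Mg17": 0.0}},
]


def phases_where(rows, predicate):
    """Collect phases with a non-zero fraction in alloys matching predicate."""
    found = set()
    for row in rows:
        if predicate(row):
            found.update(p for p, w in row["phases"].items() if w > 0)
    return found


high_strength = phases_where(alloys, lambda r: r["uts"] > 500)
ductile_low_strength = phases_where(
    alloys, lambda r: r["ductility"] > 40 and r["uts"] < 200
)
print(sorted(high_strength), sorted(ductile_low_strength))
```

A phase can appear in both sets, as RE5Mg41 and CaMg2 do in the lists above, so such a filter indicates co-occurrence rather than a causal link.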

Whilst these observations outlined above do provide confidence in mechanistically driven alloy performance,an overall impression gained from Fig.6 is that an extremely complex correlation between the thermodynamically stable phases and the target mechanical properties exists that is beyond human ability to distil.It is too complex to obtain a general description of how secondary phases affect the mechanical behaviour of the alloys,with any precision,because it may be seen that the same precipitate types extend vertically and horizontally in linear manners,and simultaneously in a scattered manner.It is also evident that multiple phases co-exist in Mg alloys,adding further complexity to any analysis.Therefore,the results in both Fig.5 and Fig.6 are strongly suggestive that a supervised machine learning approach is required for determination of structure-property relationships.

3.5.General discussion

An open-source Mg-alloy database has been created using data science tools.This database provides a comprehensive description of Mg alloys for use in (subsequent) supervised machine learning,virtual alloy design,and the discovery of new Mg-alloy candidates via an objective design-directed approach.Analysis of the dataset reveals that the effect of individual constituents and thermomechanical cycles on the mechanical properties of the alloys varies in the presence of different sets of elements in the composition.Pairwise or bivariate analysis of the data confirms that there is no strong linear correlation between alloying elements in the database,indicating the need for multivariate analysis and advanced data-driven modelling (such as supervised machine learning).

However,some general trends were observed,with Gd,Ga,Y,Al,Zr,and Ca having the most significant effect on increasing tensile strength,while Li and Ce had the most significant contributions to ductility in Mg alloys.The elements Gd,Y,and Ca were observed to be important for Mg alloys with a balance of strength and ductility.The extruded and wrought alloys showed the highest values of both UTS and ductility,amongst the alloys in the database.About one-third of alloying elements in the database have not been widely used in Mg alloys,suggesting they are also potential candidates for future Mg-alloy designs (virtually or otherwise)for targeted mechanical properties.Such elements with rather high solid solubility in Mg include Tb,Dy,Er,Ho,Sc,and Ga.A paucity of simultaneously high values of strength and ductility in Mg-alloys was evident through data visualization tools.

Unsupervised machine learning was used to cluster Mg alloys based on their chemical compositions.The resulting seven similarity-based clusters highlighted the importance of Al and RE metals in Mg-alloys to date.The Li-free cluster was identified as the only cluster containing Ga,Sb,and Bi.The well-separated clusters can be used to identify the key elements responsible for achieving specific properties.The two clusters with the highest UTS values contained the highest concentrations of Y and Ga.Lastly,the thermodynamic data correlation with the database suggested that calculations of stable secondary phases may be used as a future data augmentation method to connect composition-process-property matrices to physical understanding of alloy microstructure.

4.Conclusion

In this study, a data science approach that may inform Mg-alloy design has been proposed, with the aim of alleviating costly and time-consuming empirical work. Relying on data science tools, the study herein has revealed the ability to harness the data that exists in the field and provides a path to a detailed understanding not previously possible. An open-source Mg alloy database was introduced with the key features of chemical composition, thermomechanical processing route, and mechanical properties of the alloys. This database incorporates 916 unique datapoints, including 30 alloying elements and 6 alloy production and processing categories. Based on this comprehensive database, data analysis has, for the first time, delivered the characteristics of Mg alloy data.

The effects of individual alloying elements and thermomechanical processing on mechanical properties of alloys were investigated.For example,Gd,Ga,Y,Al,Zr,and Ca have the maximum effect on increasing the tensile strength of the alloys,while Li and Ce make the greatest contribution in alloy ductility.Pairwise correlation analysis has revealed that there is no strong linear correlation between alloying elements in the dataset.Significantly complex and non-linear relationships in data need to be further studied through multivariate analysis and advanced data-driven modelling.

The study identified contrasting mechanical behaviours of alloys from the synergistic effects of alloying elements. Observations indicate that Gd, Y, and Ca have the highest tendency to provide both high strength and ductility for Mg-alloys (from the data explored herein). Alloys with high ultimate tensile strength (UTS) and ductility resulted from extrusion and wrought processing. Unsupervised machine learning algorithms have offered further insights into the data, with the t-SNE and BIRCH algorithms identifying seven well-separated clusters with similar patterns as low-dimensional representation spaces of the data (e.g., Al-free, RE-free). In addition, thermodynamic calculations of the stable secondary phases at equilibrium have revealed correlations between microstructure and mechanical properties of alloys as an extremely complicated relationship that is beyond human interpretation. These findings suggest that this consolidated database, as a comprehensive description of Mg alloys, can be used for effective supervised machine learning to assist in virtual alloy design and accelerated digital discovery of new candidates. The proposed data study, including data insights and low-dimensional representation spaces, offers opportunities to define feature augmentation and selection steps that can improve the process of learning from data.

Declaration of competing interest

All authors declare that there are no competing interests.

The authors have no financial and personal relationships with people or organisations that could inappropriately influence (bias) this work.

One of the authors (Nick Birbilis) is an editorial board member for Journal of Magnesium and Alloys and is not involved in the editorial review or the decision to publish this article.

Again,all of the authors declare no conflict of interest.

Acknowledgements

The alloy database was developed at the Australian National University with the assistance of the following individuals, who are gratefully acknowledged: Prateek Arora, Jia Ye, Samyak Jain, Bhumipat (Guy) Chatlekhavanich and Dr. Zhuoran Zeng. We also acknowledge the support of the Monash-IITB Academy Scholarship. This work was funded in part by the Australian Research Council (DP190103592). We would like to thank Jane Moodie for her invaluable guidance and support throughout manuscript preparation.

Supplementary materials

Supplementary material associated with this article can be found,in the online version,at doi:10.1016/j.jma.2023.09.035.