Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation

2022-08-24 Walid Aydi and Fuad Alduais

Computers, Materials & Continua, 2022, Issue 5

Walid Aydi and Fuad S. Alduais

1 Department of Computer Science, College of Humanities and Science in Al Aflaj, Prince Sattam Bin Abdulaziz University, Al-Aflaj, Saudi Arabia

2 Department of Mathematics, College of Humanities and Science in Al Aflaj, Prince Sattam Bin Abdulaziz University, Al-Aflaj, Saudi Arabia

3 Laboratory of Electronics & Information Technologies, Sfax University, Sfax, Tunisia

4 Department of Administration, Administrative Science College, Thamar University, Yemen

Abstract: The Weibull distribution (WD) is regarded as among the finest in the family of failure distributions. One of the most commonly used techniques for estimating the parameters of the WD is the ordinary least squares (OLS) technique, which is useful in reliability and lifetime modeling. In this study, we propose an approach, called the OLSMLP, that combines the OLS with a multilayer perceptron (MLP) neural network and builds on the resilience of the OLS method. The MLP addresses the problem of heteroscedasticity, which distorts the estimation of the parameters of the WD in the presence of outliers, and eases the difficulty of determining the weights in weighted least squares (WLS). A second method is proposed that incorporates a weight into the general entropy (GE) loss function, giving a weighted GE (WGE) loss function for estimating the parameters of the WD. Furthermore, a Monte Carlo simulation is performed to compare the performance of the proposed OLSMLP method with approximate Bayesian estimation under the weighted GE loss function (BLWGE). The simulation results show that the two proposed methods produce good estimates even for small sample sizes. In addition, in terms of the mean squared error and computation time, the proposed techniques are typically preferable to the other available methods.

Keywords: Weibull distribution; maximum likelihood; ordinary least squares; MLP neural network; weighted general entropy loss function

1 Introduction

The Weibull distribution (WD) and its parameters are widely used in reliability studies and many engineering applications, such as the lifetime analysis of material strength [1], estimation of rainfall [2], hydrology [3], predictions of material and structural failure [4], renewable and alternative energies [5–8], power electronic systems [9], and many other fields [10–12].

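The probability density function (PDF) of the two-parameter WD is given by:

$$f(x;\vartheta,\lambda)=\frac{\lambda}{\vartheta}\left(\frac{x}{\vartheta}\right)^{\lambda-1}\exp\left[-\left(\frac{x}{\vartheta}\right)^{\lambda}\right],\quad x>0,\ \vartheta>0,\ \lambda>0 \tag{1}$$

The cumulative distribution function (CDF) and the survival function S of the WD can be expressed as

$$F(x;\vartheta,\lambda)=1-\exp\left[-\left(\frac{x}{\vartheta}\right)^{\lambda}\right] \tag{2}$$

$$S(x;\vartheta,\lambda)=1-F(x;\vartheta,\lambda)=\exp\left[-\left(\frac{x}{\vartheta}\right)^{\lambda}\right]$$

where the parameters ϑ and λ represent the scale and the shape of the distribution, respectively.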
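As a quick illustration of these definitions, the following minimal sketch evaluates the PDF, CDF, and survival function of the two-parameter WD in plain Python. The function names are hypothetical and chosen only for this example; `scale` stands for ϑ and `shape` for λ.

```python
import math


def weibull_pdf(x, scale, shape):
    """Density of the two-parameter Weibull distribution (scale = theta, shape = lambda)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(x, scale, shape):
    """Cumulative distribution function F(x) = 1 - exp(-(x / scale) ** shape)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_survival(x, scale, shape):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - weibull_cdf(x, scale, shape)
```

At x = ϑ the CDF equals 1 - exp(-1) for every shape value, which is a convenient sanity check on the scale parameter.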

Several approaches to estimating the parameters of the WD have been proposed [13]. They can generally be classified as manual or numerical [14].

Manual approaches include the ordinary least squares [15,16], good linear unbiased estimators [17], and weighted least squares [18]. Computational methods include maximum likelihood estimation [19], the method of moments [20], the Bayesian approach [21], and least-squares estimation with particle swarm optimization [22].

In addition to computational methods, many studies have used neural networks (NNs) to predict the parameters of the WD in various areas. For example, Jesus applied Weibull and artificial neural network (ANN) analysis to predict the shelf life and acidity of vacuum-packed fresh cheese [23]. In survival analysis, Achraf constructed a deep neural network model called DeepWeiSurv, which assumes that the distribution of survival times follows a finite mixture of two-parameter WDs [24]. In another work, in the field of electric power generation, an ANN and the q-Weibull distribution were applied to the survival function of brushes in hydroelectric generators [25].

Recently, a few methods have attempted to combine the robustness of the ANN with some of the above statistical methods. Maria modeled the distribution of tree diameters using the OLS and the ANN [26]. In the same way, we propose combining the OLS and a neural network to estimate the two parameters of the WD. This choice rests on two observations: the OLS, in its simplest form, assumes a linear relationship between the predictor and the unreliability function, and single-hidden-layer networks handle linear functions more robustly and rapidly than networks with multiple hidden layers [27].

In the proposed method, we address the problem whereby the reliability of the OLS method is compromised by outliers by introducing a pre-trained neural network after the linearization of the CDF. The remaining sections of this paper are organized as follows: Section 2 provides a review of different numerical and graphical methods for estimating the parameters of the WD, such as the MLE, OLS, WLS, and BLGE. In Section 3, we present the proposed methods. The performance metrics used to compare them with competing methods are covered in Section 4. The results are discussed in Section 5. Finally, the conclusions of this study are provided in Section 6.

2 Review of Numerical and Graphical Methods for Estimating Parameters of WD

The most commonly used approaches to estimate the parameters λ and ϑ of the WD are described below.

2.1 Maximum Likelihood Estimator (MLE)

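Let the set $(x_1, x_2, x_3, \ldots, x_n)$ of n random lifetimes from the WD be defined by Eq. (1). Then, the likelihood function $L_f$ and its corresponding logarithm ℓ for the given sample observations are shown in Eqs. (4) and (5), respectively [28]:

$$L_f(\vartheta,\lambda)=\prod_{i=1}^{n}\frac{\lambda}{\vartheta}\left(\frac{x_i}{\vartheta}\right)^{\lambda-1}\exp\left[-\left(\frac{x_i}{\vartheta}\right)^{\lambda}\right] \tag{4}$$

$$\ell(\vartheta,\lambda)=n\ln\lambda-n\lambda\ln\vartheta+(\lambda-1)\sum_{i=1}^{n}\ln x_i-\sum_{i=1}^{n}\left(\frac{x_i}{\vartheta}\right)^{\lambda} \tag{5}$$

The partial derivatives of ℓ with respect to the variables ϑ and λ are given by:

$$\frac{\partial\ell}{\partial\vartheta}=-\frac{n\lambda}{\vartheta}+\frac{\lambda}{\vartheta}\sum_{i=1}^{n}\left(\frac{x_i}{\vartheta}\right)^{\lambda}$$

$$\frac{\partial\ell}{\partial\lambda}=\frac{n}{\lambda}-n\ln\vartheta+\sum_{i=1}^{n}\ln x_i-\sum_{i=1}^{n}\left(\frac{x_i}{\vartheta}\right)^{\lambda}\ln\left(\frac{x_i}{\vartheta}\right)$$

Setting the first derivative to zero gives $\vartheta=\left(\frac{1}{n}\sum_{i=1}^{n}x_i^{\lambda}\right)^{1/\lambda}$ for a given λ. The parameter λ has no closed form and can be obtained by using any numerical method, such as the Newton–Raphson method.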
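The numerical step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it substitutes the closed-form scale into the score equation for the shape, which gives the standard profile equation g(λ) = Σ x^λ ln x / Σ x^λ - 1/λ - mean(ln x) = 0, and solves it by Newton–Raphson. The function name, starting value, tolerance, and the halving safeguard are assumptions made for the example.

```python
import math
import random


def weibull_mle(xs, shape0=1.0, tol=1e-10, max_iter=100):
    """Return (scale, shape) maximum likelihood estimates for positive data xs."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    lam = shape0
    for _ in range(max_iter):
        powers = [x ** lam for x in xs]
        b = sum(powers)                                    # sum of x^lam
        a = sum(p * l for p, l in zip(powers, logs))       # sum of x^lam * ln x
        c = sum(p * l * l for p, l in zip(powers, logs))   # sum of x^lam * (ln x)^2
        g = a / b - 1.0 / lam - mean_log                   # profile score equation
        dg = (c * b - a * a) / (b * b) + 1.0 / (lam * lam) # derivative, always positive
        new_lam = lam - g / dg
        if new_lam <= 0:                                   # safeguard: keep the shape positive
            new_lam = lam / 2.0
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    scale = (sum(x ** lam for x in xs) / n) ** (1.0 / lam)
    return scale, lam


# Usage: recover the parameters from simulated data (scale 2.0, shape 1.5).
rng = random.Random(0)
sample = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
scale_hat, shape_hat = weibull_mle(sample)
```

Note that `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.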

2.2 Ordinary Least Squares Method (OLS)

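To estimate the parameters of the WD, the OLS method is extensively used in mathematics and engineering problems [16]. We can obtain a linear relationship between the parameters by rearranging Eq. (2) and taking the logarithm twice, as follows:

$$\ln\left[-\ln\left(1-F(x;\vartheta,\lambda)\right)\right]=\lambda\ln x-\lambda\ln\vartheta \tag{9}$$

Let $Y_i=\ln[-\ln(1-F(x_{(i)};\vartheta,\lambda))]$, $X_i=\ln x_{(i)}$, $\alpha_0=-\lambda\ln\vartheta$, and $\beta=\lambda$. Then, Eq. (9) can be written as

$$Y_i=\alpha_0+\beta X_i+\epsilon_i \tag{10}$$

Let $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}$ be the order statistics of $X_1, X_2, X_3, \ldots, X_n$, and let $x_{(1)} < x_{(2)} < x_{(3)} < \ldots < x_{(n)}$ be the ordered observations in a random sample of size n. To estimate the values of the cumulative distribution function $F(x_{(i)};\vartheta,\lambda)$, we use the mean rank method as follows:

$$\hat{F}(x_{(i)})=\frac{i}{n+1},\quad i=1,2,\ldots,n$$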
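The OLS procedure described above can be sketched in a few lines. This is a minimal illustration under the stated definitions (mean rank i/(n+1), Y = ln[-ln(1 - F)], X = ln x); the function name is hypothetical. The slope gives the shape estimate, and the scale follows from the intercept α0 = -λ ln ϑ.

```python
import math


def weibull_ols(xs):
    """Return (scale, shape) OLS estimates from the linearized Weibull CDF."""
    n = len(xs)
    ordered = sorted(xs)
    X = [math.log(x) for x in ordered]
    # Mean rank estimate of F(x_(i)) is i / (n + 1).
    Y = [math.log(-math.log(1.0 - i / (n + 1.0))) for i in range(1, n + 1)]
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y))
    sxx = sum((a - x_bar) ** 2 for a in X)
    beta = sxy / sxx                  # slope = shape
    alpha0 = y_bar - beta * x_bar     # intercept = -shape * ln(scale)
    return math.exp(-alpha0 / beta), beta


# Usage: data placed exactly on the mean-rank quantiles are fitted without error.
n = 20
exact = [2.5 * (-math.log(1.0 - i / (n + 1.0))) ** (1.0 / 1.685) for i in range(1, n + 1)]
scale_hat, shape_hat = weibull_ols(exact)
```

With real samples the points scatter around the line, and a few extreme observations can pull the slope and intercept noticeably, which is the sensitivity to outliers discussed in Section 3.2.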

2.3 Weighted Least Squares Method (WLS)

In the WLS estimate, the parameters λ and ϑ are the values that minimize the function:

The biggest challenge in the application of the WLS is in finding the weights $W_i$ in Eq. (15). We use the delta method [29] to find them:

Hence, the weights can be written as follows:

Minimizing $Q_W^{*}(\lambda,\vartheta)$, we obtain the WLS estimates of λ and ϑ as

2.4 Approximate Bayes Estimator

In this section, the approximate Bayesian estimator of the parameters λ and ϑ of the WD under a GE loss function is discussed. We assume a non-informative (vague) prior according to [30] as

The parameters λ and ϑ are estimated using Lindley's approximation technique. The posterior expectation E is given by Eq. (22) [31]:

Moreover, it can be asymptotically estimated by:

where i, j, k, l = 1, 2, ..., m, φ = (φ1, φ2, ..., φm), π(φ) represents the prior distribution of φ, u = u(φ), L = L(φ) is the likelihood function, ρ ≡ ρ(φ) = ln(π(φ)), and σij is element (i, j) of the covariance matrix of the parameter estimators.

For the two-parameter case φ = (λ, ϑ), Eq. (22) reduces to:

The functions in Eq. (24) are computed using MLEs with respect to λ and ϑ.

To apply the Lindley model of Eq. (24) to estimate the parameters of the WD, the following are obtained from Eq. (23):

The elements σij of the covariance matrix are expressed by

2.4.1 Estimates Based on General Entropy Loss Function

The general entropy loss function L for φ, shown in Eq. (24), is expressed by the following form [32]:

In the same way, the BLGE for ϑ is found by the following expressions:

3 Proposed Methods

In the following sections, we describe the proposed BLWGE and OLSMLP methods.

3.1 Weighted General Entropy Loss Function

The WGE loss function is defined by weighting the GE loss function as follows:

where φ represents the estimated parameters that minimize the expectation of the loss function (Eq. (27)), and w(φ) represents the proposed weight function as expressed by Eq. (28):

Based on the posterior distribution of the parameter φ, and by using the WGE function given in Eq. (28), we obtain the estimated BLWGE of the parameter ϑ as follows:

Thus, we can find that

Consequently, the BLWGE of parameter φ, obtained by using the WGE loss function, is as presented in Eq. (29):

provided that $E_{\varphi}(\varphi^{-z})$ and $E_{\varphi}(\varphi^{-(z+q)})$ exist and are finite, where $E_{\varphi}$ represents the expected value.

We note that the GE is a special case of the WGE when z = 0 in Eq. (29).
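The structure of this estimator can be illustrated with a short derivation. The sketch below is not taken from the paper: it assumes the standard GE loss form and a power weight $w(\varphi)=\varphi^{-z}$, an assumption suggested only by the two expectations that are required to exist. Under these assumptions, minimizing the posterior expected loss gives an estimator that reduces to the usual GE estimator at z = 0.

```latex
% Assumed weighted GE loss, with the hypothetical weight w(\varphi) = \varphi^{-z}
L_w(\hat{\varphi},\varphi)
  = \varphi^{-z}\left[\left(\frac{\hat{\varphi}}{\varphi}\right)^{q}
  - q\ln\frac{\hat{\varphi}}{\varphi} - 1\right]

% First-order condition on the posterior expected loss
\frac{\partial}{\partial\hat{\varphi}} E_{\varphi}\left[L_w\right]
  = q\,\hat{\varphi}^{\,q-1} E_{\varphi}\left(\varphi^{-(z+q)}\right)
  - \frac{q}{\hat{\varphi}}\,E_{\varphi}\left(\varphi^{-z}\right) = 0

% Resulting estimator under the assumed weight
\hat{\varphi}
  = \left[\frac{E_{\varphi}\left(\varphi^{-z}\right)}
               {E_{\varphi}\left(\varphi^{-(z+q)}\right)}\right]^{1/q}

% Special case z = 0: the usual GE estimator
\hat{\varphi} = \left[E_{\varphi}\left(\varphi^{-q}\right)\right]^{-1/q}
```

Both expectations in the ratio are exactly the quantities that must exist and be finite, which is consistent with the condition stated above.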

3.1.1 Estimates of Parameters of WD Based on Weighted General Entropy Loss Function

Based on the WGE and by using Eq. (29), the approximate Bayes estimator for λ is shown as:

Thus, the weighted Bayes estimator for the scale parameter ϑ is

3.2 Ordinary Least Squares and the Multilayer Perceptron Neural Network (OLSMLP)

As previous studies have shown [14,33], manual calculations yield the smallest standard deviation (STD) in the parameter λ, and are consequently more accurate than computational methods. Moreover, methods of manual estimation are more accurate for small sample sizes [14]. However, these manual methods, especially the OLS, are sensitive to outliers and specific residual behavior [34]. To solve these problems, many studies have proposed alternatives, such as the iterative weighting method based on the modified OLS [34], the WLS, and many other methods based on the WLS [35]. A major challenge in these methods is determining the weights.

3.2.1 Proposed Method to Estimate Parameters of WD

We now describe the proposed method, which is divided into two main parts: the linearization of the CDF, and the application of a feedforward network with backpropagation to estimate the values of λ and ϑ of the WD.

The OLS method takes the CDF defined in Eq. (2) and linearizes it as described in Eq. (10). It then determines the coefficients α0 and β via linear regression by using the slope and the intercept. The assumptions behind this computation of α0 and β can be violated by even a few outliers.

Therefore, instead of using the slope and the intercept, we propose applying Algorithm 1 as described below.

·Application of Proposed Model to Estimate Parameters of WD

The steps used to evaluate the parameters of the WD from the input CSV files are described by Algorithm 1.

Algorithm 1
Input: Three comma-separated value (CSV) files containing the matrices $X_i$ and $Y_i$, and their corresponding parameters (shape and scale) $SC_i$.
Output: The predicted shape $\hat{\lambda}_{OLSMLP}$ and scale $\hat{\vartheta}_{OLSMLP}$ for the test set.
1: Normalize the input matrices $X_i$, $Y_i$, and $SC_i$ separately using the RobustScaler followed by the MinMaxScaler.
2: Split the normalized $X_i$, $Y_i$, and $SC_i$ into random training and test subsets.
3: Create the neural network model (define the input layer, hidden layer, and output layer).
4: Compile the model and fit it to the data.
5: Predict $\hat{\lambda}_{OLSMLP}$ and $\hat{\vartheta}_{OLSMLP}$ for the test set.
6: Evaluate the performance of the proposed model.

Steps 2, 3, 4, and 5 are explained in more detail in the following subsections:

·Data Normalization

Normalization is an essential preprocessing step for a neural network [36,37]. Before training the neural network model, the input data are first scaled using the RobustScaler, which centers each feature on its median and scales it by its interquartile range, as described by Eq. (38). The RobustScaler is used to reduce the influence of outliers. Following this, the MinMaxScaler, defined by Eq. (39), is applied to the output of the RobustScaler. The MinMaxScaler scales all the data features to the range [0, 1]:

$$X_i^{RS}=\frac{X_i-\operatorname{median}(X)}{Q_3(X)-Q_1(X)} \tag{38}$$

$$X_i^{MM}=\frac{X_i^{RS}-\min(X^{RS})}{\max(X^{RS})-\min(X^{RS})} \tag{39}$$

where X is a feature vector, $X_i$ is an element of feature X, $Q_1(X)$ and $Q_3(X)$ are the first and third quartiles of X, $X_i^{MM}$ is the rescaled element obtained by using the MinMaxScaler, and $X_i^{RS}$ is the rescaled element obtained by using the RobustScaler.
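The two scaling steps can be sketched in plain Python for a single feature. This is a minimal illustration of the formulas rather than the scikit-learn classes themselves; the helper names are hypothetical, and the quartiles use linear interpolation between order statistics, which is an assumption about the quantile convention.

```python
def percentile(sorted_vals, p):
    """Percentile (p in [0, 100]) with linear interpolation between order statistics."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    s = sorted(values)
    median = percentile(s, 50)
    iqr = percentile(s, 75) - percentile(s, 25)
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for a constant feature
    return [(v - median) / iqr for v in values]


def minmax_scale(values):
    """Rescale to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


# Usage: one feature with an outlier (100).
feature = [1, 2, 3, 4, 100]
robust = robust_scale(feature)   # [-1.0, -0.5, 0.0, 0.5, 48.5]
scaled = minmax_scale(robust)    # values in [0, 1]
```

One observation on the design: min-max scaling is invariant under positive affine maps, so for a single feature the composed output coincides with min-max scaling applied to the raw values. The sketch keeps both steps to mirror the pipeline described in the text.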

·Structure of the Proposed Neural Network

To estimate the parameters of the WD, we propose using a multilayer perceptron (MLP), which is a feedforward network with backpropagation [38]. According to the structure of the MLP, the proposed network, as shown in Fig. 1, consists of an input layer (with n neurons), a hidden layer (with k neurons), and an output layer (with m neurons that yield the Weibull parameters as the output of the network).

Figure 1: Topology of the proposed MLP

Various criteria have been proposed in the literature to fix the number of hidden neurons [39]. In our architecture, we use the rule whereby "the number of hidden neurons k should be 2/3 times the size of the input layer, plus the size of the output layer" [38–40].

The hyperbolic tangent activation function (tanh) is used here in the hidden layer, and the sigmoid function in the output layer. They are used frequently in feedforward nets, and are suitable for shallow networks as well as applications of prediction and mapping [38,41].
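To make the topology concrete, the sketch below computes the hidden-layer size from the 2/3 rule and runs one forward pass through a single-hidden-layer network with tanh and sigmoid activations. It is an illustration only: the function names, the rounding of k to the nearest integer, the input size of 20, and the random weights are all assumptions, not values reported in the paper.

```python
import math
import random


def hidden_size(n_inputs, n_outputs):
    """Rule of thumb: k = (2/3) * n_inputs + n_outputs, rounded (rounding is an assumption)."""
    return round(2 * n_inputs / 3 + n_outputs)


def forward(x, w1, b1, w2, b2):
    """One forward pass: tanh on the hidden layer, sigmoid on the output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
            for row, b in zip(w2, b2)]


# Usage with illustrative sizes: 20 inputs, 2 outputs (shape and scale).
rng = random.Random(0)
n_in, n_out = 20, 2
k = hidden_size(n_in, n_out)
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(k)]
b1 = [0.0] * k
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(n_out)]
b2 = [0.0] * n_out
outputs = forward([rng.random() for _ in range(n_in)], w1, b1, w2, b2)
```

The sigmoid outputs lie in (0, 1), which matches targets that have been rescaled to [0, 1] by the MinMaxScaler.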

The objective of our neural network is a model that performs well on both the training and the test datasets. For this reason, we add a well-known regularization layer as described in the next section.

·Regularization

Regularization is a technique that can prevent overfitting [37,38]. A number of regularization techniques have been developed in the literature, such as L1 and L2 regularization, bagging, and dropout. In the proposed structure, we use dropout, a well-known technique that randomly "drops out" or omits hidden neurons of the neural network to make them unavailable during part of the training [38,42]. This reduces the co-adaptation between neurons, which results in less overfitting [38].
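The mechanism can be sketched as a mask applied to the hidden activations. The example below uses the "inverted" dropout variant, in which the surviving activations are divided by the keep probability so that their expected value is unchanged; this scaling convention and the function name are assumptions made for the illustration.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Usage: a rate of 0.6 keeps roughly 40% of the hidden neurons on each pass.
rng = random.Random(0)
dropped = dropout([1.0] * 1000, 0.6, rng)
kept = sum(1 for a in dropped if a != 0.0)
```

A different random mask is drawn on every training pass, so no single hidden neuron can be relied on by the others.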

·Optimization Algorithm

The optimization of deep networks is an active area of research [43]. The most popular gradient-based optimization algorithms are Adagrad, Momentum, RMSProp, Adam, AdaDelta, AdaMax, Nadam, and AMSGrad [38,43,44]. We chose Nadam due to its superiority in supervised machine learning over the other techniques, especially for a deep network [43]. Moreover, it combines the strengths of the Nesterov accelerated gradient (NAG) and the adaptive moment estimation (Adam) algorithms, as described in [44], with the following notation:

t: time step

αnad: learning rate

vt: the exponential average of the squared gradients

mt: momentum vector

wt: the weight that we want to update

ε: smoothing term

∇L: gradient of L, the loss function to minimize

β1, β2: momentum decay and scaling decay, respectively
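A single Nadam update on a scalar weight can be sketched as follows. The update rule shown is the simplified form commonly given in overviews of gradient-based optimizers; it is an assumption that it matches the formulation in [44], and framework implementations may differ in details such as momentum scheduling. The function name is hypothetical.

```python
import math


def nadam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Return the updated (w, m, v) after one Nadam step at time step t >= 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # momentum vector m_t
    v = beta2 * v + (1.0 - beta2) * grad * grad   # average of squared gradients v_t
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected momentum
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    # Nesterov look-ahead: blend the corrected momentum with the current gradient.
    nesterov = beta1 * m_hat + (1.0 - beta1) * grad / (1.0 - beta1 ** t)
    w = w - lr * nesterov / (math.sqrt(v_hat) + eps)
    return w, m, v


# Usage: minimize the toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = nadam_step(w, 2.0 * (w - 3.0), m, v, t, lr=0.1)
```

The default arguments mirror the values β1 = 0.9, β2 = 0.999, and ε = 10^-7 that are reported for the experiments; the learning rate of 0.1 in the usage loop is chosen only to make the toy problem converge quickly.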

4 Performance Metrics

To evaluate the proposed methods with respect to other methods, we used two statistical tools, the mean squared error (MSE) and the mean absolute percentage error (MAPE) [5], in addition to the computation time.
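Both indicators can be computed directly from a list of estimates of a parameter and its true value. The sketch below uses the standard definitions, with the MAPE expressed as a percentage; the function names are hypothetical.

```python
def mse(estimates, true_value):
    """Mean squared error of repeated estimates of a single parameter."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def mape(estimates, true_value):
    """Mean absolute percentage error, in percent (true_value must be non-zero)."""
    return 100.0 * sum(abs((e - true_value) / true_value) for e in estimates) / len(estimates)


# Usage: two estimates of a parameter whose true value is 2.0.
example_mse = mse([1.0, 3.0], 2.0)    # 1.0
example_mape = mape([1.0, 3.0], 2.0)  # 50.0
```

The MSE penalizes large deviations quadratically, while the MAPE is relative to the size of the true parameter, so the two indicators can rank methods differently.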

5 Results and Discussion

5.1 Dataset Description

We generated 250,000 random data points from the WD for different parameter values, with ϑ ranging from 1 to 299 and λ ranging from 0.5 to 100. For each shape/scale pair, we generated 10,000 samples of different sizes n = 10, 20, 30, 40, and 50.

We used the same dataset for the neural network in the training phase, but applied one sample to each shape/scale pair. This was unlike the other methods (MLE, OLS, WLS, BLGE, and BLWGE), which used 10,000 samples to estimate the parameters of the WD. This dataset was divided into two subsets. The first subset was used to fit the model, and is referred to as the training dataset; it was characterized by known inputs and outputs. The second subset is referred to as the test dataset, and was used to evaluate the fitted machine learning model and make predictions on new data, for which we did not have the expected output. We chose the train–test procedure for our experiments because we judged the available dataset to be sufficiently large.
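The construction of one training row per shape/scale pair can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout (X, Y, (shape, scale)), the function names, and the two parameter pairs in the usage are hypothetical, and the mean rank i/(n+1) is used for the unreliability estimate as in Section 2.2.

```python
import math
import random


def make_sample(scale, shape, n, rng):
    """Draw n Weibull lifetimes and return the linearized coordinates (X, Y)."""
    xs = sorted(rng.weibullvariate(scale, shape) for _ in range(n))
    X = [math.log(x) for x in xs]
    Y = [math.log(-math.log(1.0 - i / (n + 1.0))) for i in range(1, n + 1)]
    return X, Y


def make_dataset(pairs, n, rng):
    """One sample per (scale, shape) pair, labeled with its (shape, scale) target."""
    rows = []
    for scale, shape in pairs:
        X, Y = make_sample(scale, shape, n, rng)
        rows.append((X, Y, (shape, scale)))
    return rows


# Usage: two illustrative (scale, shape) pairs with samples of size n = 10.
rng = random.Random(1)
rows = make_dataset([(2.5, 1.685), (1.0, 1.5)], 10, rng)
```

Because each pair contributes a single random sample, the inputs seen by the network are noisy, which is consistent with the fluctuations of the MSE reported for the proposed method.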

5.2 Experimental Setting

5.2.1 Parameter Selection for OLSMLP

In all experiments, we trained the model with Google Colaboratory (GPU) for 25 epochs. We used the Nadam optimizer with a learning rate of αnad = 0.001; the terms representing the momentum decay, scaling decay, and smoothing were kept at their default values: β1 = 0.9, β2 = 0.999, and ε = 10^-7. A dropout with a ratio of 0.6 was applied to the hidden layer. As described in Section 3, the hidden and output layers used the tanh and sigmoid activation functions, respectively. The loss function used to train the model was the mean squared error.

5.2.2 Parameter Selection of BLGE and BLWGE

In all experiments, the parameters of the BLWGE and BLGE were empirically determined. The values of the weights q and z of the BLWGE were -3 and 6, respectively. For the BLGE, the parameter q = 1.5.

5.3 Estimating Parameters of Weibull Distribution

5.3.1 Effect of Sample Size on Estimation of WD Parameters Using Prevalent Methods

Fig. 2 shows the evolution of the average MSE as a function of the sample size n. The MSE decreased quasi-linearly from n = 10 to n = 40 for all methods. Fig. 2 also shows that the BLWGE, WLS, BLGE, and MLE had lower MSE values than the OLS for the different sample sizes. We can also deduce that the WLS, BLGE, and MLE gave similar results, with a slightly better start for the MLE at n = 10.

5.3.2 Effect of Sample Size on Estimation of WD Parameters Using Proposed Method

To illustrate how the sample size affects the MSE, Fig. 3 shows the evolution of the MSE as a function of the sample size n from 10 to 50.

From Fig. 3, we can deduce that as the sample size increased, the MSE of the proposed method decreased overall, with fluctuations. These fluctuations were due to the random nature of the information used and the limited number of samples (one sample) for each shape/scale pair.

Figure 2: The evolution of the MSE using the parameters ϑ = 2.5 and λ = 1.685 as a function of n = [10–40] for the MLE, OLS, WLS, BLGE, and BLWGE

Figure 3: The evolution of the MSE using the parameters ϑ = 0.75 and λ = 1.75 as a function of n = [10–50]

Tabs. 1 and 2 show the results of the simulation of the proposed method and the other methods considered above. The results show the following:

1. The MLE and WLS behaved similarly, as shown in Tab. 1: their MSE values decreased gradually when the shape value increased at a fixed scale. Conversely, when the scale value increased with a fixed shape, the MSE increased.

2. The behavior of the OLS and BLGE was the opposite of that of the MLE and WLS. As depicted in Tab. 1, the MSE increased when the shape increased (at a fixed scale), and decreased when the scale increased (with a fixed shape).

3. The BLWGE and the OLSMLP behaved similarly in terms of scale estimation, as shown in Tab. 1.

4. All methods had the same global variation function, as shown in Fig. 4 and Tab. 2.

5. The MLE was slightly superior globally to the other methods in terms of scale estimation, but had the worst estimation of shape, as shown in Tab. 2.

6. The proposed MLP neural network estimated the scale acceptably, and better than some methods. By contrast, it outperformed all other methods in terms of shape estimation most of the time.

Table 1: MSEs of the estimated ϑ with varying values of the parameters λ and ϑ using different methods to estimate the parameters of the WD

Table 2: MSEs of the estimated λ with varying values of the parameters λ and ϑ using different methods to estimate the parameters of the WD

Figure 4: MSEs with varying values of the parameters ϑ = [1, 1, 1, 2.5, 3.25, 4] and λ = [1.5, 1.75, 2, 4, 4, 4] for the MLE, OLS, BLGE, WLS, and the proposed methods

From Tab. 3, we see that the two statistical indicators, MSE and MAPE, yielded different values. The global rank was therefore calculated to identify the best method. The results in the table indicate that the proposed method offered the best compromise between shape and scale estimation, as indicated by the global rank. Moreover, it retained the speed of the OLS and enhanced the accuracy of estimation of the parameters of the WD compared with the MLE, BLGE, and BLWGE.

Table 3: Performance evaluation of the MLE, OLS, WLS, BLGE, BLWGE, and OLSMLP methods using different statistical indicators

6 Conclusion

This study proposed a method to estimate the parameters of the WD. This method is based on the OLS graphical method and the MLP neural network. The MLP solves the problems caused by the presence of outliers and eases the difficulty of determining the weights in the WLS method. It yielded acceptable results in simulations, especially in terms of shape estimation. It is also faster than the MLE, BLGE, and BLWGE.

We also proposed a second method (BLWGE), in which we introduced a weight to the GE loss function. The results of simulations showed that the BLWGE yields good results, especially in terms of shape estimation, compared with the other methods.

Acknowledgement: This project was supported by the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University under Research Project No. 2020/01/16725.

Funding Statement: The authors are grateful to the Deanship of Scientific Research at Prince Sattam bin Abdulaziz University, Supporting Project Number (2020/01/16725), Prince Sattam bin Abdulaziz University, Saudi Arabia.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding this study.
