Turnout fault prediction method based on gated recurrent units model

2021-09-15 ZHANG Guorui, SI Yongbo, CHEN Guangwu, WEI Zongshou

Journal of Measurement Science and Instrumentation, 2021, Issue 3

ZHANG Guorui, SI Yongbo, CHEN Guangwu, WEI Zongshou

(1. Automatic Control Research Institute, Lanzhou Jiaotong University, Lanzhou 730070, China; 2. Gansu Provincial Key Laboratory of Traffic Information Engineering and Control, Lanzhou 730070, China)

Abstract: The turnout is one of the most important pieces of railway signal infrastructure and directly affects the safety and efficiency of train operation. Based on an analysis of the turnout power curve, we first extract and select the time-domain and Haar wavelet transform features of the curve. Then the correlation between the degradation states and the fault states is established by using a clustering algorithm and the Pearson correlation coefficient. Finally, a convolutional neural network (CNN) and a gated recurrent unit (GRU) are used to build a state prediction model of the turnout to realize fault prediction. The CNN extracts features directly from the original turnout data and reduces their dimension, which simplifies the prediction process. Owing to its gate structure and its ability to process time series, the GRU has certain advantages over traditional forecasting methods in terms of prediction accuracy and time. The experimental results show that the prediction accuracy reaches 94.2% when the feature matrix uses a 40-dimensional input and the model is trained for 50 iterations.

Key words: turnout; clustering; convolutional neural network (CNN); gated recurrent unit (GRU); fault prediction

0 Introduction

In recent years, with the continuous development of deep learning and artificial intelligence, intelligent operation and maintenance technology has attracted increasing attention. How to apply these new technologies to the railway signal system has become a research goal for the next stage. The turnout is one of the basic pieces of equipment in the railway signal system. The safety of train operation is guaranteed mainly by switching the train route through the point machine, which is also prone to failure. According to the statistics of railway users, switch faults account for about 40% of all signal failures. At present, scholars mainly focus on fault diagnosis, whereas fault prediction still has deficiencies and needs further study. Fault features were manually extracted from the action current curve of the S700K switch machine to build a fault feature matrix, which was used as the input of a back propagation (BP) neural network for fault identification[1]. A fault text database was established through feature extraction of the fault text information recorded in the signal centralized monitoring system[2]. The state of switch equipment was subdivided by adding an intermediate fault state. Fault features were extracted from the time domain and value domain of the power, and a hidden Markov model was then established for fault diagnosis[3]. Multiple models were established to predict the turnout status by analyzing historical data and identifying the non-fault action curves of the turnout[4]. In addition, Bayesian networks, grey correlation and expert systems have also been applied to turnout fault diagnosis[5-7].

Turnout fault prediction usually refers to the use of artificial intelligence algorithms to build predictive models that identify turnout faults before they occur. It helps to realize intelligent maintenance of railway signal equipment and the transition from fault-based repair to condition-based repair, improves the work efficiency of maintenance personnel, and further ensures the safe operation of the railway.

In our work, time-domain statistical features and wavelet transform features of the turnout action power curve are first extracted to build the feature vector. Then, the degradation data of the turnout are classified by a clustering algorithm, and the correlation between the degradation states and the fault states is established by the Pearson correlation coefficient. Finally, the state prediction of the turnout is completed by a convolutional neural network-gated recurrent unit (CNN-GRU) prediction model.

1 Analysis of turnout state

1.1 Analysis of switch power curve

In this study, the S700K point machine is taken as the research object. The power curve of its normal operation during conversion is shown in Fig.1.

Fig.1 Normal operation power curve of S700K

The action process of S700K switch machine mainly includes the following four stages[8].

Start unlock stage: relay 1DQJ picks up and recording of the switch action curve begins. When relay 2DQJ changes polarity, a peak appears in the action current curve, which indicates that the switch starting circuit is connected and the switch starts to act and unlock.

Conversion stage: this stage is the action process of the switch. Under normal conditions, its current curve should be smooth and the action power should be roughly the same as the reference curve.

Locking stage: the contacts of the automatic switch change over after the conversion is completed. At the same time, the starting circuit is disconnected and the power is reduced.

Presentation stage: 1DQJ enters its slow-release state. There is still two-phase current in the start-up circuit in this stage. Finally, 1DQJ drops out and recording of the switch action curve stops.

1.2 Common faults and causes of switch

The common faults of the S700K switch are shown in Table 1, and the power curves of the common faults are shown in Fig.2.

Table 1 Common fault phenomena and causes of S700K switch

Fig.2 Power curves of common faults of S700K switch: (a) F1

According to the statistics of railway users, the most frequent turnout faults are clamping gap and foreign objects in the turnout, which together account for about 40%. The two faults of high turnout switching resistance and lack of oil in the lock hook account for about 20%, while other faults occur with low frequency.

2 Algorithm of state prediction

2.1 CNN

CNN has a neural network structure well suited to image and speech recognition; it performs feature extraction well and is simple to train[9].

A CNN consists of several convolution layers, pooling layers and dense layers. The convolution operation in a convolution layer is shown in Fig.3.

Fig.3 Convolution operation

The feature matrix is used as input. The convolution kernel scans the input matrix from left to right, multiplies the kernel weights by the corresponding input elements, and accumulates the products into the output feature matrix. After scanning, the feature matrix $F_{h,w}$ is obtained. In Fig.3, $h$ and $w$ of $W_{h,w}$ are the height and width of the convolution kernel weight matrix, which can be set in advance when building the model. The $h$ and $w$ of $F_{h,w}$ are the height and width of the output feature matrix, which can be calculated as

$w_F=(w_x-w_w+2w_p)/w_s+1$, (1)

$h_F=(h_x-h_w+2h_p)/h_s+1$, (2)

where $w_F$ is the width of the feature matrix $F$ after convolution; $w_x$ is the width of the feature candidate matrix; $w_w$ is the width of the filter; $w_p$ is the number of columns filled with zeros around the original matrix; and $w_s$ is the step when the window slides horizontally. Likewise, $h_F$ is the height of the feature matrix $F$ after convolution and $h_x$ is the height of the feature candidate matrix, with $h_w$, $h_p$ and $h_s$ defined analogously in the vertical direction. The calculation principles of Eqs.(1) and (2) are the same.
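As a quick check of Eqs.(1) and (2), the following minimal Python sketch computes the output size along one axis. The function name `conv_output_size` is hypothetical, and the use of floor division when the stride does not divide evenly is an assumption, since the equations above do not state a rounding rule. The example values (175 input points, kernel size 38, no padding, stride 2) are taken from Sections 3.1 and 4.1.

```python
def conv_output_size(n_in: int, kernel: int, pad: int = 0, stride: int = 1) -> int:
    """Output length along one axis, following Eqs.(1)-(2).

    Floor division is an assumption for the case where the stride
    does not divide (n_in - kernel + 2 * pad) evenly.
    """
    if kernel < 1 or stride < 1 or pad < 0:
        raise ValueError("kernel and stride must be >= 1, pad must be >= 0")
    return (n_in - kernel + 2 * pad) // stride + 1


# 175-point power curve, kernel size 38, no zero padding, stride 2
width = conv_output_size(175, kernel=38, pad=0, stride=2)
print(width)  # 69
```

The same function applies to the height by passing $h_x$, $h_w$, $h_p$ and $h_s$ instead.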

After convolution, the pooling layer further reduces the dimension of the feature matrix and the amount of computation. Max pooling with a fixed pooling window is selected for the pooling operation. The window scans the feature matrix $F_{h,w}$ from left to right, and the step size controls how many cells the window slides each time. The maximum value in the pooling window is taken as the output at that position, and a feature matrix with a smaller dimension is finally obtained.
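The pooling step can be illustrated with a small NumPy sketch. For brevity it pools a one-dimensional sequence rather than a two-dimensional feature matrix, which is a simplification; the function name `max_pool_1d` and the toy input are illustrative only.

```python
import numpy as np


def max_pool_1d(x, window: int, stride: int) -> np.ndarray:
    """Slide a window over x and keep the maximum of each window."""
    x = np.asarray(x, dtype=float)
    n_out = (len(x) - window) // stride + 1
    return np.array([x[i * stride: i * stride + window].max() for i in range(n_out)])


pooled = max_pool_1d([1, 3, 2, 5, 4, 0], window=2, stride=2)
print(pooled)  # [3. 5. 4.]
```

With a window and stride of 2, the sequence length is halved while the largest response in each window is kept.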

2.2 GRU

GRU is a variant of the long short-term memory (LSTM) network. Compared with LSTM, which has three gates, GRU has only two gates and no cell state, which simplifies the structure of LSTM. With fewer parameters, GRU is relatively easy to train and less prone to overfitting[10]. The network structure of GRU is shown in Fig.4.

Fig.4 GRU network structure

In the GRU network there are two gates that act on different objects, namely the update gate and the reset gate. The update gate determines how much of the hidden state from the previous moment is carried forward into the current hidden state, while the reset gate determines how much of the previous hidden state is used when computing the current candidate memory content.

The time series of power values of the turnout power curve is used as the input of the CNN, which extracts the features automatically. The output feature matrix is then used as the input of the GRU network. After the dense layer, a Softmax activation function outputs the classification probability of each switch state, so as to realize the prediction of the switch state.
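To make the roles of the two gates concrete, here is a minimal NumPy sketch of a single GRU time step. It follows one common textbook formulation and is not taken from the paper: the weight names (`Wz`, `Uz`, and so on), the random initialization and the input size of 3 are all assumptions for illustration, and some libraries swap the roles of `z` and `1 - z` in the final interpolation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU time step; p is a dict of weight matrices and bias vectors."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is used.
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


rng = np.random.default_rng(0)
n_in, n_hid = 3, 13  # illustrative sizes only
p = {}
for g in "zrh":
    p["W" + g] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p["U" + g] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p["b" + g] = np.zeros(n_hid)

h = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a toy sequence of 5 time steps
    h = gru_step(x_t, h, p)
print(h.shape)  # (13,)
```

Because there is no separate cell state, the hidden state `h` is the only quantity passed between time steps.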

3 Fault prediction method based on GRU

The process of fault prediction is shown in Fig.5.

Fig.5 Fault prediction process

Step 1: The time-domain and Haar wavelet features are extracted from the non-fault degradation data and from the normal and fault data of the switch, respectively. Feature selection is then performed with a model-based ranking method.

Step 2: The features of the degradation data are clustered by a clustering algorithm. Then the labels are determined and the data are merged.

Step 3: The modified locally linear embedding (MLLE) dimension reduction algorithm is used to visually analyze the merged features. Then the Pearson correlation coefficient is calculated to construct the state correlation between degradation and fault.

Step 4: Combined with the conclusion of Step 3, the state of the turnout is predicted by using the CNN-GRU model.

3.1 Feature extraction and selection of switch power curve

The centralized monitoring system collects one sample point every 40 ms. The normal action time of the switch is about 6.5 s, and the action time under some faults is about 7 s. Therefore, a 7 s window of each power curve is taken as the sample. If the curve is shorter than 7 s, zeros are appended after the last data point. If it is longer than 7 s, only the first 7 s of the curve are kept. As a result, each power curve sample contains 175 data points.
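The padding and truncation rule can be sketched in a few lines of NumPy. The function name `fix_length` and the constants are illustrative; only the 40 ms sampling period, the 7 s window and the resulting 175 points come from the text.

```python
import numpy as np

SAMPLE_PERIOD_S = 0.04  # one sample every 40 ms
WINDOW_S = 7.0          # length of the retained window
N_POINTS = round(WINDOW_S / SAMPLE_PERIOD_S)  # 175


def fix_length(curve, n_points: int = N_POINTS) -> np.ndarray:
    """Zero-pad short curves and truncate long ones to n_points samples."""
    curve = np.asarray(curve, dtype=float)
    if len(curve) >= n_points:
        return curve[:n_points]
    return np.pad(curve, (0, n_points - len(curve)))  # pads with zeros


short = fix_length(np.ones(163))  # roughly a 6.5 s curve
long_ = fix_length(np.ones(200))  # an 8 s curve
print(short.shape, long_.shape)   # (175,) (175,)
```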

In order to avoid the influence of high-dimensional data on the clustering effect, feature extraction and selection are carried out before clustering to improve the clustering accuracy. According to the four stages of the switch action power curve, the corresponding time-domain features and Haar wavelet transform features are extracted to obtain a candidate feature set that can represent the curve. $f_i$ ($i=0,1,\ldots,46$) is the $i$th statistical feature, as shown in Table 2.

Table 2 Features of S700K switch in time domain

For each switch power curve sample, 47 statistical features are selected from the table. $f_0$ is the time point corresponding to the maximum value of the curve, and $f_i$ ($i=1,\ldots,44$) are the time-domain features corresponding to the four stages of switch action. $f_{45}$ and $f_{46}$ are the sums of the mean (approximation) and difference (detail) coefficients of the curve after the Haar wavelet transform. The candidate features of each sample are

$f=[f_0,f_1,\ldots,f_i,\ldots,f_{46}]$. (3)
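One possible reading of the two Haar features is sketched below. The paper does not specify the decomposition level, the normalization, or how the odd-length 175-point curve is handled, so the single-level transform, the $1/\sqrt{2}$ scaling and the repeat-last-sample padding are all assumptions; the function names are hypothetical.

```python
import numpy as np


def haar_level1(x):
    """Single-level Haar transform: pairwise scaled means and differences."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:  # assumption: repeat the last sample for odd lengths
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # "mean" coefficients
    detail = (even - odd) / np.sqrt(2.0)  # "difference" coefficients
    return approx, detail


def haar_features(x):
    """Return (sum of approximation, sum of detail) as two scalar features."""
    approx, detail = haar_level1(x)
    return float(approx.sum()), float(detail.sum())


f45, f46 = haar_features([1.0, 1.0, 3.0, 3.0])
print(f45, f46)
```

For a piecewise-constant toy input such as the one above, the detail sum is zero, which matches the intuition that the difference coefficients respond only to local changes in the curve.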

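The data set includes 15 sets of normal turnout operation curves and 15 sets of each fault curve, for a total of 120 sets of samples, together with 487 sets of degradation power samples. The respective features are extracted and combined into a feature sample matrix $F$,

$F=[f^{(1)};f^{(2)};\ldots;f^{(607)}]$, (4)

where each row $f^{(j)}$ is the 47-dimensional candidate feature vector of Eq.(3) for sample $j$, so that $F$ has 607 rows (120 + 487 samples) and 47 columns.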

In order to further reduce the redundancy among the features, model-based feature ranking is applied to the 47 extracted features to screen out the more representative ones for the subsequent clustering.

The idea of model-based feature ranking is the same as that of the wrapper method. Firstly, the relationship between the features and the labels is judged. A linear algorithm is used for a linear relationship and a nonlinear algorithm for a nonlinear relationship. Then, a model is built for each feature separately and cross-validated. Finally, the features with high scores are selected to form the feature subset[11]. The feature scores are shown in Fig.6.

Fig.6 Feature ranking score based on model

According to the scores in Fig.6, the features whose score exceeds the average of the highest and the lowest scores are retained, which gives 35 dimensions for the subsequent clustering analysis.
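The ranking and thresholding procedure can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the paper does not name the model used for scoring, so the random forest classifier, the 3-fold cross-validation and the synthetic data are all placeholders, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rank_features(X, y, cv=3, seed=0):
    """Score each feature on its own with a cross-validated model."""
    scores = []
    for i in range(X.shape[1]):
        model = RandomForestClassifier(n_estimators=20, random_state=seed)
        scores.append(cross_val_score(model, X[:, i:i + 1], y, cv=cv).mean())
    return np.array(scores)


def select_above_midrange(scores):
    """Keep features scoring above the mean of the highest and lowest score."""
    threshold = (scores.max() + scores.min()) / 2.0
    return np.flatnonzero(scores > threshold)


# Synthetic demo: only feature 0 carries class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 6))
X[:, 0] += 3 * y

scores = rank_features(X, y)
selected = select_above_midrange(scores)
print(selected)
```

Scoring each feature in isolation ignores interactions between features, which is the usual trade-off of this kind of univariate, model-based ranking.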

3.2 Clustering analysis of switch degradation data

In order to establish the connection between the degradation states and the fault states of the turnout, it is necessary to perform cluster analysis on the degradation data to determine their classification.

The MiniBatchKMeans algorithm is a variant of the KMeans algorithm. It keeps the clustering accuracy as high as possible while greatly reducing the calculation time[12]. It can be imported from the third-party scikit-learn module, which makes it easy to call and to set parameters. This algorithm is used to cluster the 35-dimensional features of the 487 degradation samples, and the clustering results are shown in Table 3.

Table 3 MiniBatchKMeans clustering results

The Calinski-Harabasz index and the silhouette coefficient in Table 3 are the scoring standards for the clustering results. The higher the score, the closer the clustering results are to the actual classification. When the number of categories is 5, both scoring standards reach their highest value. Therefore, the pseudo-labels of the switch degradation data are set to 5 categories, which are used for the subsequent feature screening. The action power curves of the five non-fault types in the degradation data are randomly selected, as shown in Fig.7.

Fig.7 Action power curves of five non-fault types: (a) Non-fault 1
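The selection of the number of clusters described in this section can be sketched with scikit-learn. The data below are synthetic (five well-separated Gaussian blobs in 35 dimensions) and stand in for the real degradation features, which are not available here; the helper name `score_cluster_counts`, the candidate range of cluster counts and `n_init=10` are assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score


def score_cluster_counts(X, k_values, seed=0):
    """Cluster X for each k and return both validity scores per k."""
    results = {}
    for k in k_values:
        km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=10)
        labels = km.fit_predict(X)
        results[k] = (calinski_harabasz_score(X, labels),
                      silhouette_score(X, labels))
    return results


# Synthetic stand-in for the 35-dimensional degradation features.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(5, 35))
X = np.vstack([c + rng.normal(size=(100, 35)) for c in centers])

results = score_cluster_counts(X, range(2, 9))
best_k = max(results, key=lambda k: results[k][1])  # highest silhouette
print(best_k)
```

Both scores are computed without ground-truth labels, which is why they are suitable for choosing the number of pseudo-label categories.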

3.3 Association of switch states

In order to predict the turnout status, it is necessary to establish the connection between the non-fault states and the failures of the turnout, which is an important link in determining turnout failures. After the feature extraction and selection in Section 3.1, the features of the original data have been reduced to a lower dimension. However, in order to intuitively display the clustering effect in Section 3.2, the feature dimension still needs to be reduced further for visualization.

Finally, the Pearson correlation coefficient is used to calculate the correlation coefficients between the five degradation states and the failure and normal states, which can be used to analyze the future development trend of the degradation states and to prepare for the next step of failure prediction.

3.3.1 Feature selection and visualization

It is necessary to further reduce the dimension of the extracted features so as to intuitively display the results of clustering and feature extraction. Locally linear embedding (LLE) is a common dimension reduction method, but it suffers from a regularization problem: when the number of neighbors is larger than the number of input dimensions, each local neighborhood matrix is rank-deficient. The modified locally linear embedding (MLLE) algorithm solves this problem, and the clustering effect is better. The variance after dimension reduction to different dimensions is shown in Table 4.

It can be seen from Table 4 that the data variance is the smallest when the 35-dimensional features are reduced to 3 dimensions, so the feature set is reduced to 3 dimensions. The states of the turnout visualized by MLLE are shown in Fig.8.

Table 4 Variance of MLLE in different dimensions

Fig.8 MLLE visualization of each state of switch

As shown in Fig.8, F0-F7 correspond to the normal state and the seven fault states respectively, while F8-F12 correspond to the five degradation states. It can be seen from the figure that, except for a few states that overlap slightly, the states are clearly separated, which further confirms the effect of feature extraction and selection. However, the clusters of the five degradation states lie very close to each other, since they are all in the early degradation stage. As can be seen in Fig.7, most of the degradation occurs in the switch conversion stage, while the other stages are basically the same across the degradation states.
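A reduction from 35 to 3 dimensions with MLLE can be sketched with scikit-learn's `LocallyLinearEmbedding` and `method="modified"`. The random input and the choice of 12 neighbors are placeholders, since the paper does not report its neighborhood size; the modified method requires `n_neighbors` to be at least `n_components`.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Placeholder data standing in for the 35-dimensional selected features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 35))

mlle = LocallyLinearEmbedding(
    n_neighbors=12,     # assumption: not reported in the paper
    n_components=3,     # target dimension used for visualization
    method="modified",  # MLLE
    random_state=0,
)
embedding = mlle.fit_transform(X)
print(embedding.shape)  # (200, 3)
```

The three output columns can then be used directly as coordinates for a 3D scatter plot of the states.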

3.3.2 Analysis of Pearson correlation coefficient

Although visualization can simply and intuitively reflect the correlation between states to a certain extent, it cannot establish the relationship between the data accurately. The Pearson correlation coefficient can be used to measure the correlation between signals. Compared with the Euclidean distance method, the Pearson correlation coefficient places no requirements on the value range of the variables. Besides, it is more suitable for high-dimensional data. The correlation strength represented by the Pearson correlation coefficient is shown in Table 5.

Table 5 Pearson correlation coefficient

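When calculating the correlation of the turnout power curves, the similarity of the curves is computed over the 175 data points of each curve. The Pearson correlation coefficient is calculated by Eq.(5), where $X=\{x_1,x_2,\ldots,x_n\}$ comes from the data of the 8 normal and fault states, $Y=\{y_1,y_2,\ldots,y_n\}$ comes from the data of the 5 degradation states, and $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$.

$r_{XY}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$. (5)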

The Pearson coefficients between the normal state, the fault states and the five non-fault states of the switch are calculated respectively, and the average value over all results is taken. The confusion matrix in Fig.9 visually displays the correlation coefficients between the states.

Fig.9 Confusion matrix of state association
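The pairwise calculation and averaging can be sketched in NumPy as follows. The function names are hypothetical, and averaging over every pair of curves from the two states is an assumption about how "the average value over all results" is formed.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length curves."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def mean_state_correlation(curves_a, curves_b) -> float:
    """Average Pearson coefficient over all curve pairs from two states."""
    return float(np.mean([pearson(a, b) for a in curves_a for b in curves_b]))


r_pos = pearson([1, 2, 3], [2, 4, 6])  # perfectly correlated
r_neg = pearson([1, 2, 3], [3, 2, 1])  # perfectly anti-correlated
print(round(r_pos, 6), round(r_neg, 6))  # 1.0 -1.0
```

Because the curves are centered and normalized inside `pearson`, the result does not depend on the absolute power level of either curve, which is the property the text contrasts with the Euclidean distance.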

As shown in Fig.9, F0-F7 correspond to the normal state and the seven failure states respectively, and F8-F12 correspond to the five degradation states. There is a strong correlation between the five degradation states and the normal state of the switch, indicating that the five degradation states are in the primary degradation stage. Degradation state 3 lies between the normal state and the fault states and is strongly related to fault 1, so it is likely to deteriorate into fault 1. Besides, it is strongly correlated with fault 5, which indicates that fault 1 and fault 5 are similar to some extent in the degradation process. As the number of switch operations increases, fault 7 is likely to occur, since degradation state 5 is strongly related to fault 7.

To sum up, if the turnout is in degradation state 3 or 5, maintenance personnel should pay attention to it and carry out maintenance in advance. The other degradation states, namely degradation states 1, 2 and 4, also have a medium-strength correlation with the fault states and should be monitored subsequently.

According to the above analysis and Table 5, the degradation states are divided into five levels according to the degree of correlation between the degradation state and the fault state, corresponding to three maintenance levels, as shown in Table 6.

Table 6 Pearson correlation coefficients

By calculating the Pearson coefficients, the relationship between the switch degradation states and the faults is found, which lays the foundation for the subsequent switch state prediction.

4 Experimental verification

4.1 Establishment of CNN-GRU model

The CNN-GRU model is implemented in Python with the Keras framework. As shown in Fig.10, the model includes a convolution layer, a pooling layer, a GRU layer and a dense layer.

In Fig.10, Conv1D is used to extract features from the original power curve of the turnout, with the following parameters: Conv1D(filters=3, kernel_size=38, padding='valid', activation='relu', strides=2), dropout=0.3. Then the GRU prediction network is established with the following parameters: GRU(units=13, return_sequences=True), loss='sparse_categorical_crossentropy', optimizer='adam', epochs=100, dropout=0.2.

Fig.10 CNN-GRU model

The model has 13 output nodes, which give the probability of each state of the turnout. The category with the highest probability is taken as the current state of the turnout. If it is a degradation state, the future state of the turnout can be predicted by the method in Section 3.3, so that the possible failures in the next stage can be identified and the corresponding maintenance advice given in advance.

4.2 Result analysis

Taking a single switch driven by the S700K switch machine as the research object, 15 groups of samples are taken for the normal state and for each fault state, and 487 groups of degradation state data are taken. After random shuffling, 400 groups of data are used for training and the remaining 207 groups are used for testing to verify the accuracy of the switch state prediction model. At 50 and 70 iterations, the CNN extracts feature matrices $F$ with different dimensions. The simulation results are shown in Fig.11.

It can be seen from Fig.11 that the model achieves better accuracy when the dimension of the feature matrix $F$ is 40, at both 50 and 70 iterations. Therefore, the dimension of the feature matrix is set to 40.

Fig.11 Comparison of accuracy of feature matrix with different dimensions

After the dimension of the feature matrix is determined, the accuracy and the loss function value are obtained for different numbers of model iterations, as shown in Figs.12 and 13 respectively.

Fig.12 Accuracy comparison of different iteration times

Fig.13 Comparison of loss function values for different iteration times

Considering the accuracy of the model, the value of the loss function and the time used, the model gives a satisfactory result when the number of iterations is 50. At this point, the accuracy is 94.2% and the loss function value is 0.149. When the number of iterations exceeds 75, the model accuracy and loss function value change only slightly while the time cost increases. Therefore, 50 iterations are chosen to better meet the on-site maintenance requirements. The final test result is shown in Fig.14.

Fig.14 Prediction results of test set sample

In the test prediction results, one error is that fault 3 is identified as fault 2. The fault curves show that the curves of fault 2 and fault 3 are similar overall, which is the main cause of this identification error. The other error is that the normal state is recognized as non-fault 2. A likely reason is that the two state curves are similar and the numbers of normal samples and degradation samples are imbalanced, which leads to a misjudgment by the model.

In order to verify the prediction effect of GRU, two commonly used classifiers are chosen for comparison in terms of prediction accuracy and prediction time, as shown in Table 7. The prediction accuracy is represented by the mean square error (MSE).

Table 7 Comparison of different prediction models

It can be seen from Table 7 that the CNN-GRU model is superior to the other two commonly used classification models in terms of prediction accuracy and time, which further supports the advantages of the GRU's gate structure and time series processing in turnout fault prediction.

5 Conclusions

1) This study establishes the correlation between the degradation states and the fault states of the turnout through a clustering algorithm and the Pearson correlation coefficient, which provides a new idea for the prediction of turnout failures.

2) The CNN can automatically extract features from the original data of the switch and reduce their dimension, which simplifies the prediction process.

3) Compared with traditional classification networks, GRU has certain advantages in turnout fault prediction owing to its gate structure and time series processing characteristics. In simulation experiments, the accuracy of the CNN-GRU turnout state prediction model reaches 94.2%, and the method is feasible for meeting on-site railway maintenance requirements.