Fault diagnosis for on-board equipment of train control system based on CNN and PSO-SVM hybrid model

2022-11-28

Journal of Measurement Science and Instrumentation, 2022, Issue 4

LU Renjie, LIN Haixiang, XU Li, LU Ran, ZHAO Zhengxiang, BAI Wansheng

(School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Abstract: Rapid and precise fault location for the on-board equipment of the train control system is essential to reliable train operation. Taking the text data of the on-board equipment fault tracking table as samples, this paper designs an on-board equipment fault diagnosis model that combines a convolutional neural network (CNN) with particle swarm optimization-support vector machines (PSO-SVM). Because fault text data are high-dimensional and sparse, CNN is used for feature extraction. To reduce the influence of class imbalance in the fault samples on classification accuracy, the PSO-SVM algorithm is introduced: it replaces the fully connected classification part of the CNN and classifies the extracted features, realizing intelligent diagnosis of on-board equipment faults. Tests on on-board equipment fault text data recorded by a railway bureau, together with comparisons against other models, show that the proposed model clearly improves the evaluation indexes and can serve as an effective fault diagnosis model for on-board equipment.

Key words: on-board equipment; fault diagnosis; convolutional neural network (CNN); unbalanced text data; particle swarm optimization-support vector machines (PSO-SVM)

0 Introduction

On-board equipment is a key component of the Chinese train control system (CTCS) and has a significant impact on driving safety and operational order. A large amount of fault text data has accumulated during the daily operation and maintenance of on-board equipment. These data are unstructured descriptions of fault phenomena, fault causes and maintenance methods. At present, fault diagnosis still relies mainly on field personnel, so the results can be subjective and inaccurate. To build an automatic classifier for fault text records and realize intelligent diagnosis of on-board equipment faults, intelligent methods are urgently needed to mine the relationship between fault records and the corresponding faulty equipment categories. This research not only provides technical support for the fault diagnosis and maintenance of on-board equipment, but also reduces the workload of manual text processing.

The key to intelligent fault diagnosis of on-board equipment lies in feature extraction from fault data and in fault classification. A vector space model (VSM)[1] has been used to structure the event logs of on-board equipment, but it suffers from severe data sparsity and cannot express the semantic features of fault text. The term frequency-inverse document frequency (TF-IDF) algorithm[2] has been used to vectorize on-board equipment fault text, but word-frequency statistics alone cannot capture the meaning of the text. A Bayesian network has been applied to fault classification based on manually recorded on-board equipment fault tracking tables[3], but the conditional independence it assumes between feature attributes is difficult to satisfy. A hidden Markov model (HMM) has been used to extract information from fault sentences in on-board equipment event logs for fault analysis[4], but an HMM is limited in its ability to mine deep features of text. In summary, these studies rely mainly on manual feature selection and shallow classification models, and they are unsatisfactory at automatically extracting and organizing information from data. In addition, they usually assume balanced categories when classifying faults, which tends to give low recognition rates for minority categories and makes the ideal diagnostic goal difficult to achieve.

In recent years, convolutional neural networks (CNN) have been applied to feature extraction and classification of fault text in the railway field[5-6]. Owing to their deep structure, deep learning models focus on extracting hidden and high-dimensional features[7] and can automatically extract further semantic features from text vectors. TextCNN[8] extracts text features with multiple convolution kernels and classifies them with a fully connected neural network. CNN can mine local and global features of text to enhance the expressive power of the original input, but it is very sensitive to imbalanced data[9]. Classifying imbalanced samples directly with the fully connected layer tends to give high accuracy for majority categories but poor results for minority categories, which degrades overall classification performance.

Therefore, this paper analyzes the characteristics of the fault text data of on-board equipment and combines the feature extraction ability of CNN, the classification ability of support vector machines (SVM) and the optimization ability of the particle swarm optimization (PSO) algorithm to diagnose on-board equipment faults. First, CNN extracts features from the fault text data of on-board equipment. The extracted features are then used as input to the subsequent classification model. To address the imbalanced distribution of fault categories, the PSO algorithm optimizes the classification parameters of SVM. Finally, the PSO-SVM algorithm diagnoses and classifies on-board equipment faults.

1 Background

1.1 Structural composition of on-board equipment

The structural composition of the on-board equipment of the CTCS-3 train control system is shown in Fig.1. Chinese railways currently use five types of on-board equipment[10]; this paper studies the CTCS3-300T.

Fig.1 Structural composition of on-board equipment

1.2 Fault types of on-board equipment

Based on training materials for high-speed railway managers and technical staff and on related literature[11], the common fault types of on-board equipment are summarized in Table 1.

Table 1 Fault types of on-board equipment

1.3 Fault text data of on-board equipment

The research data in this paper come from the fault tracking table recorded by on-site staff. Since the fault tracking table contains a large amount of information, only its key fields are shown in Table 2.

Table 2 gives the description of each fault phenomenon and the corresponding fault type. Because there is no unified recording standard and the fault text is high-dimensional and sparse, it is difficult for computers to recognize and process these text data.

2 Fault diagnosis of on-board equipment based on CNN/PSO-SVM

2.1 Design of fault diagnosis model

The overall architecture of the CNN/PSO-SVM fault diagnosis model for on-board equipment is shown in Fig.2. It consists of three parts. First, the fault text is structured and converted into a vector form that a computer can process. Second, a CNN model is trained on the processed samples and automatically extracts a feature vector for each sample. Finally, the extracted feature vectors are fed to the PSO-SVM classifier for intelligent fault diagnosis.

Fig.2 Overall structure of CNN/PSO-SVM diagnosis model

2.2 Automatic processing of fault text data of on-board equipment

2.2.1 Text preprocessing

Text preprocessing includes word segmentation and stop-word removal. Accurate word segmentation is a prerequisite for text mining. Word segmentation divides unstructured natural-language text into multiple information blocks, each of which can be regarded as a countable discrete element. Stop-word removal deletes prepositions, adverbs, function words and similar words that contribute little to characterizing the fault text data of on-board equipment.
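The paper does not name its segmenter or stop-word list. The sketch below illustrates the two steps under that gap: it swaps in forward maximum matching (FMM), a classic dictionary-based technique for segmenting Chinese text, and then filters stop words. `LEXICON`, `STOP_WORDS` and the sample record are hypothetical toy values; a real pipeline would normally use a dedicated segmentation tool with a railway-domain dictionary.

```python
# Hypothetical toy lexicon and stop-word list (not from the paper).
LEXICON = {"车载", "设备", "应答器", "故障", "报警"}
STOP_WORDS = {"的", "了"}


def fmm_segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon
    word that matches, falling back to a single character."""
    max_len = max(len(w) for w in lexicon)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text):
    """Segment one fault record, then drop stop words."""
    return [t for t in fmm_segment(text, LEXICON) if t not in STOP_WORDS]


record = "车载设备的应答器故障了"   # toy fault description
tokens = preprocess(record)        # segmented record without stop words
```

FMM is greedy, so it can mis-split ambiguous strings; that is one reason a domain dictionary matters for fault records full of equipment names.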

2.2.2 Distributed representation of fault text

In this paper, the word2vec tool converts each word in the on-board equipment fault text into a distributed word vector that a computer can process. Words that appear in similar contexts tend to have similar meanings, and this similarity can be measured by the cosine distance between their vectors. Word2vec provides two training models, CBOW and skip-gram[12-13]. This paper selects the CBOW model with hierarchical softmax, which builds the language model by predicting the current word from its context words. In the output layer, a Huffman tree is constructed with the words of the data set as leaf nodes and the number of occurrences of each word in the data set as their weights. The optimization function of the CBOW model is

$$L=\sum_{t\in U}\log p\big(t\mid \mathrm{Context}(t)\big), \tag{1}$$

where $U$ is the corpus; $p$ is the probability function; $t$ is the current word; and $\mathrm{Context}(t)$ is the context of the current word.
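To make Eq.(1) concrete, the sketch below builds the Huffman tree described above from word counts and evaluates p(t|Context(t)) under hierarchical softmax: the context word vectors are averaged into a single vector x, and the probability of t is the product of sigmoid branch decisions at the internal nodes on t's path from the root. The vocabulary, counts, vector size and random parameters are hypothetical, and training (gradient ascent on L) is omitted.

```python
import heapq
import itertools
import math
import random


def huffman_codes(counts):
    """Map each word to its Huffman code ('0'/'1' string); frequent words get short codes."""
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(freq, next(tie), {word: ""}) for word, freq in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hs_prob(word, context_vecs, codes, node_params):
    """p(word | context): average the context vectors, then multiply the
    branch probabilities along the word's Huffman path."""
    dim = len(context_vecs[0])
    x = [sum(v[i] for v in context_vecs) / len(context_vecs) for i in range(dim)]
    prob, code = 1.0, codes[word]
    for j, bit in enumerate(code):
        theta = node_params[code[:j]]  # internal node identified by its path prefix
        s = sigmoid(sum(a * b for a, b in zip(x, theta)))
        prob *= s if bit == "0" else 1.0 - s
    return prob


def cbow_objective(sentence, window, vectors, codes, node_params):
    """Eq.(1) restricted to one sentence: sum of log p(t | Context(t))."""
    total = 0.0
    for t, word in enumerate(sentence):
        ctx = [sentence[i]
               for i in range(max(0, t - window), min(len(sentence), t + window + 1))
               if i != t]
        if ctx:
            total += math.log(hs_prob(word, [vectors[w] for w in ctx], codes, node_params))
    return total


# Hypothetical toy vocabulary and randomly initialised parameters.
counts = {"故障": 50, "设备": 30, "应答器": 10, "报警": 5, "重启": 5}
codes = huffman_codes(counts)
rng = random.Random(0)
DIM = 4
vectors = {w: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for w in counts}
prefixes = {c[:j] for c in codes.values() for j in range(len(c))}
node_params = {p: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for p in sorted(prefixes)}
ctx = [vectors["设备"], vectors["报警"]]
# Leaf probabilities of a full binary tree sum to 1 for any context.
total_prob = sum(hs_prob(w, ctx, codes, node_params) for w in counts)
```

Each prediction costs one sigmoid per node on a path of roughly log2(vocabulary size) length instead of a softmax over the whole vocabulary, which is why frequent words are placed near the root.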

After the CBOW model is trained, each word is represented by a fixed-length vector that carries its contextual semantic information, preserving the semantic features of the fault text data as far as possible.

2.3 Feature extraction of fault text of on-board equipment based on CNN

CNN has the advantages of weight sharing and local perception, which help extract feature information from fault text. The CNN model is designed in four parts, 1)-4).

1) Design of input layer

In the input layer, suppose that a sentence in the input fault text data of on-board equipment has length $m$, and that $x_i\in\mathbb{R}^{n}$ is the word vector of the $i$-th word, where $n$ is the word-vector dimension. The sentence can then be expressed as

$$S_{1:m}=x_1\oplus x_2\oplus\cdots\oplus x_m, \tag{2}$$

where $\oplus$ denotes the concatenation operation.
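A minimal sketch of Eq.(2): the word vectors of a segmented sentence are stacked row by row into an m×n matrix. Because a CNN needs a fixed input size, the sketch assumes (the paper does not say) that sentences are truncated or zero-padded to a fixed length `max_len` and that out-of-vocabulary words map to zero vectors; the 2-dimensional vectors are toy values.

```python
def sentence_matrix(tokens, vectors, max_len, dim):
    """Stack word vectors row by row into a max_len x dim matrix.

    Unknown words and padding rows become zero vectors (an assumption)."""
    rows = [list(vectors.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


vectors = {"应答器": [0.2, -0.1], "故障": [0.5, 0.3]}  # hypothetical 2-d word vectors
S = sentence_matrix(["应答器", "故障", "未知词"], vectors, max_len=4, dim=2)
```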

2) Design of convolutional layer

In the convolutional layer, a convolution operation is performed on the text matrix $S_{1:m}$ with the convolution kernel $k\in\mathbb{R}^{h\times n}$, where $h$ is the height of the kernel window and $n$ is the kernel width. The result of the convolution is passed to a nonlinear activation function, which extracts high-level features of the local context, and the resulting feature map is

$$M=f\left(k\cdot S_{1:m}+b_i\right), \tag{3}$$

where $f$ is the nonlinear activation function; “·” denotes the convolution operation; and $b_i$ is a bias term that is adjusted automatically during training.
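The sketch below evaluates Eq.(3) for one kernel with plain Python lists: the h×n kernel slides down the sentence matrix one word at a time, the element-wise products in each window are summed with the bias, and ReLU (the activation used in the experiments of Section 3.3.2) is applied. Stride 1 with no padding is an assumption; under it a sentence of m words yields m-h+1 values.

```python
def relu(z):
    return max(0.0, z)


def conv_feature_map(S, kernel, bias, f=relu):
    """Eq.(3): M_j = f(sum_{a,b} kernel[a][b] * S[j+a][b] + bias), j = 0..m-h."""
    h, m = len(kernel), len(S)
    feature_map = []
    for j in range(m - h + 1):
        z = sum(kernel[a][b] * S[j + a][b]
                for a in range(h) for b in range(len(kernel[a])))
        feature_map.append(f(z + bias))
    return feature_map


S = [[1, 0], [0, 1], [1, 1]]   # m = 3 words, n = 2 dimensions (toy values)
kernel = [[1, 1], [1, -1]]     # h = 2
M = conv_feature_map(S, kernel, bias=0.0)
```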

3) Design of pooling layer

In the pooling layer, the dimensionality of the convolved data is further reduced. Max pooling and average pooling are the two main pooling methods; this paper uses max pooling. The feature map after pooling can be expressed as

$$\hat{M}=\max(M). \tag{4}$$
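Assuming 1-max pooling over each whole feature map, as in TextCNN (the text does not give a pooling window), Eq.(4) keeps the largest value of each map, and the pooled values of all kernels are concatenated into the fixed-length feature vector later passed to PSO-SVM. Average pooling is included only for comparison; the feature maps are toy values.

```python
def max_pool(feature_map):
    """1-max pooling: keep the strongest response of one kernel."""
    return max(feature_map)


def avg_pool(feature_map):
    """Average pooling, the alternative mentioned in the text."""
    return sum(feature_map) / len(feature_map)


def pooled_features(feature_maps):
    """Concatenate one pooled value per kernel into a single feature vector."""
    return [max_pool(m) for m in feature_maps]


maps = [[0.0, 1.0, 0.4], [0.3, 0.2], [0.0, 0.0, 2.5, 0.1]]  # toy maps from three kernels
features = pooled_features(maps)
```

Because each map collapses to one number, maps of different lengths (from kernels of different heights) still produce a vector whose length equals the number of kernels.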

4) Design of fully connected and output layer

The fully connected layer integrates the local features obtained by the previous layers into a global feature vector, and the output layer expresses the classification result as a probability between 0 and 1 for each fault category.
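The text does not name the output function. Softmax is the usual choice for turning fully connected scores into per-class probabilities between 0 and 1 that sum to 1, so the sketch below assumes it; the weights, biases and three-class setup are hypothetical.

```python
import math


def dense(features, weights, biases):
    """Fully connected layer: one score per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Numerically stable softmax (subtracting the max does not change the result)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


features = [1.0, 0.3, 2.5]   # e.g. pooled CNN features
weights = [[0.2, -0.1, 0.4], [0.5, 0.0, -0.3], [-0.2, 0.3, 0.1]]
probs = softmax(dense(features, weights, [0.0, 0.0, 0.0]))
predicted = max(range(len(probs)), key=probs.__getitem__)
```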

After the CNN model is trained, its convolutional and pooling layers automatically extract feature vectors from the input fault text data.

2.4 Fault diagnosis for on-board equipment based on PSO-SVM

The disadvantage of CNN is its high sensitivity to imbalanced sample data. If CNN is used directly to classify the feature vectors of fault text data, minority categories have little influence on the classifier, and their samples tend to be misdiagnosed as majority-category faults. To compensate for this, SVM is used as the base classifier and the PSO algorithm optimizes its parameters. The specific steps are shown in Fig.2. The fully connected classification part of the trained CNN is replaced with PSO-SVM: the convolutional and pooling layers of the CNN automatically extract features from the fault text, these features serve as the input of PSO-SVM, and PSO-SVM performs the classification.

2.4.1 Design of SVM algorithm

Taking the binary classification problem as an example, the classification plane is shown in Fig.3. The principle of SVM is to find the optimal separating hyperplane, i.e., the one that maximizes the margin: the sum of the distances from the hyperplane to the nearest positive sample and to the nearest negative sample (the support vectors).

In this paper, SVM solves the nonlinear classification problem in four steps, 1)-4).

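1) First, finding the optimal classification hyperplane is transformed into solving

$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{m}\xi_i,\qquad \text{s.t.}\ y_i\left(w\cdot x_i+b\right)\ge 1-\xi_i,\ \ \xi_i\ge 0\ \ (i=1,2,\dots,m), \tag{5}$$

where $w$ is the weight vector; $C$ is the penalty factor; $b$ is the bias; $\xi_i$ is the slack variable of the $i$-th sample; and $x_i$ and $y_i\in\{-1,+1\}$ are the $i$-th training sample and its label.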
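For a fixed (w, b), the smallest slack that satisfies the constraints of Eq.(5) is ξ_i = max(0, 1 - y_i(w·x_i + b)), i.e. the hinge loss of sample i. The sketch below uses this to evaluate the objective of Eq.(5) for a candidate hyperplane on toy data; it only scores a given (w, b) and does not perform the optimization, which the paper carries out in the dual with SMO.

```python
def slacks(w, b, X, y):
    """Smallest xi_i with y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0 (hinge loss)."""
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
            for xi, yi in zip(X, y)]


def primal_objective(w, b, X, y, C):
    """Objective of Eq.(5): 0.5 * ||w||^2 + C * sum(xi)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))


X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]   # toy samples
y = [1, 1, -1]
xi = slacks([1.0, 0.0], 0.0, X, y)          # only the second sample violates the margin
obj = primal_objective([1.0, 0.0], 0.0, X, y, C=10.0)
```

A larger C makes each unit of slack more expensive, pushing the solution toward fewer margin violations at the cost of a larger ||w||, i.e. a narrower margin.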

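By introducing Lagrange multipliers, Eq.(5) can be transformed into the unconstrained Lagrangian

$$\mathcal{L}(w,b,\xi,\alpha,\mu)=\frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{m}\xi_i-\sum_{i=1}^{m}\alpha_i\left[y_i\left(w\cdot x_i+b\right)-1+\xi_i\right]-\sum_{i=1}^{m}\mu_i\xi_i, \tag{6}$$

where $\alpha_i\ge 0$ and $\mu_i\ge 0$ are Lagrange multipliers.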

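According to Lagrange duality, the problem is transformed into solving Eq.(7). Setting the partial derivatives of $\mathcal{L}$ with respect to $w$, $b$ and $\xi_i$ to zero in the inner minimization gives Eq.(8).

$$\max_{\alpha,\mu}\ \min_{w,b,\xi}\ \mathcal{L}(w,b,\xi,\alpha,\mu), \tag{7}$$

$$w=\sum_{i=1}^{m}\alpha_i y_i x_i,\qquad \sum_{i=1}^{m}\alpha_i y_i=0,\qquad C-\alpha_i-\mu_i=0. \tag{8}$$

Substituting Eq.(8) into Eq.(6) leaves a dual problem in $\alpha$ alone, subject to $\sum_{i}\alpha_i y_i=0$ and $0\le\alpha_i\le C$.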

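2) Second, the sequential minimal optimization (SMO) algorithm is used to obtain the optimal solution $\alpha^{*}$, from which the weight vector is

$$w^{*}=\sum_{i=1}^{m}\alpha_i^{*}y_i x_i. \tag{9}$$

Then a component $\alpha_j^{*}$ of $\alpha^{*}$ with $0<\alpha_j^{*}<C$ is selected, and the bias is

$$b^{*}=y_j-\sum_{i=1}^{m}\alpha_i^{*}y_i\left(x_i\cdot x_j\right). \tag{10}$$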

3) From Eqs.(9) and (10), the separating hyperplane is

$$w^{*}\cdot x+b^{*}=0. \tag{11}$$

4) Finally, the classification decision function can be expressed as

$$f(x)=\operatorname{sign}\left(w^{*}\cdot x+b^{*}\right). \tag{12}$$

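In this paper, the radial basis function (RBF) is selected as the kernel function of SVM. RBF handles high-dimensional nonlinear data well, and its mathematical expression is

$$K(x_i,x_j)=\exp\left(-g\lVert x_i-x_j\rVert^{2}\right), \tag{13}$$

where $g>0$ is the kernel parameter. In the nonlinear case, the inner products $x_i\cdot x_j$ in the derivation above are replaced by $K(x_i,x_j)$.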
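Substituting Eq.(9) into Eq.(12) and replacing each inner product with the kernel of Eq.(13) gives the kernel decision function f(x) = sign(Σ_i α_i* y_i K(x_i, x) + b*). The sketch below evaluates both; the support vectors, multipliers, bias and g are hypothetical stand-ins for values that SMO and PSO would produce.

```python
import math


def rbf(xi, xj, g):
    """Eq.(13): K(xi, xj) = exp(-g * ||xi - xj||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def decision(x, support_vectors, labels, alphas, b, g):
    """Kernel SVM decision: sign(sum_i alpha_i * y_i * K(x_i, x) + b); a score of 0 maps to +1."""
    score = sum(a * yi * rbf(sv, x, g)
                for sv, yi, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hypothetical trained values.
svs, labels, alphas = [[0.0, 0.0], [2.0, 2.0]], [1, -1], [1.0, 1.0]
b_star, g = 0.0, 0.5
```

A larger g makes K decay faster with distance, so each support vector influences a smaller neighbourhood; this is why g and C are tuned together.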

The text classifier is built with the one-versus-one training scheme: for $F$ categories, $F(F-1)/2$ binary SVMs are trained, one for each pair of categories. Here $F=8$, so 28 SVMs are trained. An unknown sample is assigned to the category that receives the most votes from these classifiers.
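The one-versus-one voting rule can be sketched independently of how each binary SVM is trained. Each pairwise classifier below is a hypothetical stand-in (the nearer of two 1-D class centres) so that the voting logic runs; in the paper's model each would be an RBF SVM with the PSO-selected (C, g). Breaking ties in favour of the earlier class is an assumption, and libraries differ on this point.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """pairwise[(a, b)](x) returns a or b; the class with the most votes wins."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)  # ties: earliest class wins


classes = list(range(1, 9))                  # fault categories F1..F8
centres = {c: float(c) for c in classes}     # hypothetical 1-D class centres
pairwise = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
    for a, b in combinations(classes, 2)
}
n_classifiers = len(pairwise)                # F(F-1)/2
label = ovo_predict(5.2, classes, pairwise)
```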

2.4.2 Optimization of SVM algorithm

The PSO method[14-15] assumes $m$ particles in a $D$-dimensional search space. The position of the $i$-th particle is $X_i=(x_{i1},x_{i2},\dots,x_{iD})$ and its velocity is $V_i=(v_{i1},v_{i2},\dots,v_{iD})$. The best position found so far by the $i$-th particle is $P_i=(p_{i1},p_{i2},\dots,p_{iD})$, and the best position found by the whole swarm is $P_{\mathrm{gbest}}=(p_{\mathrm{gbest}1},p_{\mathrm{gbest}2},\dots,p_{\mathrm{gbest}D})$. The velocity and position are updated by

$$v_{id}^{k+1}=w\,v_{id}^{k}+c_1 r_1\left(p_{id}^{k}-x_{id}^{k}\right)+c_2 r_2\left(p_{\mathrm{gbest}d}^{k}-x_{id}^{k}\right), \tag{14}$$

$$x_{id}^{k+1}=x_{id}^{k}+v_{id}^{k+1}, \tag{15}$$

where $k$ is the iteration step; $d$ indexes a dimension of the $D$-dimensional search space; $w$ is the inertia weight (not the SVM weight vector); $c_1$ and $c_2$ are constants called acceleration factors; and $r_1$ and $r_2$ are random numbers uniformly distributed in $[0,1]$.

In this paper, the PSO algorithm searches for the optimal penalty factor $C$ and kernel parameter $g$ of SVM. The implementation of the SVM parameter optimization is shown in Fig.4.

Fig.4 Process of optimization of SVM parameters
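A minimal PSO sketch of Eqs.(14) and (15), with positions clipped to the search box (an assumption). For SVM tuning, the fitness would score a candidate (C, g), for example by the negative cross-validation accuracy of an RBF SVM, since this routine minimizes; that fitness is not reproduced here, so a hypothetical quadratic stands in for it. The inertia weight, acceleration factors, swarm size and bounds are illustrative defaults, not the settings of Table 5.

```python
import random


def pso_minimize(fitness, bounds, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` over a box using Eqs.(14)-(15); returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # P_i: best position of each particle
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # P_gbest: best position of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))       # Eq.(14)
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # Eq.(15), clipped
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:               # swarm best updated immediately
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_fitness(params):
    """Hypothetical stand-in for, e.g., negative cross-validation accuracy at (C, g)."""
    C, g = params
    return (C - 10.0) ** 2 + (g - 0.5) ** 2


best, best_val = pso_minimize(toy_fitness, bounds=[(0.1, 100.0), (0.01, 10.0)])
```

Because the swarm best is updated as soon as any particle improves, this is the asynchronous variant of global-best PSO; a synchronous version would update P_gbest once per iteration.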

3 Fault diagnosis experiment and analysis of on-board equipment

3.1 Experimental environment and data

Fig.5 shows the distribution of on-board equipment faults recorded from 2016 to 2019 by the signaling depot of a railway bureau. Of the 1 130 on-board equipment fault text records, 85% are used as the training set and the remaining 15% as the test set.

Fig.5 Distribution of fault samples of on-board equipment

The experimental environment and configuration are shown in Table 3.

Table 3 Experimental environment and configuration

3.2 Evaluation indexes of the algorithm

In this paper, three indexes are used to evaluate and compare the models: precision (P), recall (R) and the F1 value. They are computed from the parameters in Table 4.

Table 4 Parameters of evaluation indexes

Precision and recall assess the diagnosis results from two perspectives: precision is the fraction of samples diagnosed as a fault category that truly belong to it, and recall is the fraction of samples of that category that are correctly diagnosed. They are given by

$$P=\frac{TP}{TP+FP}, \tag{16}$$

$$R=\frac{TP}{TP+FN}. \tag{17}$$

The F1 value is the harmonic mean of the two:

$$F_1=\frac{2PR}{P+R}. \tag{18}$$
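Eqs.(16)-(18) are applied per fault category from a confusion matrix: TP is the diagonal entry, FP the rest of that column and FN the rest of that row, assuming rows are true classes and columns predicted classes. The text does not say how the eight per-class values are combined into single figures, so the unweighted (macro) average below is an assumption; the 3-class matrix is toy data.

```python
def per_class_metrics(cm):
    """Return [(P, R, F1), ...] per class; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[row][c] for row in range(n)) - tp  # predicted c, truly another class
        fn = sum(cm[c]) - tp                           # truly c, predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0         # Eq.(16)
        r = tp / (tp + fn) if tp + fn else 0.0         # Eq.(17)
        f1 = 2 * p * r / (p + r) if p + r else 0.0     # Eq.(18)
        results.append((p, r, f1))
    return results


def macro_average(results):
    """Unweighted mean of P, R and F1 over classes (an assumed averaging scheme)."""
    k = len(results)
    return tuple(sum(m[i] for m in results) / k for i in range(3))


cm = [[5, 1, 0],
      [2, 3, 0],
      [0, 0, 4]]  # toy 3-class confusion matrix
metrics = per_class_metrics(cm)
macro_p, macro_r, macro_f1 = macro_average(metrics)
```

Macro averaging weights every category equally, so poor recall on a minority category pulls the average down; a frequency-weighted average would largely hide it.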

3.3 Model training

3.3.1 Experiment of word vector training

In this paper, the window length of the CBOW model is set to 3. Word vectors of different dimensions were tested on the fault text data set, and the resulting F1 values are shown in Fig.6. F1 stabilizes for dimensions between 200 and 300, so the word-vector dimension is set to 200.

3.3.2 Training of CNN feature extractor

Since kernels of several sizes working together can extract more text features, convolution kernels of heights 3, 4 and 5 are used, with 128 kernels of each height and a kernel width of 200 (the word-vector dimension). ReLU is the activation function after convolution. The batch size is 64, the learning rate is 0.001 and the dropout rate is 0.5.
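A quick shape check for these settings, assuming stride-1 convolution without padding and 1-max pooling per feature map (TextCNN-style, not stated explicitly in the text): a kernel of height h on an m×200 sentence matrix gives a map of length m-h+1, and pooling keeps one value per kernel, so the feature vector handed to PSO-SVM has 3×128 = 384 dimensions whatever m is. The sentence length m = 50 and the one-bias-per-kernel parameter count are assumptions.

```python
KERNEL_HEIGHTS = (3, 4, 5)
KERNELS_PER_HEIGHT = 128
EMBED_DIM = 200  # kernel width = word-vector dimension


def cnn_shapes(m):
    """Feature-map length per kernel height for a sentence of m words,
    and the length of the pooled feature vector passed to PSO-SVM."""
    map_lengths = {h: m - h + 1 for h in KERNEL_HEIGHTS}    # valid convolution, stride 1
    feature_dim = KERNELS_PER_HEIGHT * len(KERNEL_HEIGHTS)  # one value per kernel (1-max pooling)
    return map_lengths, feature_dim


# Convolution weights plus one bias per kernel (bias-per-kernel is an assumption).
n_conv_params = sum(KERNELS_PER_HEIGHT * (h * EMBED_DIM + 1) for h in KERNEL_HEIGHTS)
lengths, feature_dim = cnn_shapes(50)  # m = 50 is a hypothetical sentence length
```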

Fig.6 F1 corresponding to word vectors of different dimensions

Fig.7 shows how the recognition error changes as the number of iterations increases. After about 130 iterations, both the training error and the test error reach small values, indicating that the model is fully trained.

Fig.7 Change of recognition error with number of iterations

3.3.3 Training of PSO-SVM model

The PSO algorithm is used to determine the penalty factor C and the kernel parameter g of the SVM model. The PSO parameter settings are listed in Table 5.

Table 5 Parameters of PSO algorithm

Fig.8 shows the fitness curves of the PSO optimization of SVM. The best fitness converges by the 27th generation, and the optimal parameter combination obtained after training is (C, g) = (12.323 9, 0.356 2).

Fig.8 Fitness curve of optimization of PSO to SVM

3.4 Performance verification of CNN/PSO-SVM hybrid model

Based on the parameters in Table 4, confusion matrices are used to display the diagnosis result for each fault category. The confusion matrices of the CNN model and the CNN/PSO-SVM model are shown in Figs.9 and 10, respectively. The numbers 1 to 8 on the horizontal and vertical axes denote fault categories F1-F8 of on-board equipment.

Fig.9 Confusion matrix of CNN model

Comparing Figs.9 and 10 shows that, because the fault categories are imbalanced, samples of minority categories are easily misdiagnosed as majority categories. Compared with the CNN model, the CNN/PSO-SVM model improves the recognition rate of the minority fault categories.

Fig.10 Confusion matrix of CNN/PSO-SVM model

3.5 Results comparison of different fault diagnosis models

To further verify the diagnostic performance of the CNN/PSO-SVM model, it is compared with other classification models in two groups of experiments. First, k-nearest neighbors (KNN), naive Bayes (NB) and SVM are classical machine learning classifiers that have performed well in fault text classification, so the first group compares these three models with the CNN/PSO-SVM model. All three take the word vectors directly as input, without CNN feature extraction. Second, strategies for data imbalance fall into two types[16]: reconstructing the data set at the data level, and improving the algorithm. This paper addresses the imbalance of the fault text data at the algorithm level, by optimizing classifier parameters. The second group therefore uses data-level methods for comparison: random undersampling (RU) and the synthetic minority over-sampling technique (SMOTE), each combined with CNN. In addition, a standalone CNN model performs both feature extraction and classification. The performance of all models is compared in Table 6.

The diagnosis results in Table 6 show that, among the three models that take word vectors without feature extraction as input, the first two have relatively low recall, which limits their overall diagnostic performance. SVM generalizes well and classifies better. Introducing particle swarm optimization into SVM, the most effective of the three, gives the PSO-SVM model, whose recall is 5.62% higher than that of SVM; this shows that particle swarm optimization effectively improves the classifier's ability to handle imbalanced text data. In the second group, the standalone CNN model extracts and classifies word-vector features, but because CNN is sensitive to data imbalance, its classification performance is poor. When the data set is reconstructed by RU or SMOTE before being fed to CNN for feature extraction and classification, both methods reduce the class imbalance, which helps the classifier learn and improves the performance of CNN. However, sampling methods are highly random and give unstable experimental results. Moreover, new text samples synthesized by oversampling are hard to interpret: a small perturbation can change the meaning of an entire sentence and degrade feature extraction.

Table 6 Comparison of various fault diagnosis models

The CNN/PSO-SVM model outperforms the other models on all indexes. This is because it exploits the ability of CNN to extract data features automatically, and its convolution and pooling operations capture the deep semantic information of on-board equipment fault text more effectively. At the same time, introducing particle swarm optimization into the SVM classifier improves its diagnostic performance on imbalanced text data, which improves the diagnosis and classification of on-board equipment faults.

4 Conclusions

1) Based on the fault text data of on-board equipment, a fault diagnosis model combining CNN with a particle swarm optimized SVM is proposed. The fault text is segmented into words and a CBOW model is trained to convert the words into word vectors, thereby vectorizing the fault text data.

2) The convolution and pooling operations of CNN automatically extract features from the fault text data of on-board equipment, and the PSO-SVM model, which copes with the imbalanced data, classifies the extracted features and outputs the diagnosis.

3) Experiments on the on-board equipment fault text data of a railway bureau, evaluated with three indexes, show that the CNN/PSO-SVM model improves fault diagnosis performance and is an effective fault diagnosis model for on-board equipment.
