
Anti‐noise diesel engine misfire diagnosis using a multi‐scale CNN‐LSTM neural network with denoising module

2023-12-01

Chengjin Qin | Yanrui Jin | Zhinan Zhang | Honggan Yu | Jianfeng Tao | Hao Sun | Chengliang Liu

State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China

Abstract The accuracy of existing diesel engine fault diagnosis methods under strong noise, and their generalisation performance between different noise levels, are still limited. A novel multi-scale CNN-LSTM neural network (MSCNN-LSTMNet) with a residual-CNN denoising module is proposed for anti-noise diesel engine misfire diagnosis. First, a residual-CNN module is designed to denoise the original vibration signal measured from the diesel engine cylinder, and a residual loss is used to construct a new loss function. Considering the essential characteristics of the measured vibration signals at different scales, a multi-scale convolutional NN (CNN) block is designed to realize multi-scale feature extraction. Specifically, multiple convolution branches with different kernel sizes extract features at different time scales, enhancing the robustness of the model. On this basis, an LSTM further extracts sequential features to improve anti-noise and generalisation performance. The effectiveness of MSCNN-LSTMNet is validated by experiments on both one- and hybrid-cylinder diesel engine misfire diagnosis under various noise levels and working conditions. The results demonstrate that MSCNN-LSTMNet achieves much better anti-noise and generalisation performance than existing methods. Under strong noise (-10 dB signal-to-noise ratio), MSCNN-LSTMNet obtained 97.561% average accuracy over the four datasets, while random forest, deep neural network, CNN, and MSCNNNet obtained 73.828%, 79.544%, 82.247%, and 89.741%, respectively. Moreover, for 11 noise generalisation tasks between different noise levels, MSCNN-LSTMNet obtained at least 96.679%, 97.849%, 98.892%, and 94.010% accuracy on the four datasets, much higher than the existing methods.

KEYWORDS fault diagnosis, machine learning, neural network

1 | INTRODUCTION

Multi-cylinder diesel engines run smoothly and deliver ample power. Compared with gasoline engines, diesel engines have clear advantages, including long service life, economy and durability, high torque at low speed, high safety, and environmental protection [1]. Consequently, they are widely used in engineering machinery, the automobile industry, ship machinery, the power industry, agricultural machinery, and other fields. Misfire, a common diesel engine fault, is mainly caused by electronic control system faults and mechanical part faults. Electronic control system failures include sensor signal loss or inaccuracy, control unit signal failure or missing signal output, ignition failure caused by spark plug or ignition coil damage, injector injection failure, and circuit connection failure. Mechanical failures are mainly due to insufficient cylinder pressure, caused for example by valves that do not close tightly or by leakage. Misfire leads to severe vibration, insufficient power, weak acceleration, and high fuel consumption [2]. It is therefore vital to monitor the engine running state online and take corresponding measures [3–5].

So far, considerable effort has gone into physical model-based misfire diagnostic methods, with some success [6–8]. The core idea of these methods is to diagnose and trace the source of a fault by analysing the difference between the actual characteristics and an ideal model [9–11]. Based on an engine dynamic model, Wang and co-workers [12] introduced a misfire detection method that measures crank angular speed. With the aid of a sliding mode observer (SMO), it achieved good robustness under various working conditions. However, its detection performance depended on the accuracy of the dynamic model. From the perspective of energy, Tinaut and co-workers [13] suggested a misfire and compression fault monitoring approach using instantaneous angular speed, in which two robust energy indices were constructed. However, it ignored the potential energy of the clutch spring and was only suitable for high angular speeds. Based on the Kalman filter (KF), Helm et al. [14] developed a multiple-model approach for misfire detection, in which one dedicated KF represents normal operation and the other represents misfire. Employing the estimated torque, Jung et al. [15] recommended a model-based misfire monitoring approach. Its performance depended on model accuracy and degrades under complex conditions. Based on an established crankshaft dynamic model, Li and co-workers [16] developed a discrete SMO for estimating the indicated torque from crankshaft speed fluctuation. However, model accuracy and model simplification are often at odds. Overall, the clear advantage of physical model-based methods is that they capture the essence of the dynamic system and can carry out fault diagnosis in real time. However, they require accurate mathematical models and depend on domain knowledge, which limits their practical application to complex systems under complex working conditions [17–19].

From another perspective, researchers are also committed to developing data-driven diesel engine misfire diagnosis methods. These methods commonly use statistical analysis and signal processing to extract features and build neural network (NN)-based classifiers to carry out the diagnostic task. Using the engine speed signal under steady-state working conditions, Hu and co-workers [20] developed a multiple-misfire diagnosis algorithm for internal combustion (IC) engines, in which characteristic vectors were extracted by statistical analysis and principal component analysis was employed for misfire diagnosis. However, it required a threshold to be determined and was only applicable to steady-state conditions. After de-noising the vibration signal with an adaptive wavelet packet (WP), Wang et al. [21] presented a diesel engine fault detection algorithm combining fractal dimension and ensemble empirical mode decomposition, which avoids classifier design. However, the model required complex preprocessing, feature extraction, and threshold determination and could not cope with strong noise. By making full use of simulation and experimental data, Chen and co-workers [22] recommended a misfire diagnosis algorithm for IC engines based on a probabilistic NN and a multi-layer perceptron network. However, a model obtained from simulation data is not suitable for practical scenarios such as strong noise and complex working conditions. Based on selected parameters, Liu and co-workers [23] constructed a propagation NN for engine misfire detection. However, the model accuracy decreased when the training information was insufficient. Wu and Liu [24] recommended a regression NN-based fault diagnosis algorithm for IC engines, in which WP was employed for feature extraction. However, this method required complex preprocessing and feature extraction, which made it difficult to apply under strong noise or across different noise levels. Devasenapati and co-workers [25] introduced a decision tree-based misfire classification model using statistical features of vibration signals. However, the model performance was affected by the dimension, the minimum number of objects, and the confidence factor, and the model has poor applicability, making it difficult to apply to complex operating conditions and other scenarios. Based on features extracted via WP, Vong and co-worker [26] suggested a support vector machine (SVM) classification algorithm for ignition faults. In Ref. [27], several misfire classifiers based on machine learning (ML) were investigated and compared, which showed that a multiclass classifier achieved the best misfire detection performance using extracted statistical features. Using statistical features obtained via intrinsic time-scale decomposition, Liu and co-workers [28] suggested an engine fault classification algorithm based on the relevance vector machine. However, the model had limited performance under complex conditions. In Ref. [29], several decision tree-based misfire classification algorithms using statistical features were explored and discussed, and the linear model tree was recommended. Jafarian and co-workers [30] employed the Fourier transform to extract features and investigated the engine fault detection performance of k-nearest neighbour, artificial NN, and SVM classifiers. Using sound signals, Singh et al. [31] proposed an SVM-based identification approach for engine misfire. However, the acoustic signal was susceptible to environmental noise, and diagnosis under strong noise and across different noise levels was not addressed. Hou and co-workers [32] suggested an integrated identification algorithm based on a multilayer perceptron and a genetic algorithm. However, the algorithm took a long time to find the best individual and did not consider complex scenes with strong noise. By constructing an indicator depicting vibration intensity, Syta et al. [33] introduced a signal analysis-based misfire identification algorithm for aircraft engines. However, it required complex signal processing and, having been tested only under specific operating conditions, was difficult to apply to diagnosis under strong noise or across noise levels. For identifying engine faults, Xi et al. [34] proposed a manifold learning-based approach to extract features from vibration signals. However, the algorithm required an optimal reference signal, which was difficult to obtain in practical scenarios. Wang et al. [35] developed a time–frequency analysis-based fault identification algorithm for diesel engines. However, the algorithm relied on complex signal processing, which degrades the model performance under strong noise. To avoid handcrafted feature extraction, Zhang et al. [36] proposed a convolutional NN (CNN)-based misfire identification approach for diesel engines, which was shown to obtain relatively high accuracy under steady-state conditions. Qin and co-workers [37] constructed a twin CNN for misfire identification, in which multi-domain and handcrafted features were employed. To monitor multiple irregularities, Shahid et al. [38] developed a novel CNN-based identification approach for both misfire and load changes using the crank angle degree signal. However, it was unable to determine which cylinder had misfired, and its performance under strong noise and its generalisation between different noise levels were limited.

An in-depth literature analysis shows that the above data-driven approaches commonly require complex feature extraction procedures. Usually, they employ time-frequency and statistical analysis algorithms to extract handcrafted features in the time domain, frequency domain, and time-frequency domain. The handcrafted features are then fed into classifiers based on traditional ML. Consequently, the quality of feature mining and the ability of traditional ML-based classifiers to learn complex problems strongly affect the performance of the misfire monitoring model. However, in practical diesel engine applications, the measurement of state quantities is often disturbed by strong vibration and noise, which makes high-quality feature extraction more challenging. Because these approaches rely on the quality of handcrafted features, their performance is difficult to guarantee under strong vibration and noise. Meanwhile, the discrepancy between the training and test sets caused by variations in noise level and working condition also degrades misfire classification accuracy. Deep learning (DL)-based misfire identification models commonly ignore the interference of various factors (i.e., noise) with the measured signal, which limits their performance in actual working scenarios. In addition, the diagnostic accuracy of models trained at one noise level decreases sharply at other noise levels. Consequently, efforts are still needed to further enhance engine misfire detection under strong noise as well as generalisation between different noise levels.

To address the above problems, this paper proposes a novel multi-scale CNN-LSTM NN (MSCNN-LSTMNet) for anti-noise diesel engine misfire diagnosis. To denoise the original vibration signal measured from the diesel engine cylinder, a residual-CNN block is designed based on the noise superposition principle. At the same time, a residual loss is used to construct a new loss function for model training. Then, a novel multi-scale CNN block is designed to extract the essential features of the measured vibration signals at different scales. Specifically, multiple convolution branches with different kernel sizes extract features at different scales, making the model's learning of misfire fault features more robust. Moreover, an LSTM is used to further extract sequential features to improve anti-noise and generalisation performance. The anti-noise and generalisation performance of MSCNN-LSTMNet is validated by comprehensive comparisons with existing methods under different noise environments and operating conditions.

The main contributions and innovations of this paper are as follows:

1) In this paper, we present a novel multi-scale CNN-LSTM NN (MSCNN-LSTMNet) for anti-noise diesel engine misfire diagnosis.

2) A residual-CNN block is designed based on the noise superposition principle to denoise the measured raw vibration signal, and a residual loss is employed to construct a new loss function for model training.

3) A novel multi-scale CNN block is presented for extracting essential features of measured vibration signals at different scales, and LSTM is utilised to extract sequential features for further improving anti-noise and generalisation performances.

4) The results under various noise levels and working conditions validate that MSCNN-LSTMNet achieves much better anti-noise and generalisation performances compared with existing methods.

The remainder of the article is arranged as follows. In Section 2, the experimental platform is described and MSCNN-LSTMNet is presented in detail. Section 3 presents the experimental results and verification, and Section 4 provides concluding remarks.

2 | MATERIALS AND METHODS

2.1 | Framework of proposed method

Figure 1 shows the flowchart of the proposed misfire monitoring approach. Engine misfire leads to unstable engine operation, accompanied by increased vibration and noise. The vibration signal of the engine cylinder therefore contains rich system-state information that can be used for fault diagnosis, including misfire. For these reasons, we collect vibration signals of the cylinder head online for misfire identification. As a complex mechanical system, a diesel engine has large background noise, and the vibration of its body surface has complex signal components. To achieve accurate misfire identification, this work constructs a residual-CNN module that denoises the original signal and removes the influence of interference on the quality of the essential features extracted by the subsequent multi-scale deep networks. Then, we design a multi-scale CNN block with multiple branches and different convolution kernel sizes to extract features at different time scales, making the network's learning of misfire fault features more robust. After that, an LSTM network further extracts sequential features to improve anti-noise and generalisation performance. Finally, a multi-class softmax function outputs the diesel engine misfire monitoring result.

2.2 | Materials

In our research, misfire fault tests under different conditions were carried out for comprehensive verification. The whole experimental platform for diesel engine misfire evaluation is shown in Figure 2. The diesel engine used, a 4A3LR, is a four-cylinder, four-stroke engine whose four cylinders are shown in Figure 2b. Figure 2b also shows the installation position of the accelerometer. To obtain rich misfire fault information, it is installed on the No. 3 cylinder, which is close to the centre of all cylinders. In this study, an accelerometer (HD-YD-233) with a frequency range of 0.5–5000 Hz was employed to acquire the vibration signals of the cylinder head. Figure 2c illustrates the measurement and control system, which can simulate misfire conditions with the engine running at low, medium, and high speed. In our experiments, we simulated misfire faults at three speeds: 1300 rpm (low), 1800 rpm (medium), and 2200 rpm (high).

FIGURE 1 Flowchart for presented MSCNN-LSTMNet.

FIGURE 2 Experimental platform: (a) complete engine test bench, (b) diesel engine and sensor installation, (c) engine measurement and control system, and (d) data acquisition (DAQ) system.

For multi-cylinder diesel engines, one- and two-cylinder misfire faults are the ones that need to be monitored. This is because the whole engine vibrates violently when more than two cylinders misfire simultaneously, which technicians can easily identify. Consequently, this study focuses on diagnosing one- and two-cylinder misfire faults. Table 1 presents the complete test scheme. For the one-cylinder misfire case, we carried out four different misfire tests (cylinder 1, cylinder 2, cylinder 3, and cylinder 4 misfire) and normal tests at each speed (1300, 1800, and 2200 rpm). In addition, to verify the applicability of the model without loss of generality, we carried out three different two-cylinder misfire tests, a one-cylinder misfire test (cylinder 2 misfire), and normal tests at the medium speed (1800 rpm).

In our data acquisition system, the vibration signal is collected by an NI 9234 and further processed by an NI cDAQ 9178. The sampling frequency was set to 25.6 kHz. For each one- and two-cylinder misfire fault at a given running speed, we collected vibration signals for at least 41 s, so that each working condition contains at least 900 engine running cycles. For intuitive observation and comparison, the acceleration signals for one- and two-cylinder diesel engine misfires are presented in Figure 3. The time-domain waveforms show some differences, but it is challenging to diagnose misfire from them directly. Meanwhile, because the diesel engine is a complex dynamic machine, the coupled vibration of its mechanical structure is an inherent noise source, which interferes with signal feature extraction and misfire diagnosis. Consequently, to achieve high-precision diesel engine misfire diagnosis, we present a novel multi-scale CNN-LSTM NN (MSCNN-LSTMNet), which is robust to noise and adaptable to different working conditions.

FIGURE 3 Acceleration signals for one- and two-cylinder diesel engine misfires.

TABLE 1 One- and two-cylinder diesel engine misfire tests

2.3 | Residual network, CNN, and LSTM

2.3.1 | Residual network

It is well known that increasing network width and depth can improve diagnosis and prediction performance, and that deep networks generally outperform shallow ones. However, deep networks are prone to degradation: as the number of layers increases, the accuracy on the training set saturates or even decreases. The degradation problem shows that deep networks cannot be easily optimised. When degradation occurs, a shallow network can achieve a better training result than a deep one. If the features of the lower layers are passed directly to the higher layers, the result should be at least no worse than that of the shallow network. Based on this idea, He et al. [39] combined a feedforward NN with shortcut connections to develop a residual network for image recognition.

Figure 4 presents the basic architecture of the residual network. In Figure 4, F(·) denotes the network layers that extract features, and H(·) denotes a convolutional layer or the identity function. In our work, the ReLU activation function is used. The parameters of a multilayer NN are updated by back-propagation. The shortcut connection skips the intermediate network layers and links the input directly to the output, so that the gradient can be passed straight to the shallow layers during training. The weights of a deep network can therefore be optimised more effectively.
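The shortcut idea can be illustrated with a minimal pure-Python sketch. Here `F` stands for the feature-extracting layers and `H` for the shortcut (identity by default), following the notation of Figure 4. Merging the two paths by addition and then applying ReLU follows the usual residual design of He et al. [39] and is an assumption about the exact ordering used here; the toy `zero_layers` function is purely hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def residual_block(x: Vector,
                   F: Callable[[Vector], Vector],
                   H: Callable[[Vector], Vector] = lambda v: v) -> Vector:
    """Return relu(F(x) + H(x)); H defaults to the identity shortcut."""
    fx, hx = F(x), H(x)
    if len(fx) != len(hx):
        raise ValueError("F(x) and H(x) must have the same length")
    return relu([a + b for a, b in zip(fx, hx)])


def zero_layers(v: Vector) -> Vector:
    """Hypothetical feature layers that have learned nothing (all zeros)."""
    return [0.0] * len(v)


# With an all-zero F, the block reduces to relu(x): the shortcut alone
# carries the input forward, so stacking the block cannot lose it.
out = residual_block([1.0, -2.0, 3.0], zero_layers)
```

In this sketch the added layers only need to learn a correction on top of the shortcut, which is the intuition behind the "no worse than the shallow network" argument.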

FIGURE 4 The architecture of the residual network.

2.3.2 | CNN and LSTM

Owing to its strong representational ability, the CNN, proposed and optimised in [40, 41], has become one of the most widely used neural networks [42–50]. Given enough training signals, a CNN can learn representative features automatically and achieve end-to-end learning. We therefore use the convolutional layer as the basic feature extractor for the blocks designed in this paper.

The convolutional layer performs dimensionality reduction and feature extraction on the input data through local connections and weight sharing. It contains trainable filters called kernels. In general, the length of a kernel cannot exceed the length of the input. Without loss of generality, for kernel size 3 and stride 2, the convolution operation is defined as follows:

in which $x_h$, $k_h$, and $y_h$ represent the $h$th element of the input, kernel, and result, respectively.
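One form consistent with these definitions, assuming no padding, no bias term, and 1-based indices, is $y_h = \sum_{j=1}^{3} k_j\, x_{2(h-1)+j}$. The sketch below implements that assumed form in plain Python (0-based indices); the function name `conv1d` and the example values are illustrative only.

```python
from typing import List


def conv1d(x: List[float], k: List[float], stride: int = 1) -> List[float]:
    """Valid (unpadded) 1-D convolution in the CNN sense (no kernel flip)."""
    if len(k) > len(x):
        raise ValueError("kernel must not be longer than the input")
    n_out = (len(x) - len(k)) // stride + 1
    return [
        sum(k[j] * x[h * stride + j] for j in range(len(k)))
        for h in range(n_out)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
k = [1.0, 0.0, -1.0]           # kernel size 3
y = conv1d(x, k, stride=2)     # stride 2 -> windows start at 0, 2, 4
```

With stride 2 the seven-point input yields three outputs, which illustrates the dimensionality reduction mentioned above.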

The output of the convolutional layer is passed through an activation function to realize a complicated mapping between the CNN input and output and to make the model converge faster. The nonlinear activation function commonly used in CNN models is the ReLU, namely

$$f(x) = \max(0, x) \tag{2}$$

The convolutional layer is commonly followed by a pooling layer. Because it is invariant to local linear transformations of the input data, the pooling layer also enhances generalisation ability. Pooling operations include average pooling and maximum pooling. In a fully connected layer, every neuron is connected to every neuron in the previous layer, which is equivalent to a traditional NN.
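The activation and pooling steps can be sketched in a few lines of plain Python. The window size and stride of 2 are assumptions chosen for illustration, not values taken from the network in this paper.

```python
from typing import List, Sequence


def relu(x: Sequence[float]) -> List[float]:
    """ReLU activation: negative values are clipped to zero."""
    return [max(0.0, v) for v in x]


def pool1d(x: Sequence[float], size: int = 2, stride: int = 2,
           mode: str = "max") -> List[float]:
    """1-D max or average pooling over sliding windows (no padding)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    out = []
    for start in range(0, len(x) - size + 1, stride):
        window = x[start:start + size]
        out.append(max(window) if mode == "max" else sum(window) / size)
    return out


activated = relu([-1.0, 2.0, -3.0, 4.0])
max_pooled = pool1d(activated, mode="max")
avg_pooled = pool1d(activated, mode="avg")
```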

The recurrent neural network (RNN) is suitable for processing sequence data owing to its closed-loop structure. Although RNNs have achieved a great deal on time series data, they cannot effectively handle data with long-distance dependence, and gradients vanish easily during RNN training. To overcome these drawbacks, the LSTM network was designed in [51] and has been widely used [52–55]. With its gate mechanism and internal memory unit, it can effectively learn long-distance temporal information and overcome the vanishing gradient of the traditional RNN. It comprises three gate structures, and the architecture of the LSTM memory block is presented in Figure 6. The first is the forget gate $f_t$, which controls what information is discarded. The second is the input gate $i_t$, which controls what new information is added to the current state, while the output gate $o_t$ determines how much information is output from the memory unit. Equations (3)–(8) present the corresponding calculations for these three gates.
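In the standard LSTM formulation, which matches the symbols used in this section, these calculations take the following form. The assignment of equation numbers to individual lines and the use of $\odot$ for element-wise multiplication are assumptions.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{3}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{4}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{5}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{6}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{7}\\
h_t &= o_t \odot \tanh\left(c_t\right) \tag{8}
\end{align}
```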

FIGURE 5 Convolution and pooling operations.

in which $W$ and $U$ denote weight matrices, while $b$ denotes the bias vector. In Figure 6, tanh(x) represents the hyperbolic tangent activation function, while σ(x) is the sigmoid activation function. $c_t$ and $\tilde{c}_t$ represent the state information and candidate state information of the neuron, while $h_t$ denotes the output of the neuron.

The memory unit $c_t$ changes according to the values of $f_t$, $\tilde{c}_t$, and $c_{t-1}$. The current output $h_t$ is calculated from the output gate $o_t$ and the memory unit $c_t$. The three gates inside the LSTM adjust the state of the memory unit, which effectively addresses the vanishing gradient problem of the traditional RNN model. $c_{t-1}$ represents long-term memory information. The LSTM therefore has a strong advantage in capturing long-term dependence in time series data and in using historical information effectively for prediction.
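The role of the gates can be made concrete with a single-unit LSTM step in plain Python. This is a sketch of the standard LSTM cell with scalar weights, not the layer configuration used in the paper; the parameter names (`Wf`, `Uf`, `bf`, and so on) and the example values are hypothetical.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a single-unit LSTM; returns (h_t, c_t)."""
    f_t = sigmoid(p["Wf"] * x_t + p["Uf"] * h_prev + p["bf"])      # forget gate
    i_t = sigmoid(p["Wi"] * x_t + p["Ui"] * h_prev + p["bi"])      # input gate
    o_t = sigmoid(p["Wo"] * x_t + p["Uo"] * h_prev + p["bo"])      # output gate
    c_hat = math.tanh(p["Wc"] * x_t + p["Uc"] * h_prev + p["bc"])  # candidate
    c_t = f_t * c_prev + i_t * c_hat   # blend old memory and new candidate
    h_t = o_t * math.tanh(c_t)         # gated output
    return h_t, c_t


names = ("Wf", "Uf", "bf", "Wi", "Ui", "bi",
         "Wo", "Uo", "bo", "Wc", "Uc", "bc")
zero = {name: 0.0 for name in names}

# All-zero parameters: every gate is 0.5, so half of c_prev is kept.
h1, c1 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, p=zero)

# A large forget-gate bias keeps almost all of the previous cell state,
# which is how long-term information survives many time steps.
remember = dict(zero, bf=10.0)
h2, c2 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=1.0, p=remember)
```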

FIGURE 6 Memory block of LSTM.

2.4 | Residual‐CNN denoising module

In practical diesel engine applications, the measurement of state quantities is often disturbed by strong vibration and noise, which makes high-quality feature extraction more challenging. Consequently, to improve misfire detection, the measured vibration signals need to be denoised before further feature learning.

In the past, researchers mainly adopted filtering methods based on signal processing. Generally speaking, such methods work well only if the assumptions they make hold in the scenario at hand and the statistics used in filtering can be estimated accurately. However, both conditions are often untenable or difficult to achieve in real scenarios. In recent years, DL has been introduced into audio and image filtering, where it outperforms signal processing-based filtering on multiple tasks and shows great potential. However, these methods often build a prior-knowledge model of the input signal. They therefore involve complex optimization problems at the test stage, and the model is generally non-convex and involves several manually selected parameters.

Inspired by the image denoising method in Ref. [56], and to solve the above problems and accurately filter out noise interference, we construct a novel Residual-CNN denoising module for processing the collected vibration signals, as shown in Figure 7. The collected vibration signal $y$ consists of the actual vibration signal $x$ and noise. The core idea is to train a residual mapping $F(y) = \text{noise}$ by residual learning, so that the actual vibration signal can be obtained as $x = y - F(y)$. The highlight of the developed module is that it is an end-to-end Residual-CNN that uses residual learning to separate the clean information from the noisy observation. Moreover, it can handle general denoising tasks (i.e., blind denoising).
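The relation $x = y - F(y)$ can be illustrated with a toy example in plain Python. The function `oracle_residual` below is a hypothetical stand-in that returns the true noise exactly; in the actual module this mapping is a trained residual-CNN, so the sketch only shows how the pieces fit together.

```python
from typing import Callable, List

Signal = List[float]


def denoise(y: Signal, F: Callable[[Signal], Signal]) -> Signal:
    """Subtract the predicted noise F(y) from the observation y."""
    noise_hat = F(y)
    return [obs - n for obs, n in zip(y, noise_hat)]


clean = [0.0, 1.0, 0.0, -1.0]          # actual vibration signal x
noise = [0.5, -0.5, 0.5, -0.5]         # additive noise
noisy = [c + n for c, n in zip(clean, noise)]   # observation y = x + noise


def oracle_residual(y: Signal) -> Signal:
    """Hypothetical perfect residual mapping, used only for illustration."""
    return noise


recovered = denoise(noisy, oracle_residual)
```

A trained network would only approximate the noise, so `recovered` would approximate `clean` rather than equal it.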

As shown in Figure 7, the Residual-CNN denoising module does not output the actual vibration signal directly; instead, it is designed to predict the residual, that is, the difference between the noisy observation and the latent clean information. In other words, the module implicitly removes the latent clean information through the operations in its hidden layers. To allow the module to be trained end to end with the subsequent networks, a residual loss is added so that the module parameters are updated iteratively during back-propagation, namely

where $\beta$ denotes the preset noise template.
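One plausible form of such a residual loss, assumed here for illustration and not confirmed by the text, is the mean squared error between the predicted residual $F(y)$ and the template $\beta$ over the $N$ points of a sample:

```latex
\begin{equation}
\mathcal{L}_{\mathrm{res}} = \frac{1}{N} \sum_{n=1}^{N} \left( F(y)_n - \beta_n \right)^2
\end{equation}
```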

Note that the proposed residual-CNN denoising module and the subsequent network are trained together, that is, simultaneously. During training, we use the cross-entropy loss to characterise the classification accuracy of the model. Meanwhile, to further constrain the filtering performance of the residual-CNN denoising module, we combine the cross-entropy loss with the loss defined in Equation (9) to construct a new loss function. During back-propagation, this loss function is optimised to achieve both anti-noise performance and classification accuracy.
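A minimal sketch of such a combined objective is given below. It assumes that the residual term is a mean squared error against the noise template and that the two terms are added with a weight `lam`; both the MSE form and the weight are assumptions, since the text does not state how the terms are combined.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def residual_loss(noise_hat: Sequence[float], beta: Sequence[float]) -> float:
    """Assumed form: mean squared error between F(y) and the template."""
    return sum((n - b) ** 2 for n, b in zip(noise_hat, beta)) / len(beta)


def total_loss(logits: Sequence[float], label: int,
               noise_hat: Sequence[float], beta: Sequence[float],
               lam: float = 1.0) -> float:
    """Classification loss plus weighted residual loss (lam is hypothetical)."""
    return cross_entropy(logits, label) + lam * residual_loss(noise_hat, beta)


# Two equally likely classes and a unit residual error on every point.
loss = total_loss([0.0, 0.0], 0, [1.0, 1.0], [0.0, 0.0], lam=0.5)
```

Because both terms depend on the denoiser's parameters, one backward pass updates the denoising module and the classifier together, which matches the joint training described above.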

2.5 | Multi‐scale CNN‐LSTM module

Engine misfire leads to unstable engine operation, accompanied by increased vibration and noise. The vibration signal of the engine cylinder therefore contains rich system-state information that can be used for misfire fault diagnosis. The key to good misfire diagnosis is to fully mine the essential fault characteristics contained in the vibration signals. In the previous section, the Residual-CNN denoising module was designed to obtain a vibration signal that is as clean as possible. In this section, we focus on designing modules that can extract multi-level and multi-dimensional essential features.

FIGURE 7 The proposed Residual-CNN denoising module.

FIGURE 8 The developed multi-scale CNN-LSTM module.

Figure 8 presents the developed multi-scale CNN-LSTM module, which can extract multi-scale implicit features and sequential features from the acceleration signal. To fuse implicit features at different time scales, the module uses carefully chosen convolution kernel sizes. That is, to learn and fuse rich implicit features of different scales at the same time, it employs multiple parallel convolution layers with different kernel sizes.

As illustrated in Figure 8, the developed multi-scale CNN-LSTM module has multiple branches that extract multi-scale features from the input in parallel. The output of each convolution layer is given by Equation (1), with a different kernel size in each branch. With different kernels, convolution extracts information at different scales from the original signal. The convolution outputs of the branches are then fused to obtain complementary information at different time scales. Next, to make comprehensive use of features at different levels, the output of the multi-scale CNN module is fed into the LSTM layer to extract and fuse sequential features. In general, combining the low-level features learned by shallow layers with the high-level features learned by deep layers yields richer features than simply stacking convolution layers. Therefore, we develop the multi-scale CNN-LSTM module to fully mine and fuse the high-level intrinsic fault features.
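The branch-and-fuse structure can be sketched in plain Python. The kernel sizes (3, 5, 7), the moving-average kernel weights, zero padding, and fusion by channel-wise concatenation are all assumptions made for illustration; the actual settings are those listed in Table 2.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], k: Sequence[float]) -> List[float]:
    """Stride-1 convolution with zero padding so the output keeps len(x)."""
    if len(k) % 2 == 0:
        raise ValueError("use an odd kernel length for symmetric padding")
    pad = len(k) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k[j] * padded[t + j] for j in range(len(k)))
        for t in range(len(x))
    ]


def multi_scale_features(x: Sequence[float],
                         kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Run one branch per kernel and stack the results per time step."""
    branches = [conv1d_same(x, k) for k in kernels]
    # Fuse by concatenation: one feature vector (one value per branch)
    # for every time step, ready to be read as a sequence.
    return [list(step) for step in zip(*branches)]


# Hypothetical kernels: moving averages over 3, 5 and 7 points.
kernels = [[1.0 / n] * n for n in (3, 5, 7)]
signal = [1.0] * 16
features = multi_scale_features(signal, kernels)
```

The result is a sequence of 16 time steps with three channels each, which is the kind of input a subsequent LSTM layer would consume step by step.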

2.6 | The architecture of MSCNN‐LSTMNet

To achieve high identification accuracy under strong interference together with good generalisation performance, this paper proposes a novel multi-scale CNN-LSTM NN (MSCNN-LSTMNet) for anti-noise diesel engine misfire diagnosis. MSCNN-LSTMNet is mainly composed of two core modules: the Residual-CNN denoising module and the multi-scale CNN-LSTM module. Figure 9 presents the architecture of MSCNN-LSTMNet, and Table 2 provides its details.

FIGURE 9 The architecture of MSCNN-LSTMNet.

Vibration signals of the cylinder head under various conditions are collected and fed into the NN. Then, based on the noise superposition principle, a Residual-CNN block denoises the original vibration signal measured from the diesel engine cylinder. At the same time, a residual loss is used to construct a new loss function. Considering the essential characteristics of the measured vibration signals at different scales, the multi-scale CNN-LSTM module is designed to realize multi-scale feature extraction. Specifically, multiple convolution branches with different kernel sizes extract features at different time scales, making the network's learning of misfire fault features more robust. On this basis, the LSTM further extracts sequential features to improve anti-noise and generalisation performance.

3 | EXPERIMENTAL RESULTS AND VERIFICATION

In our research, misfire fault tests under different speed conditions were carried out for comprehensive validation. The datasets used for validation are given in Table 3. Datasets I, II, and III all contained single-cylinder misfire faults and were obtained at low, medium, and high speed (1300, 1800, and 2200 rpm), respectively. Dataset IV contained hybrid-cylinder misfire faults, including both one- and two-cylinder diesel engine misfires, and was obtained at medium speed (1800 rpm). For each label, there were 511 samples, each consisting of 1024 vibration data points. Every dataset thus has 2555 samples in total, and the ratio of training, validation, and test sets is 8:1:1.
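The dataset sizes follow from simple arithmetic, sketched below. The five labels per dataset are inferred from 2555 / 511; the rounding rule used to turn the 8:1:1 ratio into integer counts is an assumption, since the exact split sizes are not stated.

```python
SAMPLING_RATE_HZ = 25_600      # 25.6 kHz sampling frequency
POINTS_PER_SAMPLE = 1024
SAMPLES_PER_LABEL = 511
LABELS_PER_DATASET = 5         # inferred: 2555 / 511


def split_counts(n_samples, ratios=(8, 1, 1)):
    """Integer train/validation/test sizes; the remainder goes to the test set."""
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


n_samples = SAMPLES_PER_LABEL * LABELS_PER_DATASET
sample_duration_s = POINTS_PER_SAMPLE / SAMPLING_RATE_HZ
counts = split_counts(n_samples)

# Crankshaft revolutions covered by one 1024-point sample at each test speed.
revs_per_sample = {rpm: rpm / 60.0 * sample_duration_s
                   for rpm in (1300, 1800, 2200)}
```

Under these assumptions each sample spans 40 ms of signal, and a dataset splits into 2044 training, 255 validation, and 256 test samples.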

TABLE 2 Details for the developed MSCNN-LSTMNet

TABLE 3 The datasets used for validation

Meanwhile, to evaluate the accuracy, anti-noise, and generalisation performance of the proposed MSCNN-LSTMNet, we conduct detailed comparisons with existing methods, including random forest (RF), deep neural network (DNN), and CNN. Among them, the inputs for RF and DNN comprise WP energy features and nine important time-domain statistical features. In addition, MSCNNNet is also used for comparison. Apart from having no LSTM layer, MSCNNNet shares the same network structure and parameters as the proposed MSCNN-LSTMNet. CNN and MSCNNNet take the same input as the proposed MSCNN-LSTMNet, that is, the original vibration signal. All methods were run five times on each dataset to obtain the average diagnosis accuracy. In this research, we employed the Keras package to establish all models, with the learning rate set to 0.001 and the number of epochs set to 50.

3.1 | Comparisons of anti‐noise performance

Firstly, we evaluate and verify the anti-noise performance of MSCNN-LSTMNet. To this end, we add noise with different signal-to-noise ratios (SNRs) to the original four datasets, that is, datasets I, II, III, and IV. Generally speaking, the smaller the SNR value (in dB), the stronger the noise mixed into the signal. For a given signal power $P_{\mathrm{signal}}$ and noise power $P_{\mathrm{noise}}$, the SNR is defined in Equation (10), namely

$$\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right) \tag{10}$$

To investigate the anti-noise performance, Gaussian white noise at 11 SNR levels (-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, and 10 dB) was added to datasets I, II, III, and IV.
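The noise-injection step can be sketched as follows, assuming the usual decibel definition of SNR (ten times the base-10 logarithm of the signal-to-noise power ratio). The helper names `mean_power`, `add_white_noise`, and `measure_snr_db`, the synthetic sine input, and the rescaling of each noise draw so that its empirical power matches the target exactly are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def mean_power(x):
    """Average power of a discrete signal (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, target_snr_db, rng):
    """Return (noisy_signal, noise) with Gaussian white noise at the target SNR."""
    target_noise_power = mean_power(signal) / (10 ** (target_snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale so the empirical noise power equals the target (assumption of this sketch).
    scale = math.sqrt(target_noise_power / mean_power(noise))
    noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise

def measure_snr_db(signal, noise):
    """SNR in dB computed from the signal and noise powers."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

rng = random.Random(42)
clean = [math.sin(2 * math.pi * t / 32) for t in range(1024)]  # stand-in for one 1024-point sample
snr_levels = list(range(-10, 11, 2))  # -10, -8, ..., 10 dB
noisy_sets = {level: add_white_noise(clean, level, rng) for level in snr_levels}
```

At -10 dB the noise power is ten times the signal power, which is why this level is treated as the strong-noise case in the comparisons.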

Figure 10 presents comparisons of anti-noise performance on dataset I. For the 11 SNRs, RF obtained 66.406%, 69.141%, 69.922%, 76.562%, 75.000%, 76.562%, 75.391%, 74.609%, 76.562%, 79.297%, and 73.828% average accuracy, respectively. DNN obtained 67.451%, 60.000%, 67.451%, 77.647%, 73.333%, 81.176%, 81.569%, 87.059%, 86.667%, 84.314%, and 89.020% average accuracy, respectively. CNN obtained 82.238%, 79.103%, 80.867%, 82.421%, 79.492%, 80.474%, 81.258%, 82.811%, 79.100%, 79.494%, and 78.916% average accuracy, respectively. It is seen that the diagnostic accuracy of these three models is strongly affected by noise and is low under strong noise. MSCNNNet adopts the same structure and parameters as the proposed MSCNN-LSTMNet, except for the LSTM layer. MSCNNNet achieved much better diagnostic accuracy than the above three models, which verifies the effectiveness of the proposed Residual-CNN denoising module and the multi-scale CNN module. Specifically, MSCNNNet obtained 89.659%, 89.067%, 87.894%, 90.046%, 86.521%, 90.044%, 86.917%, 89.654%, 90.622%, 87.309%, and 87.896% average accuracy, respectively. According to Figure 10, the proposed MSCNN-LSTMNet achieved the highest accuracy at all noise levels. Specifically, it achieved 97.271%, 96.882%, 98.246%, 98.058%, 97.858%, 98.246%, 97.469%, 98.059%, 97.071%, 97.667%, and 97.070% average accuracy, respectively. Consequently, the proposed MSCNN-LSTMNet achieves good anti-noise performance, which makes it especially suitable for practical scenarios with varying degrees of noise.

FIGURE 10 Comparisons of anti-noise performance on dataset I.

Figure 11 provides comparisons of anti-noise performance on dataset II. On this dataset, RF achieved higher accuracy than on dataset I. Specifically, it obtained 71.094%, 77.734%, 75.000%, 71.094%, 78.125%, 78.516%, 77.344%, 78.906%, 74.609%, 78.906%, and 83.594% average accuracy, respectively. DNN achieved 76.078%, 83.529%, 83.137%, 83.529%, 89.412%, 86.275%, 87.843%, 87.843%, 87.059%, 90.980%, and 88.235% average accuracy, respectively. Meanwhile, CNN obtained 81.232%, 84.364%, 84.651%, 86.124%, 84.959%, 85.535%, 83.281%, 84.758%, 85.242%, 85.241%, and 85.344% average accuracy on dataset II, respectively. Figure 11 shows that the diagnostic accuracies of MSCNNNet and the proposed MSCNN-LSTMNet are much higher than those of RF, DNN, and CNN. Therefore, the proposed MSCNN-LSTMNet achieves good anti-noise performance for diesel engine misfire diagnosis. When the SNR equals -10 dB, the accuracy of the proposed MSCNN-LSTMNet is 27.353, 22.369, 17.215, and 8.905 percentage points higher than that of RF, DNN, CNN, and MSCNNNet, respectively.

Figure 12 shows comparisons of anti-noise performance on dataset III. RF obtained 74.219%, 78.516%, 81.250%, 80.469%, 80.859%, 84.766%, 84.375%, 83.203%, 79.297%, 82.422%, and 85.156% average accuracy for the 11 SNRs, respectively. DNN achieved 87.843%, 83.529%, 86.667%, 89.412%, 88.235%, 89.020%, 89.804%, 89.020%, 91.373%, 90.980%, and 90.588% average accuracy, respectively. CNN obtained 89.125%, 89.974%, 86.250%, 88.793%, 88.862%, 88.145%, 85.994%, 88.402%, 89.512%, 89.320%, and 88.791% average accuracy, respectively. Both CNN and DNN obtained better accuracy on dataset III than on datasets I and II. According to Figure 12, MSCNNNet obtained 94.332%, 92.961%, 95.832%, 95.182%, 94.662%, 95.182%, 93.810%, 95.058%, 94.401%, 94.853%, and 94.274% average accuracy, respectively. Figure 12 demonstrates that the proposed MSCNN-LSTMNet achieved the highest diagnostic accuracy under all noise levels. It obtained 99.610%, 99.543%, 99.021%, 99.154%, 99.354%, 99.677%, 99.411%, 99.355%, 99.153%, 99.156%, and 99.411% average accuracy for the 11 SNRs, respectively. When the SNR equals -10 dB, the accuracy of MSCNN-LSTMNet is 25.391, 11.767, 10.485, and 5.278 percentage points higher than that of RF, DNN, CNN, and MSCNNNet, respectively. Consequently, the proposed MSCNN-LSTMNet achieved much better anti-noise performance and generalisation ability than the existing methods.

FIGURE 11 Comparisons of anti-noise performance on dataset II.

FIGURE 12 Comparisons of anti-noise performance on dataset III.

For the hybrid cylinder misfire fault case, comparisons of anti-noise performance are presented in Figure 13. RF obtained 83.594%, 85.156%, 83.203%, 86.719%, 85.938%, 85.156%, 88.672%, 85.938%, 85.547%, 89.062%, and 86.719% average accuracy for the 11 SNRs, respectively. DNN obtained 86.804%, 86.412%, 88.373%, 89.157%, 89.157%, 91.118%, 89.549%, 87.980%, 89.157%, 90.725%, and 90.725% average accuracy for the 11 SNRs, respectively. CNN obtained 76.394%, 77.377%, 78.840%, 79.865%, 79.186%, 78.204%, 79.180%, 80.501%, 79.812%, 78.796%, and 77.320% average accuracy, respectively. Compared with the first three datasets, the accuracy of CNN decreased to some extent. MSCNNNet obtained 85.433%, 85.731%, 86.178%, 87.939%, 88.374%, 87.198%, 85.924%, 86.410%, 89.390%, 87.884%, and 88.225% average accuracy for the 11 SNRs, respectively. The proposed MSCNN-LSTMNet again achieved the highest diagnostic accuracy under all SNRs. It obtained 94.917%, 95.010%, 95.408%, 94.825%, 95.963%, 95.897%, 95.741%, 95.218%, 95.820%, 95.305%, and 95.161% average accuracy, respectively. Moreover, when the SNR equals -10 dB, the accuracy of MSCNN-LSTMNet is 11.323, 8.113, 18.523, and 9.484 percentage points higher than that of RF, DNN, CNN, and MSCNNNet, respectively. The results on datasets I, II, III, and IV verify the anti-noise performance and accuracy of MSCNN-LSTMNet for both one- and two-cylinder diesel engine misfire diagnosis.
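As a worked check, the four-dataset averages at -10 dB quoted in the abstract and conclusions can be recomputed from the per-dataset figures given in this section. Two dataset II values are not quoted directly and are derived here from the RF accuracy and the stated margins; that derivation, and all variable names, are assumptions of this sketch.

```python
# Accuracy (%) at SNR = -10 dB, ordered as datasets I, II, III, IV.
proposed_ii = 71.094 + 27.353   # derived: RF accuracy on dataset II plus the stated margin
mscnn_ii = proposed_ii - 8.905  # derived: proposed accuracy minus the stated MSCNNNet margin

acc_minus10 = {
    "RF": [66.406, 71.094, 74.219, 83.594],
    "DNN": [67.451, 76.078, 87.843, 86.804],
    "CNN": [82.238, 81.232, 89.125, 76.394],
    "MSCNNNet": [89.659, mscnn_ii, 94.332, 85.433],
    "MSCNN-LSTMNet": [97.271, proposed_ii, 99.610, 94.917],
}

# Mean accuracy of each method over the four datasets.
mean_acc = {name: sum(vals) / len(vals) for name, vals in acc_minus10.items()}

# Margin of the proposed network over each baseline, in percentage points.
margins = {
    name: mean_acc["MSCNN-LSTMNet"] - value
    for name, value in mean_acc.items()
    if name != "MSCNN-LSTMNet"
}

for name, value in mean_acc.items():
    print(f"{name}: {value:.3f}")
```

To within rounding, the means agree with the reported 73.828%, 79.544%, 82.247%, 89.741%, and 97.561%, and the margins are differences in percentage points rather than relative improvements.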

3.2 | Comparisons of generalisation performance between different noise levels

In this section, we investigate and compare the generalisation performance of the different methods between different noise levels. To this end, we design 11 tasks for datasets I, II, III, and IV, as shown in Table 4. Specifically, in each task the dataset at one noise level is used as the training set, and the datasets at the remaining 10 noise levels are used as the test sets.
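The task construction can be sketched as a leave-one-level-in scheme: each of the 11 SNR levels serves once as the training condition, and the other 10 levels form the test conditions. The dictionary keys and the function name `build_tasks` are hypothetical, and the sketch does not reproduce the task names of Table 4.

```python
snr_levels = list(range(-10, 11, 2))  # -10, -8, ..., 10 dB

def build_tasks(levels):
    """One task per training noise level; every other level is a test condition."""
    tasks = []
    for train_level in levels:
        test_levels = [lv for lv in levels if lv != train_level]
        tasks.append({"train_snr_db": train_level, "test_snr_db": test_levels})
    return tasks

tasks = build_tasks(snr_levels)
print(len(tasks), len(tasks[0]["test_snr_db"]))  # 11 10
```

Each task therefore yields 10 accuracy values per method, which matches the 10 figures quoted per method for a given task in the tables of this section.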

TABLE 4 The 11 tasks of generalisation performance between different noise levels for datasets I, II, III, and IV

FIGURE 13 Comparisons of anti-noise performance on dataset IV.

TABLE 5 Comparisons of generalisation performance between different noise levels on dataset I

The diagnosis results for the 11 tasks on dataset I are presented in Table 5. RF and DNN show poor generalisation performance between different noise levels. For instance, RF obtained 72.549%, 63.137%, 59.608%, 56.078%, 58.824%, 54.510%, 53.725%, 51.373%, 52.941%, and 54.118% average accuracy for task N10, and DNN obtained 68.235%, 67.059%, 67.843%, 65.490%, 63.529%, 60.000%, 60.392%, 60.392%, 59.608%, and 59.608% average accuracy for task N10. Figure 14 shows intuitive comparisons for tasks N10, N6, and N2 on dataset I. It is seen that the greater the difference in noise level, the lower the diagnosis accuracy, which is due to the greater difference between the training set and the test set. For the other tasks, the accuracy of RF and DNN shows a similar trend. CNN obtained 81.835%, 80.859%, 80.273%, 80.468%, 81.054%, 80.664%, 80.273%, 80.468%, 80.468%, and 80.664% average accuracy for task N10 and similar results for the other tasks. MSCNNNet obtained better generalisation performance than RF, DNN, and CNN. For instance, it obtained 88.867%, 89.062%, 88.476%, 88.281%, 88.085%, 88.867%, 88.476%, 88.281%, 88.281%, and 88.476% average accuracy for task N10. According to Table 5, the proposed MSCNN-LSTMNet achieved much better generalisation between different noise levels than RF, DNN, CNN, and MSCNNNet. MSCNN-LSTMNet obtained 97.851%, 97.851%, 97.460%, 97.851%, 97.851%, 97.070%, 97.460%, 97.460%, 97.460%, and 97.460% average accuracy for task N10 and similarly high-precision diagnostic results for the remaining 10 tasks.
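One simple way to summarise the task N10 figures quoted above is the spread between the best and worst test accuracy of each method. This spread is an illustrative summary chosen here, not a metric used in the paper, and the variable names are hypothetical; the accuracy values are copied from the text.

```python
# Task N10 on dataset I: accuracy (%) on the 10 test noise levels, as quoted in the text.
task_n10 = {
    "RF": [72.549, 63.137, 59.608, 56.078, 58.824, 54.510, 53.725, 51.373, 52.941, 54.118],
    "DNN": [68.235, 67.059, 67.843, 65.490, 63.529, 60.000, 60.392, 60.392, 59.608, 59.608],
    "CNN": [81.835, 80.859, 80.273, 80.468, 81.054, 80.664, 80.273, 80.468, 80.468, 80.664],
    "MSCNNNet": [88.867, 89.062, 88.476, 88.281, 88.085, 88.867, 88.476, 88.281, 88.281, 88.476],
    "MSCNN-LSTMNet": [97.851, 97.851, 97.460, 97.851, 97.851, 97.070, 97.460, 97.460, 97.460, 97.460],
}

# Worst-case accuracy and best-minus-worst spread (percentage points) per method.
worst = {name: min(vals) for name, vals in task_n10.items()}
spread = {name: max(vals) - min(vals) for name, vals in task_n10.items()}

for name in task_n10:
    print(f"{name}: worst {worst[name]:.3f}%, spread {spread[name]:.3f} points")
```

On these figures RF varies by about 21.2 points and DNN by about 8.6 points across test noise levels, whereas MSCNN-LSTMNet varies by less than one point while keeping its worst case above 97%.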

The diagnosis results for the 11 tasks on dataset II are presented in Table 6. For task N10 on this dataset, RF obtained 72.549%, 71.373%, 67.059%, 67.451%, 67.843%, 66.275%, 68.235%, 67.059%, 66.275%, and 67.451% average accuracy, DNN obtained 56.078%, 49.020%, 41.176%, 40.000%, 38.431%, 38.824%, 39.608%, 39.608%, 39.216%, and 39.216% average accuracy, and CNN obtained 81.036%, 81.524%, 81.720%, 81.231%, 81.231%, 81.622%, 81.231%, 80.938%, 81.133%, and 80.938% average accuracy. Meanwhile, Table 6 shows that MSCNNNet obtained better generalisation performance between different noise levels on dataset II than RF, DNN, and CNN, which verifies the effectiveness of the proposed Residual-CNN denoising module and the multi-scale CNN module. Based on Table 6, MSCNNNet obtained 90.420%, 90.127%, 90.127%, 90.224%, 90.029%, 90.322%, 90.127%, 90.029%, 90.127%, and 89.833% average accuracy for task N10. Moreover, Table 6 demonstrates that the proposed method achieved much better generalisation performance between different noise levels on dataset II than the other methods. In all tasks, the proposed MSCNN-LSTMNet achieves at least 97.849% diagnostic accuracy.

Table 7 shows the diagnosis results for the 11 tasks on dataset III. In this case, the generalisation performance between different noise levels of RF, DNN, and CNN improves. For task N10, RF obtained 86.275%, 85.098%, 80.000%, 77.255%, 75.686%, 78.824%, 76.863%, 78.039%, 78.431%, and 76.471% average accuracy, DNN obtained 80.392%, 73.725%, 67.843%, 63.137%, 62.745%, 60.784%, 59.608%, 58.431%, 58.824%, and 58.824% average accuracy, and CNN obtained 89.381%, 89.055%, 89.315%, 89.120%, 89.185%, 88.990%, 89.185%, 88.990%, 89.055%, and 88.925% average accuracy. Table 7 shows that MSCNNNet obtained 94.397%, 93.941%, 93.680%, 93.485%, 92.899%, 92.899%, 92.833%, 92.638%, 92.573%, and 92.899% average accuracy for task N10 and similar diagnostic results for the other tasks. Hence, its generalisation performance between different noise levels is better than that of RF, DNN, and CNN. Based on Table 7, the proposed MSCNN-LSTMNet achieved the best generalisation performance between different noise levels compared with RF, DNN, CNN, and MSCNNNet. For all tasks on dataset III, the proposed MSCNN-LSTMNet achieves at least 98.892% diagnostic accuracy.

FIGURE 14 Diagnosis results for the tasks N10, N6, and N2 on dataset I.

Finally, we analyse and compare the generalisation performance between different noise levels for the hybrid cylinder misfire fault case; the diagnosis results on dataset IV are provided in Table 8. For task N10, RF obtained 77.647%, 70.588%, 69.020%, 67.059%, 67.451%, 67.843%, 68.627%, 65.882%, 67.843%, and 67.059% average accuracy, DNN obtained 63.529%, 45.490%, 36.078%, 32.157%, 32.157%, 30.588%, 29.412%, 29.412%, 28.627%, and 29.020% average accuracy, and CNN obtained 76.197%, 75.855%, 75.610%, 75.317%, 75.513%, 75.268%, 74.975%, 74.975%, 74.975%, and 74.682% average accuracy. For the other tasks, they achieved similar diagnostic results. Hence, the generalisation performance between different noise levels of RF, DNN, and CNN is limited for hybrid cylinder misfire faults. According to Table 8, MSCNNNet obtained better generalisation performance than RF, DNN, and CNN. Specifically, MSCNNNet obtained 84.946%, 84.457%, 83.724%, 83.284%, 82.600%, 82.258%, 82.160%, 81.622%, 81.720%, and 81.720% average accuracy for task N10. The proposed MSCNN-LSTMNet again achieved the best generalisation performance between different noise levels for hybrid cylinder misfire faults. It obtained 94.521%, 94.717%, 95.010%, 94.864%, 95.059%, 94.815%, 94.961%, 94.913%, 95.059%, and 95.010% average accuracy for task N10, which are much higher than those of the other methods. Moreover, it obtained similarly high-precision diagnostic results for the remaining 10 tasks. Therefore, the results verify the high generalisation performance between different noise levels of the proposed MSCNN-LSTMNet.

TABLE 6 Comparisons of generalisation performance between different noise levels on dataset II

TABLE 7 Comparisons of generalisation performance between different noise levels on dataset III

4 | CONCLUSIONS

In this article, we propose a novel multi-scale CNN-LSTM NN named MSCNN-LSTMNet with residual-CNN pre-processing for anti-noise diesel engine misfire diagnosis. To process the original vibration signal measured from the diesel engine cylinder, a residual pre-processing block is designed based on the noise superposition principle. Simultaneously, the residual loss is utilised to construct a new loss function for model training. Then, a novel multi-scale CNN block is designed for extracting the essential features of the measured vibration signals at different scales. Specifically, multiple convolution layers with different branches and different convolution kernel sizes are utilised to extract features at different scales, which makes the network more robust when learning misfire fault features. Moreover, the LSTM is utilised to further extract sequential features for improving anti-noise and generalisation performance. The anti-noise and generalisation performance of MSCNN-LSTMNet is validated by comprehensive comparisons with existing methods under different noise environments and operating conditions. The results demonstrate that MSCNN-LSTMNet achieved much better anti-noise and generalisation performance than the existing methods. Under strong noise conditions (-10 dB signal-to-noise ratio) for the four datasets, MSCNN-LSTMNet obtained 97.561% average accuracy, which is 23.733, 18.017, 15.314, and 7.820 percentage points higher than that of RF, DNN, CNN, and MSCNNNet, respectively. In addition, for the 11 noise generalisation tasks between different noise levels, MSCNN-LSTMNet obtained at least 96.679%, 97.849%, 98.892%, and 94.010% accuracy on the four datasets, which are much higher than those of the existing methods.

TABLE 8 Comparisons of generalisation performance between different noise levels on dataset IV

ACKNOWLEDGEMENTS

This project was supported by the National Key R&D Program of China (Grant No. 2020YFB1709604) and the Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102).

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

DATA AVAILABILITY STATEMENT

Data are available on request from the authors.

ORCID

Chengjin Qin https://orcid.org/0000-0002-5200-3241