Deep learning approach to detect seizure using reconstructed phase space images

2020-09-21

THE JOURNAL OF BIOMEDICAL RESEARCH, 2020, Issue 3

N. Ilakiyaselvan, A. Nayeemulla Khan, A. Shahina

¹School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, Tamilnadu 600127, India; ²Department of Information Technology, SSN College of Engineering, Kalavakkam, Tamilnadu 603110, India.

Abstract Epilepsy is a chronic neurological disorder that affects brain function in people of all ages. It manifests in the electroencephalogram (EEG), which records the electrical activity of the brain. Various image processing, signal processing, and machine-learning techniques have been employed to analyze epilepsy using spatial and temporal features. The nervous system that generates the EEG signal is considered nonlinear, and EEG signals exhibit chaotic behavior. To capture these nonlinear dynamics, we use the reconstructed phase space (RPS) representation of the signal. Earlier studies have primarily addressed seizure detection as a binary classification problem (normal vs. ictal) and rarely as a ternary one (normal vs. interictal vs. ictal). We apply transfer learning to a pre-trained deep neural network and retrain it using RPS images of the EEG signal. The classification accuracy of the model is (98.5±1.5)% for the binary class combinations and (95±2)% for the ternary class combinations. The convolutional neural network (CNN) model outperforms the existing statistical approaches on all performance indicators, including accuracy, sensitivity, and specificity. These results show the promise of combining RPS images with CNNs for detecting epileptic seizures.

Keywords: epilepsy, reconstructed phase space, convolution neural network, reconstructed phase space image, AlexNet, seizure

Introduction

Epilepsy is a common neurological condition characterized by recurrent, unprovoked seizures. It is a disease of the central nervous system that causes abnormal behavior and sometimes loss of awareness. About seventy million people worldwide are affected by epilepsy. Epileptic seizures may be related to brain damage or to hereditary factors, although the cause is often unknown. Electroencephalogram (EEG) signals, which monitor brain activity, are generally analyzed by neurologists and other specialists to detect and categorize disease types and to identify regions indicative of pre-ictal spikes and seizures. The presence of numerous spikes in the EEG indicates epileptic seizure activity in the brain.

In clinical environments, seizure diagnosis normally involves continuous monitoring with video and EEG recorded over long periods. Human experts must then review the data visually to arrive at a clinical interpretation. This is time-consuming, and sufficient expertise is often lacking; hence, automating seizure detection is essential. Automated systems require features extracted from the signal, and several techniques exist for extracting them in the time, frequency, or time-frequency domain. Because EEG signals are nonlinear and nonstationary, time-frequency features are commonly used to detect epileptic signals[1–3]. Empirical mode decomposition and the Fourier-Bessel expansion have been used to compute the mean frequency of intrinsic mode functions (IMFs) to discriminate ictal from interictal EEG signals[4]. Recurrence quantification analysis (RQA)[5], the wavelet transform, and the multi-wavelet transform have been used to classify EEG signals into three classes (normal, interictal, and ictal)[6]. Alternatively, a pattern recognition approach that classifies EEG signals recorded under cognitive conditions, with a focus on improving classification accuracy, has been proposed[7]. Numerous machine learning algorithms, such as k-nearest neighbor (k-NN), naive Bayes (NB), random forest, artificial neural networks (ANN), support vector machines (SVM), decision trees, least squares support vector machines (LS-SVM), general regression neural networks (GRNN), and mixture-of-experts models, have been proposed to detect abnormalities in EEG data.

Nonlinear dynamic techniques have been used effectively in biomedical applications based on the magnetoencephalogram, electromyogram, electrooculogram, electrocardiogram, and EEG[2–8]. This study focuses on modeling the nonlinear dynamics of the brain. The brain is widely regarded as a chaotic dynamical system that produces EEG signals which are usually chaotic[9]; their amplitude varies irregularly over time. Such chaotic signals are characterized by long-term unpredictability, which makes classical signal processing techniques less helpful, and modeling their dynamics with conventional features or models is challenging. We apply reconstructed phase space (RPS) techniques, developed for chaotic signal analysis, to the epilepsy dataset from the University of Bonn (UoB) and show improved classification accuracy for 22 different class combinations.

Using RPS trajectory images as input to a convolutional neural network helps to model these dynamics. Furthermore, the deep neural network yields an end-to-end system, eliminating the need for handcrafted features. The proposed end-to-end system performs on par with or better than other state-of-the-art systems reported in the literature. A pre-trained convolutional neural network with the AlexNet architecture is retrained with RPS images extracted from the dataset to classify the data into different classes. Representative RPS images for each of the five classes of the UoB dataset are shown in Fig. 1.

Because observing EEG signals in clinical settings is tedious, researchers have employed machine learning to automate seizure detection and the classification of EEG signals, with promising results. Gotman et al[10], among the pioneers of seizure detection research, developed in 1979 an automatic recognition system that used sharp and spike waves in prolonged EEG recordings to detect interictal epileptic activity. Gotman later used functional magnetic resonance imaging to examine automatic seizure detection based on high-frequency activity in the wavelet domain[11].

Fig. 1 Representative RPS images of set A, B, C, D, and E of UoB dataset.

Using the UoB dataset, researchers have examined several techniques for the automatic detection of epilepsy[12]. SVMs have commonly been used to distinguish seizure vs. non-seizure[13] using features based on the discrete wavelet transform (DWT)[14], the tunable Q-wavelet transform[15], and recurrence quantification analysis[16], with accuracies of 96.3%, 98.6%, and 94.4%, respectively. Shoeb[17] used an SVM for patient-specific prediction, achieving 96% accuracy. Gandhi et al[18] combined a probabilistic neural network (PNN) with an SVM and achieved an accuracy of 95.44% for the class combination ABCD-E. Sharmila et al[7] studied fourteen class combinations using statistical features extracted from DWT coefficients with naive Bayes and k-NN classifiers. A GRNN was employed to classify ictal and non-ictal states in [19]. Both studies[7,19] achieved a maximum accuracy of 100% for the A-E (normal vs. seizure) case. Nicolaou et al[20] employed entropy-based features and attained an accuracy of 93.55% for A-E and 86.1% for ABCD-E.

Chua et al[21] modeled the class distributions probabilistically with a Gaussian mixture model (GMM), obtaining an accuracy of 93.11% for three classes (normal, interictal, and ictal) with higher-order spectra (HOS) features[22] and 93.1% with power spectral density features[23].

Deep learning approaches currently outperform conventional machine learning algorithms in numerous domains. Employing deep learning, Ishan Ullah et al[24] used a pyramidal one-dimensional convolutional neural network (P-1D-CNN) and achieved a maximum accuracy of 100% for the A-E class combination; the P-1D-CNN combined novel data augmentation schemes with an effective deep CNN model to classify the UoB dataset. Rajendra Acharya et al[25] reported an accuracy of 88.67% using a thirteen-layer deep CNN. Table 1 reviews selected studies on EEG classification using the UoB dataset, with the features, classifiers, and accuracies reported.

Phase space reconstruction (PSR) provides a visualization of a signal's dynamic behavior over time, which is useful for guiding model specification. In some studies, two-dimensional (2D) and three-dimensional (3D) PSRs of the IMFs have been used to classify EEG signals[30]. Using a dataset from the Graz University of Technology, PSR has been applied to three-class motor imagery classification[35]. On the PhysioNet CHB-MIT database, a central tendency measure (CTM) was used to quantify the region occupied by 2D RPS plots to differentiate between seizure and seizure-free EEG signals[36].

The rest of the paper first describes RPS, convolutional neural networks and their layers, and transfer learning. It then describes the Bonn dataset and the design of the proposed system for classifying EEG signals based on RPS images, and finally presents the experimental results and performance of the system.

Materials and methods

Reconstructed phase space (RPS)

RPS is a method for constructing the phase space of a dynamical system, that is, a space in which all possible states of the system are represented. Each point in this space represents the state of the system at a given time, and as the state changes with time it traces a trajectory. The dynamics of the system are thus captured by how the state vector evolves in the phase space, and the shape of the trajectories gives insight into the periodic or chaotic nature of the system. By Takens' embedding theorem[37], the phase space can be reconstructed from a single observed variable using time-delay embedding. For a time series S[t_i], i = 1, 2, ..., N, the reconstructed phase space vector at time t_i can be written as:

$$X_i = \left[S[t_i],\ S[t_{i+\tau}],\ \ldots,\ S[t_{i+(m-1)\tau}]\right]$$

where i = 1, 2, …, N − (m − 1)τ. Here τ denotes the time delay (in samples) and m is the phase space (embedding) dimension.
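The delay embedding defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function name `delay_embed` is hypothetical, and indices are 0-based, so row `i` corresponds to X_{i+1} in the 1-based notation above.

```python
import numpy as np


def delay_embed(s, m, tau):
    """Return the delay vectors of a 1-D series as an (N - (m-1)*tau, m) array.

    Row i (0-based) is [s[i], s[i + tau], ..., s[i + (m-1)*tau]].
    """
    s = np.asarray(s, dtype=float)
    n_vectors = len(s) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this m and tau")
    # Column j is the series shifted by j*tau samples.
    return np.column_stack([s[j * tau : j * tau + n_vectors] for j in range(m)])


# A 510-sample segment embedded with m = 2, tau = 6 gives 504 points in the plane.
segment = np.sin(2 * np.pi * np.arange(510) / 50.0)
points = delay_embed(segment, m=2, tau=6)
print(points.shape)  # (504, 2)
```

With m = 2, plotting the first column against the second gives the 2D RPS portrait.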

Time-delay embedding is the most frequently used RPS method. It requires choosing appropriate values of the time delay τ and the embedding dimension m, both of which are key factors in nonlinear analysis. A suitable time delay can be selected using average mutual information, and the embedding dimension can be identified with the false nearest neighbor (FNN) method. In this study, we use a time delay of τ = 6 and an embedding dimension of m = 2, as explained under Proposed model, to construct the RPS portraits of the EEG time series.
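A common way to pick τ from average mutual information is to take the first local minimum of the AMI curve. The sketch below assumes a simple equal-width histogram estimator with 16 bins; the paper does not state which estimator or bin count it used, and the names `average_mutual_information` and `first_minimum_delay` are hypothetical.

```python
import numpy as np


def average_mutual_information(s, tau, bins=16):
    """Histogram estimate (in bits) of the mutual information between s(t) and s(t + tau)."""
    s = np.asarray(s, dtype=float)
    x, y = s[:-tau], s[tau:]
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of s(t), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of s(t + tau), shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))


def first_minimum_delay(s, max_tau=50, bins=16):
    """Smallest tau where the AMI curve has a local minimum (falls back to the global minimum)."""
    ami = [average_mutual_information(s, t, bins) for t in range(1, max_tau + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1  # ami[k] was computed for tau = k + 1
    return int(np.argmin(ami)) + 1


# Example on a sine with a 40-sample period: AMI is high at tau = 1 and drops near a quarter period.
s = np.sin(2 * np.pi * np.arange(4000) / 40.0)
print(first_minimum_delay(s, max_tau=30))
```

Histogram-based AMI is sensitive to the bin count on short segments, so in practice the curve is usually inspected (as in Fig. 4B) rather than trusted blindly.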

Convolution neural network (CNN)

A CNN is a deep neural network architecture conventionally employed to analyze images. It is composed of multiple convolution layers, each of which generates feature maps that preserve distinctive information about the input image[38]. A convolution layer applies convolution operations to its input and passes the result to the succeeding layer. Pooling layers are placed between convolution layers to reduce the spatial size of the feature maps while preserving the relevant features, which also limits the number of trainable parameters in the subsequent layers. The max pooling operation selects the most dominant feature in each pooling window. Fully connected (FC) layers are typically applied after the convolution layers for classification, and the output of the last FC layer is passed through a softmax function that normalizes it into class probabilities used to predict the class of the input image. In this study, we use the AlexNet model with RPS portraits as input and experiment with different class combinations.
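To make these layer operations concrete, the NumPy sketch below implements one convolution filter (computed as cross-correlation, as CNN libraries do), ReLU, 2×2 max pooling, and a softmax output on a toy input. It is a didactic single-channel illustration, not AlexNet itself; all names and the random weights are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation, stride 1, no padding."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, size=2, stride=2):
    """Keep the largest activation in each size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


rng = np.random.default_rng(0)
image = rng.random((8, 8))             # stand-in for a tiny grayscale RPS image
kernel = rng.standard_normal((3, 3))   # one filter (random here, learned in practice)
features = max_pool(relu(conv2d_valid(image, kernel)))  # 8x8 -> 6x6 -> 3x3
weights = rng.standard_normal((3, features.size))       # FC layer for three classes
probs = softmax(weights @ features.ravel())
print(features.shape, probs)
```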

Table 1 Seizure classification studies on the UoB dataset

Transfer learning

The problem of determining whether a signal comes from an epileptic source is mapped to classifying RPS images of epileptic and non-epileptic signals using deep learning. Training deep learning models requires a large dataset of images and many iterations to converge[39]. Because acquiring a large dataset is challenging, we adopt transfer learning for the task[40]. Transfer learning applies knowledge learned in a source domain to a target domain, enabling accurate models to be built even with an insufficient dataset, without training an entire network from scratch. Instead of learning from scratch, we begin with patterns (features and weights) learned when solving a different problem and use them to train the new model[41]. Pre-trained CNN models such as LeNet, AlexNet, VGGNet, GoogLeNet, and ResNet can be used for the task.

The choice of a pre-trained model for transfer learning depends on (i) the number of available input images and (ii) the similarity between the domain the model was trained on and the target task, since both affect the performance of the system. Here, the domain-specific class labels are epileptic and non-epileptic.

Dataset description

The Department of Epileptology, University of Bonn, Germany, provides an open-source epileptic EEG dataset[42]. The Bonn database is a widely used benchmark for validating seizure detection models. The data were acquired using the international 10-20 electrode placement system. The dataset includes five sets of EEG data, A, B, C, D, and E, as shown in Table 2. Sets A and B contain normal EEG signals recorded from five healthy subjects in a relaxed, awake state with eyes open and eyes closed, respectively. Sets C, D, and E were recorded from epilepsy patients. Sets C and D were recorded during seizure-free intervals, set C from the hemisphere opposite to the epileptogenic zone and set D within the epileptogenic zone; these two sets represent the interictal state. Set E was recorded in the epileptogenic zone during epileptic seizures (ictal activity).

All data were recorded with the same 128-channel amplifier system. Each set comprises 100 files of single-channel EEG segments, each 23.6 seconds long and sampled at 173.61 Hz. A band-pass filter with a passband of 0.53–40 Hz (12 dB/oct) was applied to retain the EEG band of interest. The segments were cut out of continuous multichannel EEG recordings after visual inspection for artifacts (e.g., due to muscle activity or eye movements). The segments for set E were selected from all recording locations exhibiting ictal activity[12].

Each recording therefore contains 4 097 samples (23.6 s × 173.61 Hz), which are split into segments of 510 samples each to generate multiple instances from one record. These segments form the basis for all further processing.
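The segmentation can be sketched as follows. The paper does not say what happens to the 17 samples left over after eight 510-sample segments (8 × 510 = 4 080), so dropping them is an assumption here, as is the name `segment_recording`. Under that assumption each set of 100 recordings yields 800 segments.

```python
import numpy as np

FS_HZ = 173.61      # Bonn sampling rate
SEGMENT_LEN = 510   # samples per RPS segment


def segment_recording(x, seg_len=SEGMENT_LEN):
    """Split one recording into non-overlapping seg_len-sample segments, dropping any remainder."""
    x = np.asarray(x)
    n_seg = len(x) // seg_len
    return x[: n_seg * seg_len].reshape(n_seg, seg_len)


recording = np.zeros(4097)             # placeholder for one Bonn single-channel recording
segments = segment_recording(recording)
print(segments.shape)                  # (8, 510)
print(round(SEGMENT_LEN / FS_HZ, 2))   # 2.94 seconds per segment
print(round(4097 / FS_HZ, 1))          # 23.6 seconds per recording
```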

Proposed model

The proposed system uses RPS images extracted from segments of the EEG signals in the UoB dataset. The RPS images are used to retrain and test a CNN with the AlexNet[43] architecture that classifies each RPS image as epileptic or non-epileptic. The functional flow diagram is shown in Fig. 2.

RPS image dataset

The dataset comprises five sets of EEG data, A, B, C, D, and E, corresponding to normal, seizure-free (interictal), and epileptic seizure (ictal) signals. Each recording is segmented into 510 samples because RPS portraits of longer segments become too dense for the CNN to capture the intrinsic features that the phase space depicts. By trial and error, 510 samples was found to be the most suitable segment size, both for representing RPS portraits and for generating enough images to retrain the CNN model. Fig. 3 shows the process flow for dataset creation.

The embedding dimension m and the time delay τ are the critical parameters of the RPS portraits[44]. The FNN method is used to find the embedding dimension, and mutual information is used to find the time delay[45]. The embedding dimension is chosen where the FNN percentage effectively drops to zero (Fig. 4A), and the time delay where the mutual information reaches its first minimum (Fig. 4B). After running multiple experiments, we arrived at a delay of τ = 6, as seen in Fig. 4B. Fig. 4A shows that the ideal embedding dimension would be m = 3, but since we use a two-dimensional (2D) CNN, we restrict the dimension to m = 2. Each non-overlapping segment is transformed into an RPS image with m = 2 and τ = 6.
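The false-nearest-neighbor test can be sketched with the common criterion of Kennel et al.: a nearest neighbor in dimension m is counted as false if adding the (m+1)-th delay coordinate increases the separation by more than a tolerance ratio. The tolerance `r_tol = 15`, the brute-force neighbor search, and the name `fnn_percentage` are assumptions for illustration; the paper does not specify its FNN implementation.

```python
import numpy as np


def fnn_percentage(s, m, tau, r_tol=15.0):
    """Percentage of false nearest neighbors when going from dimension m to m + 1."""
    s = np.asarray(s, dtype=float)
    n = len(s) - m * tau  # points that also have an (m+1)-th delay coordinate
    X = np.column_stack([s[j * tau : j * tau + n] for j in range(m)])
    extra = s[m * tau : m * tau + n]  # the coordinate added in dimension m + 1
    # Brute-force pairwise distances (fine for a 510-sample segment).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)       # a point is not its own neighbor
    nn = np.argmin(d, axis=1)
    r = d[np.arange(n), nn]
    valid = r > 0                     # ignore exact duplicates
    ratio = np.abs(extra - extra[nn])[valid] / r[valid]
    return 100.0 * float(np.mean(ratio > r_tol))


segment = np.sin(2 * np.pi * 0.0271 * np.arange(510))
for m in (1, 2, 3):
    print(m, fnn_percentage(segment, m, tau=6))
```

For a pure sine, the third delay coordinate is a linear combination of the first two, so the FNN percentage is exactly zero at m = 2; for EEG segments one looks instead for the dimension at which the curve effectively reaches zero, as in Fig. 4A.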

Table 2 Description of the EEG database of UoB[12]

Fig. 2 Functional flow diagram of the proposed system.

Fig. 3 RPS image acquisition.
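The image-acquisition step of Fig. 3 can be approximated by rasterizing the 2D delay coordinates of a segment into a fixed-size image. The 227×227 size (AlexNet's input crop in Caffe), drawing points rather than connected lines, and scaling both axes to the segment's own range are all assumptions; the paper does not specify how its RPS images were rendered, and `rps_image` is a hypothetical name.

```python
import numpy as np


def rps_image(segment, tau=6, size=227):
    """Rasterize the 2-D RPS (s(t), s(t + tau)) of one segment into a size x size x 3 uint8 image."""
    s = np.asarray(segment, dtype=float)
    x, y = s[:-tau], s[tau:]
    lo, hi = float(s.min()), float(s.max())
    if hi == lo:  # flat segment: widen the range so histogram2d has valid bins
        hi = lo + 1.0
    # Rows index s(t + tau) and columns index s(t).
    counts, _, _ = np.histogram2d(y, x, bins=size, range=[[lo, hi], [lo, hi]])
    img = np.flipud(counts > 0).astype(np.uint8) * 255  # flip so larger s(t + tau) is at the top
    return np.repeat(img[:, :, None], 3, axis=2)         # replicate to 3 channels for AlexNet


segment = np.sin(2 * np.pi * np.arange(510) / 50.0)
img = rps_image(segment)
print(img.shape, img.dtype)
```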

Pre-trained model: AlexNet

In this study, a pre-trained AlexNet model is used because it classifies the epileptic seizure sets better than the other state-of-the-art CNN models tested, LeNet and GoogLeNet, as shown in Table 3. When it was introduced, AlexNet substantially outperformed the preexisting non-deep-learning techniques on the ImageNet image classification challenge. It contains five convolution (Conv) layers and three fully connected (FC) layers. The convolution layers have 96 to 384 filters each, with filter sizes ranging from 3×3 to 11×11 and feature maps of 3 to 256 channels. The non-linear ReLU (rectified linear unit) activation function is applied after every convolution layer and hidden FC layer. Using ReLU instead of the tanh or sigmoid activation functions is an important feature of AlexNet; the primary reasons for using ReLU in the convolution layers are faster convergence, because it mitigates the vanishing gradient problem, and the sparsity it induces in the features. 3×3 max pooling is applied to the outputs of layers 1, 2, and 5. The first layer uses a stride of 4 to reduce computation.
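The spatial sizes of the AlexNet feature maps follow from the standard output-size formula floor((n + 2p − k)/s) + 1. The sketch below traces a 227×227 input through the convolution and pooling layers using the kernel, stride, padding, and channel values of the original AlexNet as distributed with Caffe; these values and the 227-pixel input are assumptions here, since the paper does not list them.

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1


# (layer, kernel, stride, pad, output channels) for AlexNet's feature extractor.
ALEXNET_FEATURES = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, 256),
]


def trace_sizes(n=227):
    """Return (layer, spatial size, channels) after each layer."""
    sizes = []
    for name, k, s, p, c in ALEXNET_FEATURES:
        n = conv_out(n, k, s, p)
        sizes.append((name, n, c))
    return sizes


for name, n, c in trace_sizes():
    print(f"{name}: {n}x{n}x{c}")
```

The final 6×6×256 map flattens to 9 216 values, which is the input size of the first FC layer.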

Local response normalization (LRN) is applied in the first and second layers before max pooling. Compared with LeNet, AlexNet has many more weights, and its filter shapes vary from layer to layer[43]. To reduce overfitting, dropout is used as a regularizer; a dropout rate of 0.5, however, roughly doubles the number of training iterations needed to converge. The tensor sizes and numbers of parameters of the AlexNet convolution layers are shown in Fig. 5.

Training AlexNet

For training the proposed model, 90% of the epileptic and non-epileptic RPS images in the dataset (see RPS image dataset) are used (75% for training and 15% for validation), and the remaining 10% is reserved for testing. To obtain a reliable performance estimate, stratified 10-fold cross-validation (CV) is performed. 10-fold CV randomly splits the data into 10 disjoint subsets called folds; stratification ensures that each fold holds approximately the same proportions of the two class labels, epileptic (set E) and non-epileptic (sets A, B, C, and D).
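A stratified split can be sketched by shuffling each class separately and dealing it out across the folds. The label counts below assume 8 segments × 100 recordings = 800 images per set, and `stratified_kfold` is a hypothetical helper rather than the authors' tooling.

```python
import numpy as np


def stratified_kfold(labels, k=10, seed=0):
    """Return k disjoint index arrays whose class proportions match those of `labels`."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                      # random assignment within each class
        for f, chunk in enumerate(np.array_split(idx, k)):
            folds[f].extend(chunk.tolist())   # each fold gets ~1/k of every class
    return [np.array(sorted(f)) for f in folds]


# Hypothetical labels: 1 = epileptic (set E), 0 = non-epileptic (sets A-D).
labels = np.array([1] * 800 + [0] * 3200)
folds = stratified_kfold(labels, k=10)
print(len(folds), [int(labels[f].sum()) for f in folds[:3]])
```

Segment-level folds can place segments of the same recording in both training and test folds; grouping all segments of a recording into one fold is a stricter alternative.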

Training the AlexNet model requires learning its weights (kernels) from the data. We use backpropagation with cross-entropy as the loss function and stochastic gradient descent (SGD) for optimization. The model is trained for image classification using the Caffe framework for 200 epochs, with a learning rate of 0.01, a batch size of 128, a weight decay of 0.0001, a gamma of 0.1, and a momentum of 0.9 as hyperparameters. The model's depth is key to its high performance; it makes training computationally expensive, but this is made feasible by graphics processing units (GPUs), on which even more complex CNNs can be trained efficiently on large datasets. On a K80 GPU machine, one training run on the UoB dataset takes around 45 minutes. As model complexity increases, the computational cost of training and testing also increases. Fig. 6 visualizes the convolution filters of the retrained AlexNet model trained on RPS images of the UoB dataset.
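The listed hyperparameters correspond to Caffe's SGD solver with momentum and L2 weight decay, and, since a gamma is given, presumably the "step" learning-rate policy, in which the rate is multiplied by gamma every `stepsize` iterations. The paper does not give the step size, so `stepsize` below is a hypothetical parameter. A minimal NumPy sketch of the loss, one update, and the schedule:

```python
import numpy as np

BASE_LR, MOMENTUM, WEIGHT_DECAY, GAMMA = 0.01, 0.9, 1e-4, 0.1


def step_lr(iteration, stepsize, base_lr=BASE_LR, gamma=GAMMA):
    """'step' policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


def cross_entropy(probs, label):
    """Loss for one example given softmax probabilities and the true class index."""
    return -float(np.log(probs[label]))


def sgd_momentum_update(w, grad, velocity, lr, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD step with momentum; weight decay adds weight_decay * w to the gradient."""
    g = grad + weight_decay * w
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity


w, v = np.array([1.0]), np.array([0.0])
w, v = sgd_momentum_update(w, grad=np.array([0.0]), velocity=v, lr=step_lr(0, stepsize=1000))
print(w, v)
```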

Fig. 4 Identification of embedding dimension and time delay. A: FNN for determining the embedding dimension m. B: Determination of optimal time delay τ.

Table 3 Comparison of AlexNet with LeNet and GoogLeNet

Results

In this work, twenty-two class combinations were considered for classifying segments as epileptic or non-epileptic. The performance of the proposed RPS-based deep learning approach was evaluated with standard metrics, namely classification accuracy, sensitivity, specificity, precision, and F-score, for all binary and ternary class combinations, as described below.

Fig. 5 Tensor sizes and parameters of convolution and FC layers of AlexNet model.

Fig. 6 Trained AlexNet with a representative feature map learned from RPS images.

Here, TP is the number of images that are actually epileptic and predicted as epileptic; FP is the number of actually non-epileptic images predicted as epileptic; TN is the number of actually non-epileptic images predicted as non-epileptic; and FN is the number of actually epileptic images predicted as non-epileptic.
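The five metrics follow directly from these four counts. The sketch below uses the standard definitions with the epileptic class as positive; the counts in the example are hypothetical and are not results from this study. For ternary combinations, the same function can be applied one class versus the rest.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (epileptic = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f_score": f_score}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```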

Table 4 shows the classification accuracy, sensitivity, and specificity for 15 binary class combinations, and the corresponding confusion matrices for a few of them are shown in Table 5. Likewise, the performance of the system for the ternary class combinations is given in Table 6.

The corresponding confusion matrices for selected ternary class combinations are given in Table 7. The proposed system performs best for the A-E case, with 100% accuracy, and performs better than or on par with other results reported in the literature (Table 1).

The performance of our approach was compared with state-of-the-art methods from earlier investigations for all twenty-two class combinations. The sensitivity scores show that the proportion of correctly identified epileptic segments is high for all class combinations, reaching 100% for nine of them. The specificity reaches 100% for the A-E case and is very high for the remaining class combinations.

Table 8 compares the sensitivity and specificity for various class combinations with results in the literature. To the best of the authors' knowledge, the sensitivity of our approach is better than that of existing approaches. The proposed RPS-based CNN approach performs better than existing approaches for all binary class combinations, with an accuracy of (98.5±1.5)%. For the ternary class combinations, the proposed system achieves an accuracy of (95±2)%, except for the C-D-E case, where the accuracy is 84.44%.

Table 4 Performance measures of binary class combination

Table 5 Confusion matrices under various test conditions: binary class combinations

Table 6 Performance measures of ternary class combination (%)

Discussion

Chaotic dynamical systems, such as the one generating the EEG, may evolve asymptotically towards lower-dimensional attractors, which can be visualized indirectly through phase space reconstruction. Fig. 1 suggests that such lower-dimensional attractors exist for EEG signals, particularly in the ictal case (class E). This was also reflected in earlier studies[30,35–36], in which the RPS was used as a basis for manually extracting features before classification.

The choice of time delay and embedding dimension is critical for reconstructing the phase space and its chaotic features[46]. If τ is too small, the attractor is compressed towards the diagonal of the coordinate system; if τ is too large, the trajectory wrinkles and folds, making clear projection relationships hard to obtain. To the best of our knowledge, this study is the first to use RPS images as the basis of an end-to-end CNN-based system, which performs on par with or better than other approaches in the literature. It extracts features automatically and detects the ictal case with close to 100% accuracy.
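The small-τ effect can be checked numerically by comparing the spread of the 2D RPS points across the diagonal with their spread along it. For a sine wave this ratio equals tan(πτ/period), so it approaches zero as τ shrinks. The sketch below is illustrative only; the folding seen with large τ arises for chaotic signals and is not reproduced by a sine, and `off_diagonal_spread` is a hypothetical name.

```python
import numpy as np


def off_diagonal_spread(s, tau):
    """Spread across the diagonal divided by spread along it, for the 2-D RPS points."""
    s = np.asarray(s, dtype=float)
    x, y = s[:-tau], s[tau:]
    return float(np.std(y - x) / np.std(y + x))


# Sine wave with a period of 100 samples.
s = np.sin(2 * np.pi * np.arange(1000) / 100.0)
for tau in (1, 5, 25):
    print(tau, round(off_diagonal_spread(s, tau), 3))
```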

The sensitivity of our model reaches 100% in many cases. Notably, it is 99.43% for AB-CD, as seen in Table 8, which indicates that the model can clearly distinguish normal signals from signals recorded in seizure-free intervals. Because only 510 samples (about 2.9 s of EEG at 173.61 Hz) are required for an accurate classification, the approach may be feasible for real-time use. A limitation of this study is that it has been evaluated only on the UoB dataset, which is a clean dataset. The approach needs to be validated on a more challenging dataset, such as the Kaggle Melbourne University AES/MathWorks/NIH Seizure Prediction dataset, or in a live environment.

Table 7 Confusion matrices under various test conditions: ternary class combinations

Table 8 Comparison of performance measures: sensitivity and specificity of this work with those of earlier reported studies

In conclusion, the brain is a nonlinear dynamic system, and we model the EEG signal as the output of a chaotic system using RPS portraits. The novelty of this study is the use of these RPS images as input to a CNN model. We employ transfer learning on the standard AlexNet model to learn the RPS images for the seizure detection problem. We report an accuracy of (98.5±1.5)% for epileptic vs. non-epileptic classification and (95±2)% for normal vs. interictal vs. ictal classification, which is higher than most existing results in the literature. These results indicate that the proposed approach can model the dynamic nature of the EEG signal: the RPS images are valid input representations, and the CNN model accurately captures their features for 22 different class combinations. The proposed system using RPS images could support physicians in detecting epilepsy and may serve as a first step in assisting neurologists to classify the epileptic states of patients.
