A Novel Deep Learning Method for Application Identification in Wireless Network

2018-10-13, Jie Ren, Zulin Wang

China Communications, 2018, Issue 10

Jie Ren¹, Zulin Wang¹,²,*

1 School of Electronic and Information Engineering, Beihang University, Beijing 100191, China

2 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China

Abstract: In modern wireless communication networks, increasing consumer demand for multiple types of applications and high-quality services has become a prominent trend and puts considerable pressure on the network. Quality of Experience (QoE) has therefore received much attention and has become a key performance measure for applications and services. To meet users' expectations, resource management, and especially QoE-based resource allocation, is crucial in wireless networks. One effective enabler of resource allocation management is accurate application identification. In this paper, we propose a novel deep learning based method for application identification. We first analyse the requirements of QoE management in wireless communication and review the limitations of traditional identification methods. We then propose a deep learning based method that automatically extracts features and identifies the type of application. The proposed method is evaluated on practical wireless traffic data, and the experiments verify its effectiveness.

Keywords: quality of experience; application identification; protocol identification; deep learning; feature extraction

I. INTRODUCTION

Modern wireless communication technologies have greatly promoted the development of wireless networks. As wireless networks have become widespread, consumer demand has grown rapidly, mainly in the diversity of applications and the quality of services. The user's perception of an application or service is usually called Quality of Experience (QoE) [1], which reflects the degree to which the user's expectations of the application or service are fulfilled [2]. The continuing development of wireless technologies has led to a boom in user demands, manifested mainly in the diversity of applications, voice quality, video quality [3], video fluency [4] and so on [5]. To satisfy these demands, the network has to allocate resources such as spectrum, bandwidth, time slots and power to each user according to the user's demands and the wireless environment. Unfortunately, the increasing number of devices in wireless networks leads to resource shortages, so the resources allocated to users have to be differentiated according to their required QoE.

Meeting the application requirements of all connected users calls for effective QoE management mechanisms [6], and the key enabling technology is application identification [7]. Application identification commonly relies on protocol parsing to determine which protocol an application uses in the wireless network. Since a protocol provides informative characteristics of the application data it carries, different types of protocols indicate different kinds of applications. Thus, protocol identification is important for QoE-based network management.


Traditionally, protocol identification has relied mostly on prior information about the wireless communication system, as in port-based and deep packet inspection (DPI) based methods [8]. The port-based method is the simplest approach and works when protocols use fixed port numbers registered with the Internet Assigned Numbers Authority (IANA) [9], such as port 80 for HTTP, port 25 for SMTP, ports 20 and 21 for FTP and port 110 for POP3. DPI-based methods instead examine packet payloads for protocol-specific content as packets pass through the network [10]. These methods are easy to implement and have achieved promising results. However, they rely on many assumptions that do not hold in modern wireless communication environments, for the following reasons.

• Many protocols no longer use static port numbers, and some protocols are not registered with IANA at all.

• Encryption is now widely used in modern wireless communication systems, which removes the prior information these methods depend on.
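A port-based identifier amounts to a table lookup, which is why it breaks down for the reasons listed above. The minimal Python sketch below assumes a small hand-picked subset of IANA well-known ports (including 443 for HTTPS, which the text does not list); `identify_by_port` is a hypothetical helper, not an API from any library. Traffic on an unregistered or dynamically chosen port simply falls through to `UNKNOWN`.

```python
# Hypothetical subset of IANA well-known ports; the real registry is far larger.
WELL_KNOWN_PORTS = {
    20: "FTP-DATA",
    21: "FTP",
    25: "SMTP",
    80: "HTTP",
    110: "POP3",
    443: "HTTPS",
}


def identify_by_port(src_port: int, dst_port: int) -> str:
    """Return the protocol implied by either endpoint's port, or 'UNKNOWN'."""
    # Servers usually listen on the well-known port, so check the destination first.
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "UNKNOWN"


if __name__ == "__main__":
    print(identify_by_port(51234, 80))
    print(identify_by_port(51234, 8443))  # non-standard port: lookup fails
```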

Over the past few decades, artificial intelligence (AI) technologies have developed rapidly. Machine learning, a sub-field of AI, has been widely used in computer vision [11] and video coding [12]. Classification is one of its most common uses, and it has been introduced to protocol identification [13]. To meet the above challenges, a typical approach first extracts features from the original protocol samples and then applies a machine learning classifier to them. The feature extractor is usually designed by researchers with engineering skill and domain expertise in the system: signal-level and bit-level characteristics taken from the traffic flows are used to construct the feature space [14], and machine learning based (ML-based) methods then map these features to protocol types. Over the past decades, many machine learning methods have been applied to protocol identification, such as neural networks (NN) [15], hidden Markov models (HMM) [16] and support vector machines (SVM) [17], and experimental results have shown that ML-based identification methods are effective.

However, existing ML-based protocol identification methods usually rely on hand-designed feature extractors, which may analyze features inefficiently when prior information is lacking. Moreover, such extractors are tailored to particular systems and do not transfer well to other or unknown systems. The traditional hand-designed feature extractor is therefore limited by its resource consumption and its restricted applicability [18].

In recent years, deep learning has driven innovation in many fields, such as image processing [19]. As a representation-learning method with multiple levels of representation, deep learning allows a machine to discover representations from images automatically by exploiting their structure [20]. Since deep learning requires little hand engineering, it can be used for automatic feature extraction with a general-purpose learning procedure.

Motivated by these advantages, we propose a novel deep learning based protocol identification method to improve QoE-based network management. Specifically, we use deep learning to extract protocol features without prior information about the system. The main contributions of this paper are summarized as follows:

1) We propose a novel convolutional neural network (CNN) based feature extractor to extract the features from the traffic flow automatically.

2) We use a support vector machine (SVM) based identifier to classify the extracted features into protocol types.

3) We evaluate the proposed method on real-world data from the most common civil wireless communication system, WiFi. The results suggest that the proposed method outperforms conventional protocol identification methods.

The remainder of this paper is organized as follows. Section II describes our method, which consists of feature extraction and protocol identification; Section III presents the experimental results; finally, Section IV concludes the paper.

II. PROTOCOL IDENTIFICATION MODEL

The structure of our model is divided into two parts: feature extraction and protocol classification. In our model, a deep learning based method is developed for automatic feature extraction, and a classification-based method is used for identification. The framework of our model is shown in Figure 1.

The notation used in this section is summarized in Table I.

2.1 Feature extraction

In wireless communication networks, application identification amounts to determining the protocol type used by the application data, and the quality of the selected features largely determines the identification result. We therefore first introduce our proposed feature extractor.

Consider how data are constructed in a wireless communication system: the protocol data are first organized into wireless radio frames of a certain format, and the frames are then used to generate the transmitted signal. Similar to the hierarchical structure of an image, in which high-level features are composed of low-level features, a protocol can also be regarded as composed of structural features. At the bit level, the protocol follows a certain frame format, which consists of fields such as the pilot, transport format combination indicator (TFCI), feedback information (FBI) and transmit power control (TPC). At the signal level, the radio signal is generated by radio interface techniques such as coding, spreading and modulation, so the protocol can also be regarded as a combination of structured patterns at the signal level.

Overall, a protocol can be described as a series of structured features at both the signal level and the bit level. Given its strength in characterizing structured features, we adopt a deep learning algorithm, the convolutional neural network (CNN), for feature extraction. Here we focus mainly on bit-level features.

1) Structure: A convolutional neural network is a specialized kind of neural network for processing data with a known grid-like topology, such as time-series data (a 1D grid) and image data (a 2D grid) [21]. It uses convolution in place of general matrix multiplication in at least one of its layers and requires relatively little pre-processing compared with other classification methods. Because it depends little on prior knowledge and human effort, it is well suited to automatic feature extraction, and it has achieved tremendous success in practical multimedia applications involving images, video and even panoramic video [22].

Table I. Notation and explanation.

Fig. 1. The framework of the proposed method, which consists of feature extraction and protocol identification.

In this part, we develop a 1D-CNN based feature extractor. The CNN consists of an input layer, an output layer and multiple hidden layers, and its structure is shown in Figure 2. The hidden layers are composed of convolutional layers, pooling layers and fully connected layers. A convolutional layer applies the convolution operation with several different kernels to its input and passes the results to a pooling layer. The kernels are usually much smaller than the input, so that they detect small, meaningful features spanning only a few bits of the bit stream. Each kernel is applied at every position of the input (parameter sharing), so only a few parameters need to be stored, which reduces the memory requirement of the network. Because each output depends only on a small neighborhood of the input, computing the output is also cheaper, which reduces computational complexity and improves efficiency. The extractor thus exploits three important ideas of convolution: sparse interactions, parameter sharing and equivariant representations.
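The parameter-sharing and sparse-interaction arguments can be checked numerically. The NumPy sketch below (function names are hypothetical) computes a "valid" 1D convolution the way most deep learning frameworks do, as cross-correlation without flipping the kernel, and shows that it equals multiplication by a dense matrix that is mostly zeros and repeats the same three weights on every row.

```python
import numpy as np


def conv1d_valid(x: np.ndarray, k: np.ndarray, b: float = 0.0) -> np.ndarray:
    """'Valid' 1D convolution as computed in CNN layers (cross-correlation, no kernel flip)."""
    n_out = len(x) - len(k) + 1
    # The same kernel k is reused at every position (parameter sharing), and each
    # output reads only len(k) neighbouring inputs (sparse interactions).
    return np.array([x[i:i + len(k)] @ k for i in range(n_out)]) + b


def conv_as_matrix(k: np.ndarray, n_in: int) -> np.ndarray:
    """Equivalent dense weight matrix: mostly zeros, with k repeated along each row."""
    n_out = n_in - len(k) + 1
    w = np.zeros((n_out, n_in))
    for i in range(n_out):
        w[i, i:i + len(k)] = k
    return w


x = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)  # toy bit sequence
k = np.array([1.0, -1.0, 1.0])                        # one 3-tap kernel
y = conv1d_valid(x, k)
w = conv_as_matrix(k, len(x))
assert np.allclose(y, w @ x)  # same result as a sparse, weight-tied matrix product
print(y)
print(f"dense matrix entries: {w.size}, free kernel parameters: {k.size}")
```

For an 8-bit input and a 3-tap kernel, the dense matrix has 48 entries but only 3 free parameters, and the gap grows with the input length $L$.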

Fig. 2. The framework of the CNN-based feature extractor, where u is the input of each layer and a is the output of each layer. In this model, “Conv” represents the convolution operation, “Pool” represents the pooling operation, and “FC” represents the fully connected operation.

After the convolutions are computed in parallel and passed through a nonlinear activation function, the output is fed into a pooling layer. The pooling layer replaces each output with a summary statistic of nearby units, such as their maximum (max pooling) or their average (average pooling) [23]. Pooling makes the representation approximately invariant to small translations of the input and reduces the size of the output.
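A small NumPy sketch of non-overlapping 1D pooling illustrates the approximate translation invariance described above. The window size of 4 and the toy activations are assumptions for illustration: shifting the input by one position leaves the max-pooled output unchanged here, while the average-pooled output changes slightly.

```python
import numpy as np


def pool1d(a: np.ndarray, size: int, mode: str = "max") -> np.ndarray:
    """Non-overlapping 1D pooling (stride == window size); a trailing remainder is dropped."""
    n = len(a) // size
    windows = a[: n * size].reshape(n, size)
    return windows.max(axis=1) if mode == "max" else windows.mean(axis=1)


a = np.array([0.1, 0.9, 0.2, 0.0, 0.0, 0.7, 0.3, 0.1])  # toy activations
shifted = np.roll(a, 1)                                  # input shifted by one position

print(pool1d(a, 4), pool1d(shifted, 4))                    # max pooling: identical
print(pool1d(a, 4, "avg"), pool1d(shifted, 4, "avg"))      # average pooling: slightly different
```

The invariance is only approximate: a shift that moves a maximum across a window boundary would change the max-pooled output as well.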

After stacking a few convolutional and pooling layers, we use fully connected layers to construct the feature vector used for identification.

2) Algorithm: Assume a protocol sample $(x_i, y_i) \in S$, where $x_i$ is the data of length $L$ and $y_i$ is the protocol type. For bit-level feature extraction, we apply the 1D-CNN to extract features from the raw data sample.

First, we perform a forward pass to obtain the features of the input data $x_i$. At the convolutional layer, we have

After the convolution, we pass its output through a pooling layer to reduce the dimension of the data. Here we apply max pooling, which selects the maximum value within each pooling window of the previous layer's output, expressed as

Similarly, at the next convolutional layer, the output of the previous pooling layer is convolved with new kernels and then passed through a pooling layer. The outputs of the second convolutional layer are indexed by $n = 1, 2, \ldots, K^{(3)}$ (the number of feature maps in the second convolutional layer) and are followed by a pooling layer. We then use fully connected layers to construct the feature space and map the features to the corresponding type, denoted as

After the forward pass, we obtain the prediction $t_i$ for sample $x_i$. The squared-error function is then used to measure the deviation between the true and predicted outputs, given by

where $\mathbf{y}_i$ is the one-hot encoding of the label $y_i$.
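The loss can be sketched in NumPy as follows, assuming the common ½ scaling of the squared error (the exact constant is not given in the text) and five output classes; the scores in `u` are made-up values. `softmax` subtracts the maximum score before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import numpy as np


def one_hot(label: int, n_classes: int) -> np.ndarray:
    """One-hot target vector for an integer class label."""
    v = np.zeros(n_classes)
    v[label] = 1.0
    return v


def softmax(u: np.ndarray) -> np.ndarray:
    """Softmax over a 1D vector; subtracting the max avoids overflow in exp."""
    e = np.exp(u - u.max())
    return e / e.sum()


def squared_error(t: np.ndarray, y: np.ndarray) -> float:
    """E = 1/2 * sum_k (t_k - y_k)^2 (the 1/2 factor is an assumed convention)."""
    return 0.5 * float(np.sum((t - y) ** 2))


u = np.array([2.0, 0.5, 0.1, -1.0, 0.0])  # hypothetical output-layer scores, 5 protocol types
t = softmax(u)                              # predicted distribution t_i
y = one_hot(0, 5)                           # true label y_i = type 0
print(t.round(3), squared_error(t, y))
```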

To train the network by backpropagation, we compute the derivatives of the loss with respect to the weights and biases of each layer. For the fully connected layer, we have

where $\delta$ denotes the “errors” propagated backwards through the network, and “◦” denotes element-wise multiplication between two tensors of the same size. For the output layer, the “errors” take a form determined by $g(\cdot)$, the softmax function.
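To make the output-layer and fully connected backward step concrete, here is a NumPy sketch assuming the ½-scaled squared error, a softmax output $g(\cdot)$, and hypothetical sizes (32 features in, 5 classes out). Because softmax couples its outputs, the output-layer error in this sketch goes through the softmax Jacobian rather than an element-wise derivative. The analytic gradient is checked against central finite differences.

```python
import numpy as np


def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()


def loss(W, b, a_prev, y):
    """Squared error (with the assumed 1/2 factor) of a softmax output layer."""
    t = softmax(W @ a_prev + b)
    return 0.5 * np.sum((t - y) ** 2)


def output_layer_grads(W, b, a_prev, y):
    """Gradients of `loss` w.r.t. W and b, plus the error sent to the previous layer."""
    t = softmax(W @ a_prev + b)
    # Softmax couples all outputs, so dE/du uses its Jacobian J = diag(t) - t t^T.
    J = np.diag(t) - np.outer(t, t)
    delta = J @ (t - y)           # output-layer "errors" dE/du (J is symmetric)
    dW = np.outer(delta, a_prev)  # dE/dW = delta a_prev^T
    db = delta                    # dE/db = delta
    # A hidden FC layer below would use (W^T delta) o f'(u_prev) as its own errors.
    back = W.T @ delta
    return dW, db, back


rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 32))  # hypothetical: 32 features -> 5 types
b = rng.normal(size=5)
a_prev, y = rng.normal(size=32), np.eye(5)[3]
dW, db, back = output_layer_grads(W, b, a_prev, y)

# Central finite-difference check on one weight.
eps = 1e-6
E = np.zeros_like(W)
E[1, 7] = eps
numeric = (loss(W + E, b, a_prev, y) - loss(W - E, b, a_prev, y)) / (2 * eps)
print(f"analytic {dW[1, 7]:.3e}, numeric {numeric:.3e}")
```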

For a pooling layer followed by a fully connected layer, the “errors” are calculated in the same way as for a fully connected layer. Otherwise, if the pooling layer is followed by a convolutional layer, we have

where $f_{\mathrm{down}}(\cdot)$ denotes a sub-sampling function.

For a convolutional layer, we have

where $f_{\mathrm{up}}(\cdot)$ is an up-sampling operation. The remaining terms are the “errors” of the $m$th feature map in layer $l$, with $k$ indexing its elements, and the bit patch of the previous layer's output that was multiplied element-wise by the kernel during convolution to compute the element at $k$ in the output feature map [19].
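The up-sampling step and the kernel gradient can be sketched in NumPy for a single feature map, assuming a ReLU activation, non-overlapping max pooling of width 2 and a toy loss $E = c \cdot \text{pooled}$ so that the upstream errors are known exactly; all names and sizes are hypothetical. For max pooling, $f_{\mathrm{up}}$ routes each pooled error back to the position that produced the maximum and leaves zeros elsewhere (for average pooling it would spread the error evenly over the window).

```python
import numpy as np


def conv1d_valid(x, k, b):
    """Valid 1D convolution (cross-correlation) with a scalar bias."""
    return np.array([x[i:i + len(k)] @ k for i in range(len(x) - len(k) + 1)]) + b


def relu(u):
    return np.maximum(u, 0.0)


def maxpool(a, size):
    """Non-overlapping max pooling; also returns the winning offset in each window."""
    win = a[: (len(a) // size) * size].reshape(-1, size)
    return win.max(axis=1), win.argmax(axis=1)


def upsample_max(delta_pooled, arg, size, length):
    """f_up for max pooling: each pooled error goes back to the position that won the max."""
    up = np.zeros(length)
    up[np.arange(len(delta_pooled)) * size + arg] = delta_pooled
    return up


def conv_layer_grads(x, k, b, delta_pooled, size):
    """Kernel and bias gradients of conv -> ReLU -> max-pool, given dE/d(pooled output)."""
    u = conv1d_valid(x, k, b)
    _, arg = maxpool(relu(u), size)
    # delta = f'(u) o f_up(delta_next); ReLU is an assumed activation, ReLU'(u) = [u > 0].
    delta = (u > 0) * upsample_max(delta_pooled, arg, size, len(u))
    # dE/dk: sum over output positions of delta times the input patch that produced it.
    dk = np.sum([delta[i] * x[i:i + len(k)] for i in range(len(u))], axis=0)
    return dk, delta.sum()


rng = np.random.default_rng(1)
x, k, b, size = rng.normal(size=16), rng.normal(size=3), 0.1, 2  # hypothetical sizes
c = rng.normal(size=(len(x) - len(k) + 1) // size)  # stand-in upstream errors dE/d(pooled)


def E(k_, b_):
    """Toy loss E = c . pooled, so that dE/d(pooled) = c exactly."""
    pooled, _ = maxpool(relu(conv1d_valid(x, k_, b_)), size)
    return c @ pooled


dk, db = conv_layer_grads(x, k, b, c, size)
print(dk, db)
```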

The model is trained by alternating forward and backward propagation until the loss converges. The output of the last fully connected layer is then taken as the feature vector of the sample, expressed as

2.2 Protocol classifier

After extracting the features of the original sample, we then map these features to the corresponding type of protocol. In this part, we apply the supervised learning method, support vector machine (SVM), to identify the protocol type.

SVM is a widely used supervised classification method that separates different kinds of data by finding a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space [25]. Data that are linearly separable in an N-dimensional feature space can be separated by an (N−1)-dimensional hyperplane. After feature extraction, the sample $(x_i, y_i)$ is represented as $(h^{(i)}, y^{(i)})$. The hyperplane is expressed as $(\omega, d)$, where $\omega$ is the weight vector and $d$ is the bias. The optimal hyperplane is found by maximizing the margin, which leads to the following optimization problem:

Table II. Dataset of our experiment.

where $\xi^{(i)}$ is the slack variable and the parameter $C$ determines the trade-off between increasing the margin and ensuring that samples lie on the correct side. For training, we use the radial basis function (RBF) kernel, which implicitly maps the input into a higher-dimensional feature space in which the classes are more likely to be separable. Problem (7) is solved through its Lagrangian,

where $\alpha_i$ and $\gamma_i$ are Lagrange multipliers. The resulting dual problem is solved with the sequential minimal optimization (SMO) algorithm to train the model.
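For reference, the standard binary soft-margin SVM that (7) and the Lagrangian above correspond to can be written in this section's symbols as follows. The feature map $\phi$, the kernel width $\sigma$, the number of samples $m$ and the labels $y^{(i)} \in \{-1, +1\}$ are assumptions of this sketch rather than details given in the text, and a multi-class combination scheme for the five protocol types is not shown.

```latex
% Primal soft-margin problem over the extracted features h^{(i)}
\min_{\omega,\,d,\,\xi}\ \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{m}\xi^{(i)}
\quad\text{s.t.}\quad y^{(i)}\bigl(\omega^{\top}\phi(h^{(i)}) + d\bigr) \ge 1 - \xi^{(i)},\qquad \xi^{(i)} \ge 0

% Lagrangian with multipliers \alpha_i \ge 0 and \gamma_i \ge 0
\mathcal{L} = \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i}\xi^{(i)}
  - \sum_{i}\alpha_i\bigl[y^{(i)}\bigl(\omega^{\top}\phi(h^{(i)}) + d\bigr) - 1 + \xi^{(i)}\bigr]
  - \sum_{i}\gamma_i\,\xi^{(i)}

% Stationarity: \omega = \sum_i \alpha_i y^{(i)} \phi(h^{(i)}),\ \sum_i \alpha_i y^{(i)} = 0,\ \alpha_i + \gamma_i = C
% Dual problem (the one SMO solves), with the RBF kernel K:
\max_{\alpha}\ \sum_{i}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y^{(i)}y^{(j)}K\bigl(h^{(i)},h^{(j)}\bigr)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i}\alpha_i y^{(i)} = 0,
\qquad K(h,h') = \exp\!\Bigl(-\frac{\lVert h-h'\rVert^{2}}{2\sigma^{2}}\Bigr)
```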

Combining these two parts gives the protocol identification model summarized in Algorithm 1.
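The SMO step used to train the SVM part can be sketched with a widely taught simplified variant of Platt's SMO, which picks the second multiplier at random instead of using Platt's full selection heuristics. This NumPy sketch is an illustration under assumptions, not the authors' implementation: it handles two classes with labels in {−1, +1}, uses the RBF kernel with hypothetical settings σ = 1 and C = 1, and trains on synthetic 2D data. Five protocol types would need a multi-class scheme on top, such as one-vs-rest or one-vs-one (LIBSVM uses one-vs-one); the text does not say which was used.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


def smo_train(X, y, C=1.0, sigma=1.0, tol=1e-3, max_passes=5, max_iter=1000, seed=0):
    """Simplified SMO for a binary RBF SVM (labels in {-1, +1}); returns (alpha, b)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    K = rbf_kernel(X, X, sigma)
    alpha, b = np.zeros(m), 0.0
    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(m):
            E_i = (alpha * y) @ K[:, i] + b - y[i]
            # Only optimise alpha_i if it violates the KKT conditions by more than tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(m - 1))
            if j >= i:
                j += 1  # second multiplier chosen at random, j != i
            E_j = (alpha * y) @ K[:, j] + b - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Box [L, H] keeps 0 <= alpha <= C and leaves sum(alpha * y) unchanged.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]
            if L == H or eta >= 0:
                continue
            new_aj = float(np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H))
            if abs(new_aj - a_j) < 1e-5:
                continue
            alpha[j] = new_aj
            alpha[i] = a_i + y[i] * y[j] * (a_j - new_aj)
            d_i, d_j = alpha[i] - a_i, alpha[j] - a_j
            b1 = b - E_i - y[i] * d_i * K[i, i] - y[j] * d_j * K[i, j]
            b2 = b - E_j - y[i] * d_i * K[i, j] - y[j] * d_j * K[j, j]
            b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def decision_function(X_train, y, alpha, b, X, sigma=1.0):
    """f(x) = sum_k alpha_k y_k K(x_k, x) + b; the predicted class is sign(f(x))."""
    return (alpha * y) @ rbf_kernel(X_train, X, sigma) + b


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = smo_train(X, y)
pred = np.sign(decision_function(X, y, alpha, b, X))
print("training accuracy:", (pred == y).mean())
```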

III. EXPERIMENTAL RESULTS

For protocol identification, we developed a 1D-CNN for feature extraction and an SVM for identification. To evaluate the performance of our method, we used real-world data from the most common wireless system, WiFi, and selected five types of protocols used by applications. We also compared the results with those of conventional protocol identification methods.

3.1 Datasets

A large, well-curated dataset is the foundation of machine learning, especially deep learning. In addition, high-level programming languages and Graphics Processing Units (GPUs) have dramatically increased the available computing power. In our experiment, we collected and constructed a WiFi dataset using the USRP software defined radio (SDR) platform, consisting of a USRP X310 motherboard and a UBX160 daughterboard. The collected data samples are listed in Table II.

To balance the number of samples across protocol types, we randomly selected 10000 samples of each type for the training dataset and 3000 of each type for the testing dataset.
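A balanced per-class draw like this can be sketched in NumPy as follows. The helper name and the synthetic labels (five types with 13000 samples each) are hypothetical; the sketch also carves out the 20% validation hold-out described in Section 3.3 from each class's training draw, which is one plausible reading of that description.

```python
import numpy as np


def balanced_split(labels, n_train=10000, n_test=3000, val_frac=0.2, seed=0):
    """Per-class random draw without replacement: n_train + n_test samples per class.

    A val_frac share of each class's training draw is held out for validation.
    Returns (train_idx, val_idx, test_idx) as index arrays into `labels`.
    """
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < n_train + n_test:
            raise ValueError(f"class {c!r} has only {len(idx)} samples")
        chosen = rng.choice(idx, size=n_train + n_test, replace=False)
        n_val = int(round(val_frac * n_train))
        val.append(chosen[:n_val])
        train.append(chosen[n_val:n_train])
        test.append(chosen[n_train:])
    return np.concatenate(train), np.concatenate(val), np.concatenate(test)


# Five hypothetical protocol types with 13000 collected samples each.
labels = np.repeat(np.arange(5), 13000)
tr, va, te = balanced_split(labels)
print(len(tr), len(va), len(te))
```

With five types this yields 40000 samples for fitting, 10000 for validation and 15000 for testing.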

3.2 Identification model

We constructed the protocol identification model based on the proposed method in Section II. The model consists of two parts, the feature extractor and the classifier. Firstly, we constructed a 4-layer CNN to extract features from the raw data samples. After that, we used these features as the input to the SVM-based classifier for identifying different types of protocols. The architecture of this protocol identification model is shown in Figure 3.

The first part performs feature extraction; the parameters of each layer are summarized in Table III.

In this part, we stacked four convolutional and pooling layers followed by two fully connected layers to extract features from the raw protocol sample. The output of the last fully connected layer is the representation of the sample, and its dimension is smaller than that of the input. The model was trained with the backpropagation algorithm described in Section II.

In the second part, we used an SVM as the classifier to identify the five types of protocols. Its input was the 32-dimensional feature vector produced by the 1D-CNN based feature extractor, and its output indicated which of the five types the sample belonged to. The classifier was trained with the SMO algorithm described in Section II.

3.3 Experimental results

In our experiment, the training dataset consisted of 10000 samples of each type, 20% of which were used for validation, and the testing dataset consisted of 3000 samples of each type. The experiments were implemented in TensorFlow, and training was accelerated by a GPU (GTX 1080 Ti).

1) Feature extraction: The first part of our identification model was a trainable 1D-CNN that took batched and shuffled raw data samples, together with their classification labels, as input. It was trained by backpropagation together with learning-rate decay, dropout and batch normalization. The model hyper-parameters are listed in Table IV.
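The values in Table IV are not reproduced here, so the following NumPy sketch only illustrates what the three techniques compute, with hypothetical settings. It assumes exponential learning-rate decay (the text does not name the schedule), inverted dropout, and training-mode batch normalization without running statistics. In TensorFlow the corresponding built-ins are `tf.keras.optimizers.schedules.ExponentialDecay`, `tf.keras.layers.Dropout` and `tf.keras.layers.BatchNormalization`.

```python
import numpy as np


def exp_decay(lr0, step, decay_rate, decay_steps):
    """Exponential learning-rate decay: lr0 * decay_rate ** (step / decay_steps)."""
    return lr0 * decay_rate ** (step / decay_steps)


def inverted_dropout(a, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors,
    so the expected activation is unchanged; no dropout is applied at test time."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob


def batch_norm_train(a, gamma, beta, eps=1e-5):
    """Normalise each feature over the batch axis, then scale by gamma and shift by beta."""
    mu, var = a.mean(axis=0), a.var(axis=0)
    return gamma * (a - mu) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
batch = rng.normal(3.0, 2.0, size=(256, 32))  # hypothetical batch of 32-dim activations
normed = batch_norm_train(batch, np.ones(32), np.zeros(32))
dropped = inverted_dropout(normed, keep_prob=0.8, rng=rng)
for step in (0, 1000, 5000):
    print(step, exp_decay(1e-3, step, decay_rate=0.9, decay_steps=1000))
```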

Choosing a proper feature dimension is important for identification. Too few features cannot adequately represent the raw protocol, which makes identification inaccurate. Conversely, too many features increase the number of parameters and the computational complexity without improving identification accuracy. We therefore first determined the feature dimension empirically: we varied the dimension of the last fully connected layer from 20 to 80 and compared the number of parameters, runtime, loss and accuracy for each dimension. The results are shown in Figure 4.

Fig. 3. The structure of the protocol identification model.

Table III. Parameters of feature extractor.

Table IV. The feature extractor model hyper-parameters.

Fig. 4. The performance of different feature dimensions.

Figure 4(a) shows the number of parameters in the fully connected layer for different feature dimensions. The count increases with the feature dimension, which lengthens training; the average training time per batch for each dimension is shown in Figure 4(b). These two results reflect the complexity of feature extraction. Figure 4(c) shows the training and testing loss for different feature dimensions. Using softmax as the classifier, we evaluated the feature extractor by identification accuracy; the results are shown in Figure 4(d).
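The growth in Figure 4(a) follows from simple arithmetic: a fully connected layer with $n_{in}$ inputs and $d$ outputs has $(n_{in} + 1)d$ weights and biases. The sketch below assumes a hypothetical input width of 256 (the actual value from Table III is not given in the text) and adds the five-way softmax head used for Figure 4(d).

```python
def fc_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


N_IN = 256      # hypothetical width of the layer feeding the feature layer
N_CLASSES = 5   # five protocol types (softmax head)

for d in range(20, 81, 10):
    feat = fc_params(N_IN, d)
    head = fc_params(d, N_CLASSES)
    print(f"d={d}: feature layer {feat}, softmax head {head}, total {feat + head}")
```

Under these assumptions each extra feature dimension adds $n_{in} + 1 = 257$ parameters to the feature layer and 5 to the softmax head, so the count grows linearly with $d$.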

Trading off identification accuracy against computational complexity, we set the feature dimension to 32. With this extractor, features are extracted automatically from the raw data samples, and each sample is represented by a 32-dimensional feature vector that serves as the input to the classifier.

Here, we compare our feature extraction method with other representative methods. The most widely used is the traditional hand-designed feature extractor, which relies on hand-engineered statistical features. Moore et al. [26] summarized 248 such features for identification. For this comparison, we selected some of the features listed in [26] to construct a ten-dimensional feature vector, including the data receiving time, data receiving direction, flow length and byte ratio.
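A hand-designed extractor of this kind reduces a flow to a few summary statistics. The Python sketch below computes a hypothetical ten-dimensional vector in the spirit of [26], not the exact selection used here; the `Packet` record, the feature order and the definition of the byte ratio (uplink bytes over total bytes) are all assumptions.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    t: float      # arrival time in seconds
    uplink: bool  # direction: True for client -> server
    size: int     # bytes


def flow_features(pkts: List[Packet]) -> List[float]:
    """A hypothetical ten-dimensional hand-designed feature vector for one flow."""
    sizes = [p.size for p in pkts]
    gaps = [b.t - a.t for a, b in zip(pkts, pkts[1:])] or [0.0]
    uplink_bytes = sum(p.size for p in pkts if p.uplink)
    return [
        pkts[-1].t - pkts[0].t,                   # flow duration
        len(pkts),                                # packet count
        sum(sizes),                               # flow length in bytes
        statistics.mean(sizes),                   # mean packet size
        statistics.pstdev(sizes),                 # packet-size spread
        min(sizes),
        max(sizes),
        statistics.mean(gaps),                    # mean inter-arrival time
        sum(p.uplink for p in pkts) / len(pkts),  # share of uplink packets
        uplink_bytes / sum(sizes),                # uplink byte ratio
    ]


pkts = [Packet(0.00, True, 100), Packet(0.05, False, 1500),
        Packet(0.10, False, 1500), Packet(0.30, True, 60)]
print(flow_features(pkts))
```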

The other feature extractor we compared with was the fully connected neural network based extractor introduced in [27], which uses a stacked autoencoder to learn generic traffic flow features. For the comparison, we trained a 4-layer autoencoder in a greedy layer-wise fashion. The same classifier, softmax, was then applied to the features produced by the hand-designed method, the autoencoder-based (AE-based) method and our proposed method. The results are shown in Figure 5.
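Greedy layer-wise training means fitting one autoencoder at a time, each on the codes produced by the already-trained layers below it. The NumPy sketch below assumes a sigmoid encoder, a linear decoder, full-batch gradient descent on the mean squared reconstruction error, hypothetical layer widths 12, 8, 6 and 4, and random 16-dimensional inputs; the actual configuration in [27] may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
    """Fit one autoencoder layer and return its encoder (W, b) and loss history."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    history = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encoder
        err = H @ W2 + b2 - X             # linear decoder output minus target
        history.append(float(np.mean(err ** 2)))
        g_R = 2.0 * err / err.size        # gradient of the mean squared error
        g_Z = (g_R @ W2.T) * H * (1 - H)  # back through the sigmoid (uses W2 before its update)
        W2 -= lr * H.T @ g_R
        b2 -= lr * g_R.sum(axis=0)
        W1 -= lr * X.T @ g_Z
        b1 -= lr * g_Z.sum(axis=0)
    return (W1, b1), history


def greedy_stack(X, layer_sizes):
    """Train layers one at a time, each on the codes of the already-trained layers."""
    encoders, histories, codes = [], [], X
    for n_hidden in layer_sizes:
        (W, b), hist = train_autoencoder(codes, n_hidden)
        encoders.append((W, b))
        histories.append(hist)
        codes = sigmoid(codes @ W + b)  # input for the next layer
    return encoders, codes, histories


rng = np.random.default_rng(0)
X = rng.random((200, 16))  # stand-in for normalised flow records
encoders, codes, histories = greedy_stack(X, [12, 8, 6, 4])
for i, h in enumerate(histories, 1):
    print(f"layer {i}: reconstruction MSE {h[0]:.4f} -> {h[-1]:.4f}")
```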

Figure 5 shows that the identification accuracy is 86.78% for the traditional feature extraction method, 91.06% for the AE-based method and 96.24% for ours. Our method is thus 9.46 percentage points more accurate than the hand-designed method and 5.18 percentage points more accurate than the AE-based method. This comparison shows that the proposed 1D-CNN based feature extractor is more effective than the traditional methods, while also making feature extraction automatic.

2) Protocol identification: After feature extraction, the output feature vector was used as the input to the classifier, in which we applied an SVM to identify the five types of protocols. We compared our classifier with two commonly used alternatives, a Bayesian classifier [28] and a neural network [29]. For this comparison, the inputs to all three classifiers were extracted with the same 1D-CNN based method, and the training, validation and testing datasets were identical. The results of the three classifiers are shown in Figure 6.
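The text does not specify the Bayesian classifier of [28]. As one common example, a Gaussian naive Bayes classifier on the 32-dimensional features can be sketched in NumPy as below; the synthetic five-class data and the variance-smoothing constant are assumptions.

```python
import numpy as np


class GaussianNaiveBayes:
    """Class-conditional independent Gaussians per feature, plus class priors."""

    def fit(self, X, y, var_smoothing=1e-9):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small variance floor avoids division by zero for constant features.
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes])
        self.var += var_smoothing * X.var(axis=0).max()
        self.log_prior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # Sum of per-feature Gaussian log-likelihoods, shape (n_samples, n_classes).
        diff = X[:, None, :] - self.mu[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None] + diff ** 2 / self.var[None]).sum(axis=2)
        return self.classes[np.argmax(ll + self.log_prior[None, :], axis=1)]


rng = np.random.default_rng(0)
centers = rng.normal(0.0, 3.0, size=(5, 32))  # five hypothetical protocol types
y = np.repeat(np.arange(5), 100)
X = centers[y] + rng.normal(0.0, 1.0, size=(500, 32))
model = GaussianNaiveBayes().fit(X, y)
print("training accuracy:", np.mean(model.predict(X) == y))
```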

Figure 6 shows that nearly all identification results of the three classifiers are above 95%, which demonstrates the effectiveness of the proposed deep learning based feature extractor: the 1D-CNN is well suited to feature extraction and greatly improves identification accuracy. The identification accuracy of our proposed method is 98.193%, higher than that of the other two classifiers, which validates the effectiveness of the proposed method.

Finally, we combined the two parts for protocol identification and compared the result with several state-of-the-art protocol identification methods. The first was the method proposed in [30], which provides 19 optional features and applies an SVM for traffic identification. The second was the Bayesian-based method introduced in [28], and the last was the AE-based identification method introduced in [27]. The results of our proposed method and these three methods are shown in Figure 7.

Figure 7 shows that the identification accuracy of our proposed method is 98.193%, while the other three methods achieve 87.27%, 93.16% and 93.93%, respectively. The proposed method therefore outperforms these three state-of-the-art protocol identification methods.

Fig. 5. Identification results for different feature extraction methods.

Fig. 6. Identification results for different protocol classification methods.

Fig. 7. Comparison of our proposed method with state-of-the-art protocol identification methods.

IV. CONCLUSION

Protocol identification is one of the most effective methods for application identification, which is important for QoE-based network management. The performance of a protocol identification method is determined by the features and by the way they are classified. Traditional feature extraction methods rely mostly on engineering skill and domain expertise and consume considerable time and resources. To address these disadvantages, in this paper we introduced a novel protocol identification method based on deep learning. We proposed a 1D-CNN method that extracts features from raw data and thereby automates feature extraction. An SVM was then applied to classify five types of protocols based on the extracted features. The experimental results presented in this paper show the effectiveness of the proposed method for protocol identification in wireless systems.

ACKNOWLEDGMENT

We gratefully acknowledge the anonymous reviewers who read drafts and made many helpful suggestions. This work was supported by NSAF under Grant No. U1530117 and the National Natural Science Foundation of China (Nos. 61471022 and 61201156).
