Secure Mobile Crowdsensing Based on Deep Learning

2018-10-13

China Communications, 2018, Issue 10

Liang Xiao*, Donghua Jiang, Dongjin Xu, Wei Su, Ning An, Dongming Wang

1 Dept. of Communications Engineering, Xiamen University, Xiamen 361005, China

2 Dept. of Computer and Information, Hefei University of Technology, Hefei 230009, China

3 National Mobile Communications Research Lab., Southeast University, Nanjing 211189, China

Abstract: To improve the quality of multimedia services and stimulate secure sensing in Internet of Things applications, such as healthcare and traffic monitoring, mobile crowdsensing (MCS) systems must address security threats such as jamming, spoofing and faked sensing attacks during both the sensing and the information exchange processes in large-scale, dynamic and heterogeneous networks. In this article, we investigate secure mobile crowdsensing and show how deep learning (DL) methods, such as stacked autoencoders, deep neural networks, convolutional neural networks and deep reinforcement learning, can improve MCS security approaches, including authentication, privacy protection, faked sensing countermeasures, intrusion detection and anti-jamming transmissions. We discuss the performance gain of these DL-based approaches over traditional security schemes and identify the challenges that must be addressed to implement them in practical MCS systems.

Keywords: mobile crowdsensing; security; deep learning; reinforcement learning; faked sensing

I. INTRODUCTION

Mobile crowdsensing (MCS) uses the sensors embedded in mobile devices such as smartphones, including gyroscopes, accelerometers, microphones and global positioning system (GPS) receivers, to provide quality of experience (QoE)-based multimedia services in location-based Internet of Things (IoT) applications, including traffic monitoring, healthcare, catering recommendation and social networks [1], [2]. The participants recruited by an MCS platform monitor their surroundings and upload sensing reports to MCS servers, which in turn extract the information of interest from these reports. Security and privacy are critical to MCS systems. Mobile devices are controlled by selfish and autonomous users who can launch insider attacks such as faked sensing attacks and who have their own privacy concerns. In addition, transmissions over radio networks such as 3G/4G, Wi-Fi and vehicular ad hoc networks (VANETs) are vulnerable to security threats such as jamming, denial-of-service (DoS) attacks including distributed DoS, spoofing attacks, Sybil attacks, faked sensing attacks and smart attacks [3]–[5].

In this article, we investigate the security and privacy challenges of mobile crowdsensing, such as sensitive data leakage and faked sensing attacks, and review MCS security solutions, such as authentication, malware detection and data anonymization, that protect user privacy and enhance sensing accuracy in heterogeneous, dynamic MCS systems. Designing and implementing security solutions in practical MCS systems is challenging because the mobility and sensing models of mobile users are difficult to estimate in dynamic and heterogeneous networks with insider attackers and varying network traffic. Deep learning (DL) techniques have received significant attention in speech recognition, computer vision and network security. A DL-based security technique usually contains a neural network such as a convolutional neural network (CNN). A typical CNN has several convolutional, pooling and fully connected layers with activation functions and thousands of parameters, which can be updated with the stochastic gradient descent (SGD) method [6]. The multi-layer structure of DL compresses the large network state space and extensive training data, which accelerates learning, optimizes feature extraction, and mitigates the “curse of dimensionality” [7]. In recent years, DL-based security schemes, such as privacy protection, authentication and malware detection schemes, have outperformed the benchmark deterministic schemes [8], [9]. In this article, we focus on MCS security techniques based on deep learning algorithms, such as the stacked autoencoder (SAE), deep neural network (DNN), CNN, recurrent neural network (RNN), deep belief network (DBN), deep Boltzmann machine (DBM) and deep Q-network (DQN).
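The SGD update mentioned above can be illustrated with a minimal sketch. It fits a two-parameter linear model y ≈ w·x + b by mini-batch SGD on the mean-squared error; the model, data, learning rate and batch size are illustrative assumptions. A CNN applies the same step (parameter minus learning rate times gradient) to each of its weights, with the gradients obtained by backpropagation.

```python
import random


def sgd_fit(data, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the mean-squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # a new random split into mini-batches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of mean((w*x + b - y)^2) over the mini-batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient.
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free samples of y = 3x + 1 on x = 0.0, 0.1, ..., 1.0.
samples = [(i / 10, 3 * (i / 10) + 1) for i in range(11)]
w, b = sgd_fit(samples)
```

Because the toy samples lie exactly on y = 3x + 1, w and b approach 3 and 1; real sensing data would leave a residual error.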

Compared with benchmark MCS systems without deep learning, DL-based MCS security techniques compress the state space observed by the MCS servers and users and accelerate their learning to address attacks [10]. More specifically, the multi-layer feature extractors of deep learning allow DL-based secure MCS systems to derive more abstract features from high-dimensional MCS data, such as user behaviors, sensing reports and network states, which improves their sensing performance and reduces the learning time. Deep learning also helps MCS servers build a trust architecture that protects private data and user privacy against attacks, improves sensing quality and increases the utility of MCS systems.

We focus on DL-based authentication, privacy protection, faked sensing countermeasures, intrusion detection and anti-jamming transmissions, which address serious security threats, protect user privacy and enhance sensing accuracy in MCS. We present the performance gain of these DL-based approaches over traditional security schemes, which shows that deep learning techniques can accelerate malware detection, enhance authentication accuracy, protect data privacy, reduce the faked sensing attack rate, and resist jamming attacks in sensing report transmissions. We also discuss the challenges and identify future directions for implementing DL-based security approaches in practical MCS systems, such as real-time security processing, accurate evaluation of the MCS utility and backup security solutions.

This article is organized as follows. In the next section, we review the MCS attack models. We then describe DL-based authentication, privacy protection, faked sensing countermeasures, intrusion detection and anti-jamming transmissions. Finally, we identify possible directions for future research.


II. THREAT MODEL IN MOBILE CROWDSENSING

Mobile crowdsensing systems consist of mobile users (participants), base stations (BSs), access points (APs) and an MCS server, all of which are vulnerable to attacks and security problems. More specifically, malicious attackers in MCS aim to degrade the MCS service level, crash the MCS system, or steal private data. MCS systems are also vulnerable to insider attacks launched by selfish mobile users whose goal is to obtain additional payments and secrets while lowering their sensing costs and the risk of private data leakage. Smart attackers use machine learning techniques and smart radio devices to choose their attack policy and thus pose more dangerous threats to MCS systems. As illustrated in figure 1, the mobile users recruited by an MCS platform monitor their surroundings and upload sensing reports to MCS servers via APs or BSs. MCS servers extract the information of interest from the sensing reports to provide multimedia services, such as traffic monitoring, healthcare, catering recommendation and social networks. Mobile crowdsensing systems must address stealthy and insider attacks launched by both selfish mobile users and malicious attackers during the data sensing and information exchange processes to guarantee the veracity and security of multiple MCS applications. We briefly review several significant types of threats in MCS in what follows.

Spoofing: A spoofer uses the identity of another mobile device, a BS or an AP to obtain illegal access to the MCS system and then launch further attacks on the AP or BS, such as man-in-the-middle and DoS attacks [8].

Sybil: A Sybil attacker sends sensing reports to the server via APs or BSs under a large number of different user identities to change the sensing result obtained by a majority-based MCS server [1].

Privacy leakage: MCS servers in the cloud must protect user privacy, especially users' locations and personal information. Rogue users who seek private information and steal sensing data during transmission or from cloud resources significantly discourage other mobile users from participating in MCS tasks.

Faked sensing attacks: Selfish mobile users sometimes submit under-estimated or faked sensing reports to the MCS server to reduce their sensing effort and protect their privacy [11].

Malware: Malware such as viruses, worms and spyware seriously threatens MCS through privacy leakage, economic loss and network performance degradation [9]. Mobile users with limited power and computing capacity are vulnerable to malware.

Jamming: A jammer injects faked or replayed signals to interrupt the ongoing transmission of sensing data and payments between mobile users and the MCS servers. A smart jammer can flexibly change its jamming power and frequency according to the transmission status.

DoS: As a typical flooding attack, a DoS attack aims to interrupt MCS services by exhausting MCS server resources [1].

Advanced persistent threat (APT): APT attackers launch sophisticated, stealthy, continuous and targeted attacks to steal private information from MCS servers or mobile devices over an extended period of time, causing privacy leakage in MCS networks.

Smart attacks: By applying smart radio devices such as universal software radio peripherals (USRPs), an attacker can use machine learning techniques to learn the defense policy and choose its attack policy accordingly to attack the radio transmissions between mobile devices and APs or BSs.
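To see why a majority-based server is vulnerable to Sybil attacks, consider the following hypothetical sketch, in which the server fuses reports by counting one vote per claimed identity. The identities, report values and group sizes are invented for illustration.

```python
from collections import Counter


def majority_result(reports):
    """Majority-based fusion: return the value claimed by the most identities."""
    counts = Counter(value for _identity, value in reports)
    return counts.most_common(1)[0][0]


# Hypothetical scenario: five honest participants report congestion.
honest = [(f"user{i}", "congested") for i in range(5)]
# One Sybil attacker forges six identities that all report a clear road.
sybil = [(f"fake{i}", "clear") for i in range(6)]

before = majority_result(honest)
after = majority_result(honest + sybil)
```

With five honest reports and six forged identities, the fused result flips from "congested" to "clear". Counting votes per authenticated device rather than per claimed identity removes this lever, which is one motivation for the authentication schemes discussed in Section III.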

Fig. 1. Threats in mobile crowdsensing systems during the data sensing and information exchange processes.

III. DL-BASED AUTHENTICATION

Authentication verifies the identities of mobile users participating in MCS tasks to address spoofing and Sybil attacks and protect their data privacy [8], [23]. The PHY-layer authentication scheme in [24] designs a hypothesis test that compares the radio channel responses of the transmitter under test with the channel record of the claimed node, using a test threshold determined by the given radio channel model and network model. However, it is challenging to determine the test threshold in time-variant MCS systems with a large number of mobile and heterogeneous devices. Therefore, a DNN-based authentication scheme that does not require the test threshold was developed in [8].
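A threshold-based PHY-layer test of the kind described above can be sketched as follows. The test statistic (a normalized squared Euclidean distance between the channel response under test and the recorded response), the channel values and the threshold of 0.05 are all assumptions chosen for illustration; [24] derives its threshold from the radio channel and network models, which is exactly the step that is hard in time-variant MCS systems.

```python
def channel_distance(h_test, h_record):
    """Normalized squared distance between two complex channel-response vectors."""
    num = sum(abs(a - b) ** 2 for a, b in zip(h_test, h_record))
    den = sum(abs(b) ** 2 for b in h_record)
    return num / den


def authenticate(h_test, h_record, threshold):
    """Accept the claimed identity (H0) if the channel matches the record closely enough."""
    return channel_distance(h_test, h_record) <= threshold


# Hypothetical channel responses on 4 subcarriers and a hypothetical threshold.
THRESHOLD = 0.05
record = [1 + 1j, 0.8 - 0.2j, -0.5 + 0.9j, 0.3 + 0.3j]
same_node = [1.02 + 0.98j, 0.79 - 0.21j, -0.52 + 0.88j, 0.31 + 0.29j]
spoofer = [-0.7 + 0.2j, 0.1 + 0.9j, 0.6 - 0.4j, -0.9 - 0.1j]
```

Here the legitimate node's statistic is about 0.0005 and the spoofer's about 2.5, so a threshold of 0.05 separates them; with fast-fading channels and many heterogeneous devices, no single fixed value may work this cleanly.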

As summarized in Table I, the DNN-based scheme in [8] captures the unique physical features of Wi-Fi signals during the daily activities of mobile users, identifies the users with a DNN, and trains a support vector machine (SVM) on the generated DNN abstractions to detect spoofing attacks. More specifically, the system extracts 6 time-domain features, such as the maximum, minimum and skewness, and 3 frequency-domain features, including the spectrogram energy, percentile frequency component and spectrogram energy difference, from both the amplitude and the phase of the channel responses of Wi-Fi signals. A three-layer DNN based on an autoencoder provides high-level abstractions of human physiological characteristics, such as body shape, height and weight, and of behavioral characteristics, such as the walking patterns of mobile users.
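The time-domain part of this feature extraction might look like the sketch below. The text names the maximum, minimum and skewness; the other three statistics used here (mean, standard deviation and excess kurtosis) are assumptions that complete a set of six, and the frequency-domain features are omitted. The input would be an amplitude or phase sequence of the Wi-Fi channel response.

```python
import statistics


def time_domain_features(x):
    """Six time-domain statistics of a channel amplitude (or phase) sequence.

    max, min and skewness follow the text; mean, standard deviation and
    excess kurtosis are assumed here to complete the set of six.
    """
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)
    # Population skewness and excess kurtosis from the central moments.
    m3 = sum((v - mean) ** 3 for v in x) / len(x)
    m4 = sum((v - mean) ** 4 for v in x) / len(x)
    skew = m3 / std ** 3 if std > 0 else 0.0
    kurt = m4 / std ** 4 - 3 if std > 0 else 0.0
    return {"max": max(x), "min": min(x), "mean": mean,
            "std": std, "skewness": skew, "kurtosis": kurt}


# Hypothetical amplitude samples with one large excursion (right-skewed).
features = time_domain_features([1.0, 2.0, 3.0, 4.0, 10.0])
```

One such feature vector per amplitude and phase stream would then be fed to the autoencoder-based DNN.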

Table I. Summary of DL-based MCS security methods.

According to the simulation results in [8], the DNN-based authentication scheme can accurately detect spoofing attacks in typical scenarios. For example, its average accuracy is 91.2% for 7 users, with a standard deviation of 3.67%, in an office environment. However, this authentication scheme relies on tedious and time-consuming feature and attribute selection, which makes it difficult to apply in practical MCS systems.

To accelerate the MCS authentication process, the spoofing detection system in [12] applies an SAE, i.e., a neural network (NN) consisting of multiple layers of sparse autoencoders, to capture the features. The first layer learns first-order features from the network traffic, and each following layer learns features corresponding to patterns in the features of the previous order. The first hidden layer of the SAE consists of 256 neurons, the second has 128 neurons, and the third contains 64 neurons. Using the parametric rectified linear unit (PReLU) activation function, the 3-hidden-layer SAE-based authentication scheme achieves accurate and balanced classification, distinguishing legitimate mobile users from flooding, injection and spoofing attackers. For example, this SAE-based authentication scheme achieves a spoofing detection accuracy of 98.5%, more than 15 times that of the J48-based authentication algorithm (6.4%).
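The encoder half of the 256-128-64 SAE described above can be sketched in plain Python as follows. The input dimension (40), the PReLU slope (0.25, fixed here rather than learned) and the random He-style initialization are assumptions; the layer-wise sparse-autoencoder pretraining (decoders and sparsity penalty) and the final classifier are omitted.

```python
import random


def prelu(z, a=0.25):
    """Parametric ReLU: identity for z > 0, slope a otherwise (a is learned in practice)."""
    return z if z > 0 else a * z


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by sqrt(2 / n_in) and zero biases."""
    weights = [[rng.gauss(0.0, (2.0 / n_in) ** 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_encoder(n_in, hidden=(256, 128, 64), seed=0):
    """Stack of encoder layers with the hidden sizes reported for the SAE."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def encode(x, encoder):
    """Map a traffic feature vector to its 64-dimensional abstraction."""
    h = x
    for weights, biases in encoder:
        h = [prelu(z) for z in dense(h, weights, biases)]
    return h


# Hypothetical 40-dimensional traffic feature vector.
encoder = build_encoder(40)
code = encode([0.1] * 40, encoder)
```

The 64-dimensional `code` is what a downstream classifier would use to separate legitimate users from flooding, injection and spoofing attackers.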

IV. DL-BASED PRIVACY PROTECTION

Privacy protection for transmitting sensing reports and storing them in cloud servers is essential for practical MCS systems. Data anonymization methods, such as k-anonymity, l-diversity and t-closeness [3], provide anonymous quasi-identifiers and protect sensitive information in the databases used by MCS. However, sensing reports contain high-dimensional data streams, such as images and videos, which cannot be protected by such traditional data anonymization schemes because their quasi-identifiers and sensitive attributes are difficult to define.
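For comparison with the DL-based approaches that follow, k-anonymity on tabular sensing reports can be checked in a few lines. The record fields and generalized values below are hypothetical; the check simply requires every combination of quasi-identifier values to appear at least k times.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical sensing reports with generalized ZIP code and age range.
reports = [
    {"zip": "3610**", "age": "20-29", "reading": 42},
    {"zip": "3610**", "age": "20-29", "reading": 57},
    {"zip": "2300**", "age": "30-39", "reading": 61},
    {"zip": "2300**", "age": "30-39", "reading": 38},
]
```

These reports are 2-anonymous but not 3-anonymous with respect to (zip, age). For images and videos there is no obvious set of quasi-identifier columns to pass to such a check, which is the limitation noted above.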

Therefore, the flexible DNN-based framework proposed in [14] uses DNN training at mobile devices to provide data privacy protection. This scheme consists of a local NN platform that extracts features and an NN at the cloud server that uses the released features to train a model for the given MCS task. The local NN, which consists of 4 convolutional layers and 2 max-pooling layers, provides data anonymization for MCS systems. The trade-off between user privacy and sensing accuracy depends on the number of DNN layers and the output channel depth. The simulation results in [14] show that, in this framework, the privacy level and utility of the DNN-based MCS system decrease with the number of layers and increase with the output depth.

Fig. 2. A DQN-based MCS payment scheme to suppress the faked sensing rate by evaluating the sensing quality of mobile users.

Fig. 3. Performance of the DQN-based dynamic MCS system with 60 mobile users, 2 sensing accuracy levels, 11 feasible payments and the evaluation error rate of the MCS server equal to 0.1.

On the other hand, because the feature extraction network (FEN) in [14] depends on the pre-trained NNs in the local platform, attackers can predict the structure and weights of the NN in an MCS system. Therefore, a pool of pre-trained NNs allows the FEN to be derived from various pre-trained NNs, which protects the anonymity of MCS users. In addition, the channel selection process, which selects the output and intermediate channels, is difficult for attackers to discover and copy.

The CNN-based distributed framework developed in [15] uses a Siamese network, which is widely used in verification applications, to optimize a cost function that combines data privacy and sensing task accuracy, and applies a CNN to improve the image classification accuracy at the MCS servers. In this framework, a feature extractor at the mobile device and an image classifier at the MCS server cooperate to perform MCS tasks such as emotion detection. Because only convolutional layers are implemented on mobile devices, the memory overhead at the mobile device decreases significantly, which in turn significantly reduces both the model initialization time and the MCS power consumption. By removing undesired information from the extracted features at the mobile device, the framework improves the performance of MCS tasks, especially gender classification and emotion detection, and prevents user identity leakage through face recognition at the MCS server. For instance, the face recognition accuracy of the CNN-based framework with a Siamese network is 2.6%, which is 94.5% lower than that of the framework without a Siamese network, whereas the emotion detection accuracy decreases by only 20%.
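One common way to train a Siamese network is the contrastive loss sketched below, which pulls the feature vectors of same-class pairs together and pushes different-class pairs at least a margin apart. This is a generic formulation given as an assumption; how [15] combines such a term with its privacy objective is not detailed here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(f1, f2, same_class, margin=1.0):
    """Contrastive loss for a pair of features produced by the two Siamese branches.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only if they are closer than `margin`.
    """
    d = euclidean(f1, f2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

For example, two different-class features at distance 0.5 incur a loss of (1.0 - 0.5)^2 = 0.25, while the same pair labelled as same-class would incur 0.25 from the squared distance instead.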

The DNN-based privacy protection method proposed in [13] adds varying levels of Gaussian noise to the training parameters to provide differential privacy, at the cost of reduced sensing accuracy. To balance privacy preservation and sensing accuracy, the coalition MCS incentive mechanism designed in [3] evaluates the user payoff and compares it with a threshold that is currently simply set to zero. However, such a threshold is not always optimal and can even fail to balance privacy protection and sensing accuracy in dynamic MCS networks. Therefore, the mediator can apply deep learning techniques such as CNNs to find the optimal threshold and thus improve both the sensing accuracy and user privacy.
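Adding Gaussian noise to training updates is commonly done in the style of DP-SGD: each gradient is clipped to a maximum L2 norm and then perturbed with Gaussian noise proportional to that norm. The sketch below shows only this clip-and-noise step under that assumption; the clipping norm and noise multiplier are illustrative, and it does not perform the privacy accounting that would map them to a formal differential-privacy guarantee.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad, max_norm, noise_multiplier, rng):
    """Clip, then add N(0, (noise_multiplier * max_norm)^2) noise to each coordinate."""
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clip(grad, max_norm)]


clipped = clip([3.0, 4.0], max_norm=1.0)  # norm 5 is scaled down to norm 1
noisy = privatize([3.0, 4.0], max_norm=1.0, noise_multiplier=1.1,
                  rng=random.Random(0))
```

A larger noise multiplier gives stronger privacy but noisier updates, which is the accuracy cost mentioned above.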

V. DL-BASED FAKED SENSING COUNTERMEASURES

Trust mechanisms previously designed for VANETs and peer-to-peer networks, such as the anonymous reputation management mechanism presented in [24], are not applicable to MCS with a variety of participants and large-scale sensing data [25]; applying them causes long sensing latency and low detection accuracy. To this end, deep learning can reduce the dimensionality of the sensing data set and accelerate learning in MCS.

As shown in figure 2, the DQN-based secure MCS system proposed in [11] suppresses the faked sensing motivation of autonomous mobile users in dynamic secure mobile crowdsensing games. In this framework, the server first determines its payment policy and broadcasts it to mobile users in the areas of interest. Each mobile device chooses its sensing effort according to the payment policy and its current battery level, and it receives a payment according to the sensing quality evaluated by the MCS server. Given the system state and payment strategy in the current time slot, the MCS payment process in this dynamic game is independent of the previous states and actions, so it can be formulated as a Markov decision process. Therefore, the MCS server can apply reinforcement learning (RL) techniques such as Q-learning to obtain the optimal MCS payment policy for mobile users who might launch faked sensing attacks, and it uses a deep CNN with two convolutional (Conv) layers and two fully connected (FC) layers to compress the state space of RL and thereby accelerate the learning process. The CNN parameters are updated with the SGD method, which samples a subset of the summand functions at each iteration to reduce the computational cost.
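The Q-learning step underlying this payment scheme can be illustrated with a toy tabular sketch. The single user with a sensing cost of 0.35, the server gain of 1.0, the two-state quality observation and the learning parameters are all hypothetical; only the 11 payment levels echo Fig. 3. The DQN in [11] would replace the Q-table with the CNN described above.

```python
import random

# 11 feasible payment levels (0.0, 0.1, ..., 1.0), echoing Fig. 3.
PAYMENTS = [i / 10 for i in range(11)]


def user_quality(payment, cost=0.35):
    """Hypothetical user: senses accurately (1) only if paid at least its cost, else fakes (0)."""
    return 1 if payment >= cost else 0


def server_reward(quality, payment, gain=1.0):
    """Hypothetical server utility: value of an accurate report minus the payment."""
    return gain * quality - payment


def greedy(q_row):
    """Index of the action with the highest Q-value (first one on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train_payment_policy(steps=20000, alpha=0.5, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning; the state is the sensing quality seen in the previous slot."""
    rng = random.Random(seed)
    q = [[0.0] * len(PAYMENTS) for _ in range(2)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy choice of the payment to broadcast.
        if rng.random() < epsilon:
            action = rng.randrange(len(PAYMENTS))
        else:
            action = greedy(q[state])
        quality = user_quality(PAYMENTS[action])
        reward = server_reward(quality, PAYMENTS[action])
        next_state = quality
        # Move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train_payment_policy()
best_payments = [PAYMENTS[greedy(q_table[s])] for s in (0, 1)]
```

In this toy setting the learned policy pays the smallest amount (0.4) that still makes accurate sensing worthwhile for the user, in both states.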

As shown in figure 3, the DQN-based MCS payment strategy outperforms the benchmark non-DL scheme with a higher sensing quality, a lower faked sensing rate and a higher utility of the MCS server, since the CNN of the proposed scheme compresses the state space and accelerates the learning process. For instance, the DQN-based MCS payment scheme needs only 2000 time slots to suppress the faked sensing rate to 10%, which is 50.0% faster than the Q-learning-based strategy. The proposed payment scheme improves the sensing quality and the utility of the MCS server by 31.5% and 96.0%, respectively, after 200 time slots compared with the Q-learning-based scheme.

VI. DL-BASED INTRUSION DETECTION

In MCS systems, mobile devices with limited power, computing capacity and wireless bandwidth can use cloud-based intrusion detection, which offers fast processing, powerful security services and a large malware database, to detect malware [9], [18]. By applying learning techniques such as Bayesian networks and random forest classifiers, the MCS system evaluates the runtime behavior of applications by processing their traces or logs. However, the malware detection accuracy is sensitive to the selected features and attributes, and many existing methods, such as the framework proposed in [1], suffer from long detection delays in large-scale networks.

The RNN-based malware detection scheme developed in [17] uses an RNN to capture domain- and content-based malware characteristics and determines whether to suspend dynamic detection based on the network behavior. As shown in figure 4, the malware detection consists of feature extraction, RNN construction, and training and classification based on domain and content features, such as the IP address of the source node, the file types and the number of bytes. Because RNNs classify sequential data accurately, especially natural language, an RNN can analyze the latent functions of malware communications, such as malware infection spread and information leakage. Simulation results show that, owing to the flexibility of the RNN, this scheme saves detection resources and responds faster than the benchmark scheme without deep learning. RNN-based malware detection outperforms statistics-based malware detection with a 67.1% shorter analysis time, reducing the number of MCS servers needed for efficient dynamic MCS analysis by more than half.
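A minimal Elman-style RNN forward pass over a sequence of per-connection feature vectors illustrates how such a detector aggregates behavior over time. The three features per step (kilobytes transferred, an executable-file flag and a new-domain flag), the hidden size of 2 and the untrained weights are hypothetical; a detector in the spirit of [17] would learn its weights from labelled traffic.

```python
import math


def rnn_score(sequence, w_in, w_rec, w_out, b_out=0.0):
    """Elman RNN over a sequence of feature vectors; returns an estimated P(malicious).

    h_t = tanh(W_in x_t + W_rec h_{t-1}),  score = sigmoid(w_out . h_T + b_out)
    """
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        # The comprehension reads the previous hidden state before it is rebound.
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hr for wr, hr in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical, untrained weights: 3 input features, 2 hidden units.
W_IN = [[0.5, 1.0, 1.0], [-0.3, 0.2, 0.8]]
W_REC = [[0.4, 0.0], [0.0, 0.4]]
W_OUT = [1.5, 1.0]

# Per-step features: [kilobytes transferred, executable-file flag, new-domain flag].
benign = [[0.1, 0.0, 0.0]] * 3
suspicious = [[0.2, 0.0, 0.0], [1.5, 1.0, 1.0], [3.0, 1.0, 1.0]]
```

With these weights the suspicious sequence scores about 0.86 and the benign one about 0.52; the point is only that the final hidden state summarizes the whole sequence.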

MCS systems can also apply deep learning to optimize resource allocation for detecting APT attacks [19], and to optimize offloading policies in malware detection without knowing the radio channel model or the app trace generation model [9]. For instance, the cloud-based mobile malware detection scheme proposed in [9] applies a DQN to determine the offloading rate of a mobile device. More specifically, the offloading rate is chosen according to the quality function of each offloading strategy in the current state, which consists of the current radio bandwidth and the previous offloading rates of the mobile device. The Q-function is estimated by a CNN with two Conv layers and two FC layers. The CNN parameters are updated with the SGD method, which minimizes the mean-squared error with respect to the target values using minibatches and experience replay. The simulation results show that the DQN-based scheme uses the CNN to compress the state space and accelerate learning, thereby improving the malware detection accuracy and reducing the detection time compared with the RL-based benchmark scheme without DL. For instance, the DQN-based scheme increases the detection accuracy by 24.5%, reduces the detection delay by 35.3%, and increases the utility of the mobile device by 31.0% compared with the Q-learning-based scheme.
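The experience replay and minibatch mean-squared-error objective mentioned above can be sketched as follows. The Q-functions are passed in as plain callables standing in for the CNN of [9], and the states and rewards in the usage example are hypothetical; a separate target Q-function is used for the bootstrapped targets, as is common in DQN.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # the oldest transitions are dropped first

    def push(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling breaks the correlation between consecutive time slots.
        return rng.sample(list(self.memory), batch_size)


def minibatch_mse(batch, q_values, target_q_values, gamma):
    """Mean-squared TD error that an SGD step on the Q-network would minimize.

    q_values(s) and target_q_values(s) return one Q estimate per action.
    """
    errors = []
    for state, action, reward, next_state in batch:
        target = reward + gamma * max(target_q_values(next_state))
        errors.append((target - q_values(state)[action]) ** 2)
    return sum(errors) / len(errors)


# Hypothetical usage: state = (bandwidth level, previous offloading-rate index).
rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
for t in range(10):
    buffer.push((t % 3, t % 2), t % 2, 1.0, ((t + 1) % 3, (t + 1) % 2))
batch = buffer.sample(4, rng)
loss = minibatch_mse(batch, lambda s: [0.0, 0.0], lambda s: [1.0, 2.0], gamma=0.9)
```

In a full DQN, the gradient of this loss with respect to the CNN weights drives each SGD step.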

Fig. 4. RNN-based intrusion detection process with 3 steps.

VII. DL-BASED ANTI-JAMMING TRANSMISSION

MCS systems must address jamming and DoS attacks and improve the transmission efficiency of sensing data and payment decisions, with a high signal-to-interference-plus-noise ratio (SINR), low bit error rates (BERs) and low transmission energy consumption under jamming. However, traditional anti-jamming techniques such as frequency hopping and direct-sequence spread spectrum are not always applicable to MCS systems with heterogeneous, dynamic and large-scale networks, which causes losses of sensing data and payments that can discourage mobile users from participating in sensing tasks. Therefore, the power allocation scheme investigated in [22] combines Q-learning and a CNN to suppress the attack motivation of smart attackers, accelerate learning and improve the security of MCS systems.

Fig. 5. DQN-based anti-jamming transmissions with 2 smart jammers and 128 radio channels.

A promising approach to addressing jamming attacks in MCS systems is the DQN-based two-dimensional anti-jamming communication system proposed in [7]. As shown in figure 5, this scheme applies frequency hopping to resist jamming and improve the SINR of the received signals. Deep learning techniques such as CNNs help such MCS systems balance the quality of sensing and payment transmissions against the communication overhead, such as the energy consumption and latency of channel hopping and user mobility. This scheme enables a mobile device to obtain the optimal communication policy according to the current transmission state without knowing the jamming and interference model or the radio channel model of the MCS system. The CNN compresses the state space for large-scale MCS with mobile and heterogeneous devices, which accelerates learning and increases the SINR of the received signals.
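The per-slot trade-off that the DQN learns to optimize can be made concrete with a hypothetical utility: the SINR (in dB) on the chosen channel minus a fixed cost whenever the device hops. The power, gain, noise and cost values are assumptions; only the counts of 128 channels and 2 jammers echo Fig. 5. Unlike this sketch, the DQN in [7] does not know which channels are jammed and must learn this from the observed SINRs.

```python
import math


def sinr_db(tx_power, gain, noise, jam_power=0.0, jam_gain=0.0):
    """SINR in dB: received signal power over noise plus received jamming power."""
    sinr = tx_power * gain / (noise + jam_power * jam_gain)
    return 10 * math.log10(sinr)


def hop_utility(channel, prev_channel, jammed, hop_cost=1.0, tx_power=1.0,
                gain=1.0, noise=0.01, jam_power=10.0, jam_gain=0.1):
    """Hypothetical per-slot utility: SINR on the chosen channel minus a hopping cost."""
    jam = jam_power if channel in jammed else 0.0
    cost = hop_cost if channel != prev_channel else 0.0
    return sinr_db(tx_power, gain, noise, jam, jam_gain) - cost


# 128 channels and 2 jammers (hypothetically on channels 5 and 17); the device is on channel 5.
NUM_CHANNELS = 128
jammed_channels = {5, 17}
best = max(range(NUM_CHANNELS), key=lambda c: hop_utility(c, 5, jammed_channels))
```

Staying on jammed channel 5 yields roughly 0 dB, while hopping to any clean channel yields 20 dB minus the hopping cost, so the greedy choice hops away from both jammed channels.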

The simulation results provided in [7] show that the DQN-based anti-jamming communication system converges faster and achieves a higher SINR than Q-learning. For instance, this scheme converges to the optimal performance after 1000 time slots, saving 66.7% of the learning time compared with the benchmark scheme. The SINR of the DQN-based communication system is 8.3% higher than that of Q-learning after 1000 time slots.

DQN has also been used in the cache-enabled anti-jamming communication scheme developed in [21] to optimize interference alignment and user selection. This scheme provides user cooperation in the design of the precoding matrices to resist jamming. Implemented with Google TensorFlow, this communication system can work in practical MCS with time-varying radio channels. To address the destabilization problem, the scheme uses a target CNN: the Q-network is trained toward values computed by the target network, whose weights are periodically updated from the Q-network. The learning rate and minibatch size of the CNN are discussed in terms of the convergence time, memory cost and sensing performance. For example, a smaller minibatch size speeds up convergence but might increase the risk of converging to a local optimum. Therefore, the minibatch size should be chosen carefully before the scheme is implemented in MCS systems.
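The target-network mechanism can be sketched independently of the network itself: the online parameters change after every SGD step, while a frozen copy used to compute targets is refreshed only every few steps. The parameter dictionary and the period of 3 are hypothetical stand-ins for the CNN weights in [21].

```python
import copy


class TargetSync:
    """Keeps a frozen copy of the Q-network parameters, refreshed every `period` updates."""

    def __init__(self, online_params, period):
        self.online = online_params
        self.target = copy.deepcopy(online_params)
        self.period = period
        self.updates = 0

    def after_sgd_step(self):
        # The online network changes every step; the target only every `period` steps.
        self.updates += 1
        if self.updates % self.period == 0:
            self.target = copy.deepcopy(self.online)


params = {"w": [0.0, 0.0]}
sync = TargetSync(params, period=3)
history = []
for step in range(6):
    params["w"][0] += 1.0  # stand-in for one SGD update of the online Q-network
    sync.after_sgd_step()
    history.append(sync.target["w"][0])
```

The target value stays fixed between synchronizations (history is 0, 0, 3, 3, 3, 6), so the regression targets do not chase every update of the online network.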

VIII. CONCLUSION & FUTURE WORK

In this paper, we have reviewed the attack models in MCS systems and investigated DL-based authentication, privacy protection, faked sensing countermeasures, intrusion detection and anti-jamming transmissions for multimedia services in IoT applications. DL-based security techniques are promising for improving QoE-based multimedia services and the security performance of MCS systems. Several important directions for future studies of DL-based MCS security solutions are suggested below.

Real-time processing: Most existing deep learning-based security schemes, such as the DNN-based authentication in [8] and the RNN-based malware detection in [17], require long training times and are too complicated to implement for real-time processing in practical MCS systems. The hardware widely used for deep learning computation, such as graphics processing units (GPUs), is not available on most mobile devices in MCS systems, such as smartphones. Therefore, distributed deep learning algorithms that distribute processing and storage tasks among MCS network components, together with low-cost DL-compatible chips, are critical to protecting MCS systems in the future. Another interesting topic is optimizing the deep learning architecture and parameters of MCS security solutions, for example with second-order stochastic gradient descent, which may help accelerate attack detection in MCS systems.

Accurate evaluation of utility: Deep learning-based security schemes, such as the CNN-based sensing data anonymization in [14] and the DQN-based faked sensing countermeasure in [11], must evaluate the security performance with a utility function that depends on the instantaneous security gain and protection cost. For example, in the DQN-based MCS system in [11], the MCS server has to estimate the sensing accuracy, computation cost and protection cost in each time slot to determine the payment policy and suppress the faked sensing motivation of mobile users. However, it is challenging for the MCS system to evaluate these factors accurately and determine the utility in time, especially in the presence of stealthy and insider attackers. In the future, deep learning algorithms that are robust to utility evaluation errors should be investigated for designing MCS security solutions.

Backup security solutions: Existing DL techniques, such as the DQN-based malware detection with offloading in [9], require a long training stage and a trial-and-error stage at the beginning, during which the MCS system may fail to protect against attacks that can cause network disasters and losses of millions of dollars. Therefore, backup security protocols have to be incorporated into DL-based security schemes to provide reliable and secure MCS services.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China under Grants 61671396 and 91638204, in part by the open research fund of the National Mobile Communications Research Laboratory, Southeast University (No. 2018D08), and in part by the Science and Technology Innovation Project of Foshan City, China (Grant No. 2015IT100095).
