Multi-VMs Intrusion Detection for Cloud Security Using Dempster-Shafer Theory

Chak Fong Cheang, Yiqin Wang, Zhiping Cai and Gen Xu

Computers Materials & Continua, 2018, Issue 11 (2018-11-29)

Abstract: Cloud computing provides easy, on-demand access to computing resources in a configurable pool. The flexibility of the cloud environment attracts more and more network services to be deployed on the cloud using groups of virtual machines (VMs), instead of being restricted to a single physical server. As more network services are deployed on the cloud, detecting intrusions such as Distributed Denial-of-Service (DDoS) attacks becomes much more challenging than on traditional servers, because even a single network service may now be provided by groups of VMs across the cloud system. In this paper, we propose a cloud-based intrusion detection system (IDS) that inspects the features of the data flow between neighboring VMs, analyzes the probability of an attack on each pair of VMs, treats it as independent evidence under Dempster-Shafer theory, and finally combines the evidence from all pairs of VMs using evidence fusion. Unlike traditional IDSs, which analyze the entire network service externally, our proposed algorithm makes full use of the internal interactions between VMs, and experiments show that it provides more accurate results than the traditional algorithm.

Keywords: Intrusion detection, cloud computing, Dempster-Shafer theory, evidence fusion.

1 Introduction

Cloud security is one of the most important factors in cloud computing [Chonka, Xiang, Zhou et al. (2011); Khorshed, Shawkat and Saleh (2012); Guo, Liu, Cai et al. (2018)]. The widespread use of virtualization in the cloud environment makes the network situation more complicated than before. Many network services deployed on the cloud are now distributed across different virtual machines (VMs). This makes them more susceptible to distributed attacks and makes intrusions more difficult to detect.

Intrusion detection systems (IDSs), which are used to protect traditional network systems, can now be deployed on cloud systems as one of the most effective means of protection. In recent years, many improvements to IDS algorithms have been proposed. Modi et al. [Modi, Patel, Borisaniya et al. (2013)] proposed a collaborative IDS framework for the cloud, in which cascaded decision tree and SVM classifiers are used to improve detection accuracy and system performance. Vaid et al. [Vaid and Verma (2015)] proposed an IDS based on the Bootstrapped Optimistic Algorithm for Tree Construction (BOAT), which analyzes user behavior with anomaly detection to identify malicious activities such as unauthorized access or illegal transactions on cloud data. Ficco et al. [Ficco, Tasquier and Aversa (2013)] proposed a distributed intrusion detection architecture that allows cloud providers to offer security solutions as a service. Mishra et al. [Mishra, Pilli, Varadharajan et al. (2017)] proposed a combination of parallelization and machine learning methods that improves both the detection mechanism and the detection speed of an IDS. Khan et al. [Khan, Awad and Thuraisingham (2007); Mewada, Gedam, Khan et al. (2010); Chang, Li and Yang (2017)] proposed improved SVM algorithms for intrusion detection classification. Mukkamala et al. [Mukkamala, Janoski and Sung (2002); Abhaya and Kumar (2016); Bezdek and James (1981)] addressed the intrusion detection problem using neural network and fuzzy algorithms. Gul et al. [Gul and Hussain (2011); Singh, Patel, Borisaniya et al. (2016)] proposed distributed models and collaborative frameworks for intrusion detection in the cloud.

However, merely migrating traditional IDSs onto the cloud still has many limitations. A network-based IDS (NIDS) offers a broader view of network traffic and better resistance to attacks, but lacks knowledge of the host system. A host-based IDS (HIDS), on the other hand, protects its host system but cannot detect or resist attacks on other hosts or on the network, and is vulnerable to evasion attacks. More importantly, a traditional IDS usually analyzes the entire network service externally and neglects the internal interactions between the VMs.

Therefore, we propose a cloud-based IDS that uses Dempster-Shafer theory to overcome these limitations of the traditional IDS. In this system, each VM observes malicious activities and analyzes them independently with its own IDS algorithm. The cloud-based IDS then combines the results of all VMs using evidence fusion. Because it makes full use of the internal relationships between VMs when analyzing intrusions, this IDS can improve the accuracy of the judgment.

The paper is organized as follows. Section 2 formulates the intrusion detection problem. Section 3 describes the IDS architecture and its components. Section 4 explains the IDS algorithms based on Dempster-Shafer theory. Section 5 discusses the simulation experiment of the multi-VMs IDS. Finally, Section 6 concludes the paper.

2 Problem formulation

In cloud computing systems, intrusion detection algorithms recognize intrusion activities by monitoring network traffic and abnormal events, and the measurements of intrusion activities are regarded as evidence. A distributed intrusion detection system can obtain evidence from individual observers and provides a numerical procedure for combining multiple pieces of evidence from different hosts. Intrusion detection is thus essentially a pattern classification problem.

There are many methods for combining evidence, such as simple majority voting, the simple majority decision rule, and averaging the observers' numerical evidence. Among the proposed evidence fusion methods, the Bayesian approach interprets the posterior probability $P(H|E)$ as a measure of belief in a hypothesis $H$, updated in response to evidence $E$. The Bayesian approach is well grounded in the formalism of probability through Bayes' theorem.

However, one difficulty with the Bayesian approach is that it requires the prior probability to be known in the absence of any evidence, because it needs complete knowledge of both prior and conditional probabilities.

Dempster-Shafer theory is considered an extension of Bayesian inference. The Dempster-Shafer theory of evidence, originated by Dempster [Dempster (1976)] and later revised by Shafer [Shafer (1976)], addresses this situation by representing uncertainty in the form of belief functions. It offers a mathematical way to combine evidence from multiple observers without requiring the prior or conditional probabilities needed by the Bayesian approach, and it quantifies uncertainty explicitly through belief functions. Therefore, when used in distributed intrusion detection, Dempster-Shafer theory can classify activity as a malicious intrusion or as normal activity while explicitly accounting for what remains unknown.

3 Architecture of multi-VMs IDS

When network services are deployed on multiple VMs across the cloud system, they are vulnerable to a variety of malicious attacks that are difficult to detect. To address this, we design a cloud-based IDS architecture in which each VM observes part of the total traffic and performs distributed intrusion detection.

3.1 Network services in IDS

This section explains the components of the proposed IDS architecture and defines each of them. As shown in Fig. 1, the proposed IDS is intended for a cloud system that consists of multiple physical servers. Each physical server is denoted $PS_i$.

Each physical server hosts a number of virtual machines (VMs). Each VM is denoted $VM_{i,j}$,

where $i$ is the index of the physical server and $j$ is the index of the VM within $PS_i$.

We suppose that a specific network service deployed on the cloud can be distributed across different VMs, so that the network function is actually provided by the data flow through these VMs. For example, a network service may be a typical database query function implemented in client/server mode. Such a service might consist of multiple servers, such as a web server, an authentication server, a database server and others. A network service hosted on $n$ VMs is denoted as the set $NS = \{VM_{i_1,j_1}, VM_{i_2,j_2}, \ldots, VM_{i_n,j_n}\}$.

$VM_{i_k,j_k}$ can be simplified as $VM_k$, where $k$ is the index of the VM within this network service.

Figure 1: A network service is deployed on multiple VMs across physical servers
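To make the notation concrete, the following Python sketch models a VM $VM_{i,j}$ by its server and VM indices, a network service as an ordered chain of VMs, and the simplified index $VM_k$. It also derives the neighboring pairs that the first fusion level operates on. The class names and the example placement are hypothetical illustrations, not part of the proposed system, and the sketch assumes messages traverse the VMs as a simple linear chain.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VM:
    """VM_{i,j}: the j-th VM hosted on physical server PS_i (hypothetical model)."""
    server: int  # i, index of the physical server PS_i
    index: int   # j, index of the VM within PS_i

    def __str__(self) -> str:
        return f"VM_{{{self.server},{self.index}}}"


@dataclass
class NetworkService:
    """NS: the ordered VMs VM_{i_1,j_1}, ..., VM_{i_n,j_n} that a message traverses."""
    vms: List[VM]

    def vm(self, k: int) -> VM:
        """Simplified notation VM_k (1-based, as in the text)."""
        return self.vms[k - 1]

    def neighbor_pairs(self) -> List[Tuple[VM, VM]]:
        """Pairs (VM_{k-1}, VM_k) of consecutive hops in the assumed linear chain."""
        return list(zip(self.vms, self.vms[1:]))


# Hypothetical placement: web, authentication and database VMs on two servers.
ns = NetworkService([VM(1, 1), VM(2, 1), VM(1, 2)])
print(ns.vm(2))  # VM_{2,1}
print([(str(a), str(b)) for a, b in ns.neighbor_pairs()])
```

A service with $n$ VMs yields $n-1$ neighboring pairs, which is the number of pieces of pair evidence fused at the second level.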

3.2 Evidence in IDS

One of the major problems in distributed intrusion detection is to determine the trustworthiness of the hosts and to combine the observational evidence from multiple hosts. In the proposed IDS, evidence is the likelihood that a VM is being attacked, as estimated by an existing intrusion detection algorithm such as k-nearest neighbor (KNN), support vector machine (SVM) or decision tree (DT). The evidence of each VM $VM_{i_k,j_k}$ is denoted $E_{VM_{i_k,j_k}}$, abbreviated $E_{VM_k}$.

The evidence of a specific network service is therefore the likelihood that any of its VMs is being attacked; it is the fusion of the evidence obtained by its multiple VMs and is denoted $E_{NS}$.

For a specific network service, the data flow is the traffic of messages that are created by end users and forwarded continuously through all the VMs involved in the service. More precisely, each message is forwarded hop by hop from the first VM until it reaches the last VM. During forwarding, the message may be changed from VM to VM in order to provide the necessary functions to the end users.

Although a message may be modified at any VM during forwarding, it remains unchanged between being sent by one VM and being received by the next. The evidence generated on two neighboring VMs is therefore based on the same snapshot of the observed message. Hence two neighboring VMs can first be grouped as a pair of observers for evidence fusion, and it is reasonable to break the evidence fusion procedure into two levels, as shown in Fig. 2.

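Evidence fusion is performed at two levels.

At the first level, the VM-based evidence is fused. The evidence of each pair of neighboring VMs is generated by combining the evidence of the two VMs:

$$E_{VM_{k-1},VM_k} = E_{VM_{k-1}} \oplus E_{VM_k}, \quad k = 2, \ldots, n.$$

At the second level, the cloud-based evidence is fused. The evidence of the whole network service is generated by combining the evidence of all pairs of VMs:

$$E_{NS} = E_{VM_1,VM_2} \oplus E_{VM_2,VM_3} \oplus \cdots \oplus E_{VM_{n-1},VM_n},$$

where $\oplus$ denotes evidence fusion with Dempster's rule of combination.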

Figure 2: Evidence fusion in multi-VM IDS

4 Algorithms of Multi-VMs IDS using Dempster-Shafer theory

Detection accuracy is a key issue for any intrusion detection system. When the likelihood of an intrusion is estimated from multiple hosts, the decision of an individual host might not be reliable. The Dempster-Shafer theory of evidence is well suited to this type of problem because it explicitly represents uncertainty.

4.1 Evidence definition

Defining the evidence amounts to determining initial estimates of the hosts' trustworthiness. In the proposed IDS, the evidence can be generated by any existing intrusion detection algorithm. An initial estimate of a host's trustworthiness can also be computed by combining multiple classifiers such as k-nearest neighbor (KNN), support vector machine (SVM), random forest, decision tree (DT) and supervised learning in quest (SLIQ), because these techniques offer few false alarms, good accuracy and low computational cost.

First, we define Ω as the set of all possible types of malicious attacks on the cloud. Ω is a finite set of mutually exclusive elements, each representing one type of malicious attack. For example, various types of DDoS attacks can be written as $\Omega = \{\text{TCP SYN flood}, \text{UDP flood}, \text{ICMP flood}, \ldots\}$.

The set of all subsets of Ω is called the power set of the malicious attacks, denoted $2^\Omega$.

Each member $A_i \in 2^\Omega$ is a subset of the malicious attacks, i.e. $A_i \subseteq \Omega$.
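The power set can be enumerated directly, which also shows why it grows as $2^{|\Omega|}$. The sketch below assumes a three-element Ω whose labels "SYN", "UDP" and "ICMP" stand for the TCP SYN, UDP and ICMP floods named above; a real deployment would use whatever attack taxonomy its detectors support.

```python
from itertools import combinations
from typing import FrozenSet, List

# Assumed frame of discernment: SYN = TCP SYN flood, UDP = UDP flood, ICMP = ICMP flood.
OMEGA = frozenset({"SYN", "UDP", "ICMP"})


def power_set(omega: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Return all 2^|omega| subsets A_i of omega, including the empty set and omega."""
    elems = sorted(omega)  # fixed order so the enumeration is deterministic
    return [frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]


subsets = power_set(OMEGA)
print(len(subsets))  # 8
for a in subsets:
    print(sorted(a))
```

Frozen sets are used so that each $A_i$ can later serve as a dictionary key when assigning masses.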

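Second, the mass function $m(\cdot)$ is defined to measure the likelihood of the malicious attacks. Following the standard Dempster-Shafer definition, $m: 2^\Omega \to [0,1]$ satisfies $m(\emptyset) = 0$ and $\sum_{A_i \subseteq \Omega} m(A_i) = 1$.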

To obtain the value of $m(A_i)$, each VM captures the features of the data flow it receives. The flow features usually include the packet types, the addresses of the source and destination hosts, the values of the header fields, etc. Each VM then independently analyzes these features using its own intrusion detection algorithm, such as k-nearest neighbor (KNN), support vector machine (SVM) or decision tree (DT).
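As a rough illustration of how a single VM might turn a KNN decision into a mass function, the sketch below classifies a flow-feature vector with a plain k-nearest-neighbor vote, assigns the vote fractions to singleton attack sets, and then applies Shafer's discounting so that part of the mass goes to Ω as explicit ignorance. The paper does not specify this mapping: the two features, the training points, `k` and the `reliability` factor are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

OMEGA = frozenset({"SYN", "UDP", "ICMP"})  # assumed frame of discernment
Mass = Dict[FrozenSet[str], float]


def knn_votes(train: List[Tuple[Sequence[float], str]],
              x: Sequence[float], k: int) -> Counter:
    """Count the labels of the k training flows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest)


def votes_to_mass(votes: Counter, k: int, reliability: float) -> Mass:
    """Hypothetical mapping from KNN votes to a mass function.

    Vote fractions become masses on singleton attack sets, which are then
    discounted by `reliability` (Shafer's discounting); the removed mass,
    1 - reliability, is assigned to OMEGA to express ignorance."""
    mass: Mass = {frozenset({label}): reliability * (n / k) for label, n in votes.items()}
    mass[OMEGA] = mass.get(OMEGA, 0.0) + (1.0 - reliability)
    return mass


# Toy flow features (SYN packets/s, UDP packets/s); the values are invented.
train = [((900.0, 10.0), "SYN"), ((850.0, 20.0), "SYN"), ((880.0, 5.0), "SYN"),
         ((30.0, 700.0), "UDP"), ((25.0, 800.0), "UDP"), ((400.0, 400.0), "ICMP")]
votes = knn_votes(train, (870.0, 15.0), k=3)
m_vm = votes_to_mass(votes, k=3, reliability=0.9)
print(votes, m_vm)
```

Discounting keeps the result a valid mass function while preventing a single detector from asserting full certainty.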

The value $m(A_i)$ expresses only the possibility that some attack in the set $A_i$, which may contain multiple attacks, is occurring; it does not distinguish among the individual attacks within $A_i$. This is reasonable when several attacks share similar flow features, and it is particularly useful when the evidence cannot discriminate more specifically among different types of attacks.
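A mass function may place weight on a compound set such as {SYN, UDP} when the two floods cannot be told apart from flow features. The sketch below checks the standard conditions on such an assignment: no mass on the empty set, focal elements inside Ω, masses in [0, 1], and a total of 1. The frame labels and example masses are invented for illustration.

```python
from typing import Dict, FrozenSet

OMEGA = frozenset({"SYN", "UDP", "ICMP"})  # assumed frame of discernment
Mass = Dict[FrozenSet[str], float]


def is_mass_function(m: Mass, omega: FrozenSet[str] = OMEGA, tol: float = 1e-9) -> bool:
    """m(empty set) = 0, every focal element is a subset of omega,
    all masses lie in [0, 1], and the masses sum to 1."""
    if m.get(frozenset(), 0.0) > tol:
        return False
    if any(not a <= omega or not 0.0 <= v <= 1.0 for a, v in m.items()):
        return False
    return abs(sum(m.values()) - 1.0) <= tol


# Mass on {SYN, UDP} says "one of these two floods" without favoring either.
m_vm = {frozenset({"SYN"}): 0.5, frozenset({"SYN", "UDP"}): 0.3, OMEGA: 0.2}
print(is_mass_function(m_vm))                       # True
print(is_mass_function({frozenset({"SYN"}): 0.7}))  # False: masses sum to 0.7
```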

Multiple intrusion detection algorithms may also be applied concurrently on a single VM in order to better identify the types of attacks. In this case, each algorithm $l$ produces its own mass function $m_l(A)$, and the overall mass function $\bar{m}(A)$ can be evaluated by assigning a weight to each individual mass function $m_l(A)$.
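The text does not state how the weights are applied. One natural reading, assumed here, is a convex combination $\bar{m}(A) = \sum_l w_l\, m_l(A)$ with $w_l \ge 0$ and $\sum_l w_l = 1$, which guarantees that $\bar{m}$ is again a valid mass function. The detector outputs and weights below are hypothetical.

```python
from typing import Dict, FrozenSet, List

Mass = Dict[FrozenSet[str], float]


def weighted_mass(masses: List[Mass], weights: List[float]) -> Mass:
    """m_bar(A) = sum_l w_l * m_l(A); assumes non-negative weights summing to 1."""
    if len(masses) != len(weights):
        raise ValueError("one weight per detector is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    combined: Mass = {}
    for m, w in zip(masses, weights):
        for a, v in m.items():
            combined[a] = combined.get(a, 0.0) + w * v
    return combined


SYN, OMEGA = frozenset({"SYN"}), frozenset({"SYN", "UDP", "ICMP"})
m_knn = {SYN: 0.8, OMEGA: 0.2}  # hypothetical KNN output on one VM
m_svm = {SYN: 0.6, OMEGA: 0.4}  # hypothetical SVM output on the same VM
m_bar = weighted_mass([m_knn, m_svm], [0.7, 0.3])
print(m_bar)
```

Larger weights would typically go to the detectors that have proved more reliable on that VM's traffic.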

4.2 Evidence fusion

The evidence is then combined using the evidence fusion of Dempster-Shafer theory. Dempster's rule of combination provides a numerical procedure for fusing multiple pieces of evidence from unreliable observers. Evidence fusion is performed in two stages.
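For reference, the textbook form of Dempster's rule for two mass functions $m_1$ and $m_2$ over the same frame Ω is given below. It is the standard statement of the rule, not a reproduction of the authors' numbered equations.

```latex
K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C), \qquad
(m_1 \oplus m_2)(A) =
\begin{cases}
0, & A = \emptyset,\\[4pt]
\dfrac{1}{1-K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset.
\end{cases}
```

Here $K$ measures the conflict between the two observers, and the rule is undefined when $K = 1$ (total conflict).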

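At the first stage, the VM-based evidence is fused on each pair of neighboring VMs. Suppose two neighboring VMs generate their evidence $E_{VM_{i_{k-1},j_{k-1}}}$ and $E_{VM_{i_k,j_k}}$ respectively, as shown in Fig. 3. According to Eq. (7) and Eq. (15), the pair evidence is obtained by combining the two with Dempster's rule:

$$E_{VM_{k-1},VM_k} = E_{VM_{k-1}} \oplus E_{VM_k},$$

where $\oplus$ denotes Dempster's rule of combination and $VM_k$ abbreviates $VM_{i_k,j_k}$.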

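At the second stage, the cloud-based evidence is fused among all VMs. According to Eq. (8) and Eq. (17), the evidence of all $n-1$ neighboring pairs is combined:

$$E_{NS} = E_{VM_1,VM_2} \oplus E_{VM_2,VM_3} \oplus \cdots \oplus E_{VM_{n-1},VM_n},$$

where $\oplus$ denotes Dempster's rule of combination. Since this rule is commutative and associative, the order in which the pairs are combined does not affect the result.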

Figure 3: Evidence fusion
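The two-stage procedure can be sketched in Python as follows. The sketch implements the standard Dempster's rule, fuses each neighboring pair (VM_{k-1}, VM_k) at the first stage, and combines the resulting pair evidence at the second stage. It assumes a linear chain VM_1 → VM_2 → VM_3 and the frame {SYN, UDP, ICMP}; the mass values are invented for illustration and are not the experimental data.

```python
from functools import reduce
from typing import Dict, FrozenSet, List

Mass = Dict[FrozenSet[str], float]  # focal element -> mass


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    raw: Mass = {}
    conflict = 0.0  # K: total mass assigned to empty intersections
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict (K = 1): evidence cannot be combined")
    # Normalize by 1 - K so the combined masses sum to 1.
    return {a: v / (1.0 - conflict) for a, v in raw.items()}


def two_level_fusion(vm_masses: List[Mass]) -> Mass:
    """Stage 1: fuse each neighboring pair (VM_{k-1}, VM_k).
    Stage 2: fuse all pair results into the network-service evidence."""
    if len(vm_masses) < 2:
        raise ValueError("at least two VMs are needed to form a pair")
    pair_evidence = [dempster(a, b) for a, b in zip(vm_masses, vm_masses[1:])]
    return reduce(dempster, pair_evidence)


SYN, UDP = frozenset({"SYN"}), frozenset({"UDP"})
OMEGA = frozenset({"SYN", "UDP", "ICMP"})
m1 = {SYN: 0.6, OMEGA: 0.4}            # hypothetical evidence on VM_1
m2 = {SYN: 0.5, UDP: 0.2, OMEGA: 0.3}  # hypothetical evidence on VM_2
m3 = {SYN: 0.7, OMEGA: 0.3}            # hypothetical evidence on VM_3
fused = two_level_fusion([m1, m2, m3])
print({"/".join(sorted(a)): round(v, 4) for a, v in fused.items()})
```

In this sketch the evidence of an interior VM such as VM_2 enters two pairs, so it influences the fused result more strongly than the evidence of the end VMs.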

4.3 Evidence judgment

Evidence judgment is implemented through two functions: to evaluate the evidence, we define a belief function and a plausibility function. The belief function represents the weight of evidence supporting a hypothesis, and the plausibility function represents the weight of evidence that does not refute it.

The belief function, $Bel(A_i) = \sum_{B \subseteq A_i} m(B)$, is the lower bound on the probability of the malicious attacks $A_i$ detected on the VMs.

The plausibility function, $Pls(A_i) = \sum_{B \cap A_i \neq \emptyset} m(B)$, is the upper bound on the probability of the malicious attacks $A_i$ detected on the VMs.

The evidence interval therefore consists of $Bel(A_i)$ and $Pls(A_i)$: we call $Bel(A_i)$ the lower limit and $Pls(A_i)$ the upper limit. The interval $[Bel(A_i), Pls(A_i)]$ indicates the uncertainty of the judgment for the malicious attack $A_i$, as shown in Fig. 4. A large value of $Pls(A_i) - Bel(A_i)$ indicates that it is unclear whether the attack set $A_i$ is true or false.

Finally, the evidence judgment function $f(A_i)$ is defined to evaluate the probability of intrusion. It indicates how probable the attack $A_i$ is.

Figure 4: Lower bound and upper bound of evidence judgment
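The two bounds and the width of the evidence interval can be computed directly from a mass function using the standard Dempster-Shafer definitions, $Bel(A) = \sum_{B \subseteq A} m(B)$ and $Pls(A) = \sum_{B \cap A \neq \emptyset} m(B)$. The Python sketch below assumes the frame {SYN, UDP, ICMP} and hypothetical fused masses; it does not implement the judgment function $f(A_i)$, whose form is not reproduced here.

```python
from typing import Dict, FrozenSet, Tuple

Mass = Dict[FrozenSet[str], float]


def belief(m: Mass, a: FrozenSet[str]) -> float:
    """Bel(A): total mass of non-empty focal elements contained in A (lower bound)."""
    return sum((v for b, v in m.items() if b and b <= a), 0.0)


def plausibility(m: Mass, a: FrozenSet[str]) -> float:
    """Pls(A): total mass of focal elements intersecting A (upper bound)."""
    return sum((v for b, v in m.items() if b & a), 0.0)


def evidence_interval(m: Mass, a: FrozenSet[str]) -> Tuple[float, float, float]:
    """Return (Bel, Pls, Pls - Bel); a wide gap means the evidence is inconclusive."""
    bel, pls = belief(m, a), plausibility(m, a)
    return bel, pls, pls - bel


SYN, UDP = frozenset({"SYN"}), frozenset({"UDP"})
OMEGA = frozenset({"SYN", "UDP", "ICMP"})
m = {SYN: 0.5, frozenset({"SYN", "UDP"}): 0.3, OMEGA: 0.2}  # hypothetical fused evidence
print(evidence_interval(m, SYN))  # Bel = 0.5, Pls = 1.0
print(evidence_interval(m, UDP))  # Bel = 0.0, Pls = 0.5
```

For {SYN} the gap is 0.5 because the mass on {SYN, UDP} and on Ω is compatible with, but does not specifically support, a TCP SYN flood.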

5 Experiment

We designed an experimental scenario to evaluate the performance of the proposed algorithm on the cloud. We built the proposed multi-VMs IDS by creating three VM instances (VM1, VM2 and VM3) on the open-source cloud platform OpenStack. An attacker and a normal user on the external network, implemented by two packet generator programs, concurrently send TCP SYN flood traffic and HTTP request traffic to these three VMs located on the internal network. The KNN algorithm is used as the default algorithm to determine the initial estimates of the evidence ($m_1$, $m_2$ and $m_3$) of each VM.

During testing, we collected the network traffic observed on the VMs over a period of time. The KNN classifier is then applied to the observed traffic to determine whether the observed behavior is a known attack.

Compared with the traditional algorithm, which uses single-level fusion, our two-level evidence fusion algorithm achieves a higher success rate in classifying malicious attacks as intrusions. We attribute this to the two-level evidence fusion, which better estimates the probability of an attack on each pair of VMs. The simulation results are shown in Tab. 1.

Table 1: Simulation results of evidence fusion

6 Conclusions

The virtualized nature of the cloud provides flexibility for deploying network services across different VMs, but it also makes them susceptible to distributed attacks. We proposed a multi-VMs intrusion detection framework in which each VM observes and analyzes evidence independently with its own detection algorithm, but makes the intrusion decision collaboratively with the other VMs. The two-level evidence fusion design, at both the VM-based level and the cloud-based level, takes into account the interactions between VMs that other collaborative models usually neglect, and thus provides more accurate results.

Acknowledgement: This work was supported by the Macau Science and Technology Development Fund under Grant No. 096/2013/A3 and 0026/2018/A1, and the Macau Fund.
