
Detecting the One-Shot Dummy Attack on the Power Industrial Control Processes With an Unsupervised Data-Driven Approach

2023-03-09 Zhenyong Zhang, Yan Qin, Jingpei Wang, Hui Li, and Ruilong Deng

IEEE/CAA Journal of Automatica Sinica, 2023, Issue 2

Zhenyong Zhang, Yan Qin, Jingpei Wang, Hui Li, and Ruilong Deng

Dear Editor,

The dummy attack (DA), a deeply stealthy but impactful data integrity attack on power industrial control processes, was recently recognized as hiding corrupted measurements among normal measurements. In this letter, targeting a more practical case, we aim to detect the one-shot DA, with the purpose of revealing the DA as soon as it is launched. Specifically, we first formulate an optimization problem to generate one-shot DAs. Then, an unsupervised data-driven approach based on a modified local outlier factor (MLOF) is proposed to detect them. To improve the detection performance, the measurements are preprocessed with the gamma transformation, and the power patterns are extracted from historical data and integrated into the MLOF algorithm. Finally, extensive experiments are conducted to evaluate the performance of the proposed approach with real-world load data.

The industrial control processes of power systems are enabled by integrating advanced information and communication technology (ICT) into the physical infrastructure. However, the vulnerabilities exposed in ICT devices make power systems attractive targets for cyberattacks [1]. The false data injection attack (FDIA) [2] is one of the cyberattacks that aims to stealthily disrupt the power system’s control processes (e.g., state estimation, load frequency control, etc.). However, an FDIA that bypasses bad data detection is no longer stealthy if the detector evaluates the distances between the corrupted measurements and recent historical measurements. Thus, a deeply stealthy attack, the DA, was recently proposed to hide the corrupted measurements in historical measurements [3]. The DA is a variant of the FDIA that enhances stealthiness against detectors using clustering methods.

As an emerging research topic, detecting DAs effectively is still an open issue. Revisiting the existing defenses against traditional FDIAs [4] is beneficial to understanding the DA. Among them, measurement/state protection [5] is the most studied approach, which prevents the power system from being attacked by securing some critical measurements. However, the shortcoming of this approach is that only a few measurements can be trusted, and real-time operation might be affected if the data is protected with encryption. To provide a dynamic protection capability, the moving target defense approach [6] was recently put forward to thwart FDIAs by dynamically changing the network and physical parameters of power systems. However, the active perturbation of system parameters might affect the safety and stability of the voltage, grid frequency, etc. To alleviate the impact of defensive approaches, data-driven methods are promising alternatives to detect and identify FDIAs [7], [8]. However, there are three challenges for data-driven approaches to detect FDIAs. First, it is impossible to completely label the adversarial data for supervised approaches since the state space is uncountable. The same measurement might be labeled contradictorily, since measurements modified by an FDIA are sometimes legal (i.e., they satisfy the physical laws). Second, the assumption of a specific distribution for the tested dataset, required by some approaches, may not hold in practice, as different power systems are likely to yield different data distributions. Third, some complicated machine/deep learning methods have many parameters that must be adjusted for different power systems.

In this letter, our goal is to detect DAs with a data-driven approach to fill this gap. We place special emphasis on the one-shot DA. The one-shot setting is more reasonable because, from the attacker’s perspective, he/she aims to destroy the system’s control processes with only one hit, while from the defender’s perspective, it is best to detect the attack immediately after its execution. However, there are challenges to detecting the one-shot DA. First, prior knowledge about the one-shot attack is lacking, as it is rare during the system’s operation. Second, since the DA is executed in a one-shot manner, it is almost impossible to monitor the long-term change of measurements, because the system might have already crashed. To address these issues, in this letter, we propose an unsupervised data-driven approach based on the local outlier factor (LOF) to detect one-shot DAs. To the best of our knowledge, this is the first work to propose an efficient algorithm to detect one-shot DAs. The main contributions are summarized as follows: 1) We present a formulation of the constrained one-shot DA; 2) The interval information and the γ transformation method are adopted to enlarge the deviation of abnormal measurements; 3) A modified LOF algorithm is proposed to enable the efficient detection of one-shot DAs; 4) The performance of the proposed approach in terms of detecting one-shot DAs is evaluated with real-world load data.

Problem formulation: Here, we consider a power transmission network that has a set N of buses and a set L of transmission lines. The DA is constructed to bypass the bad data detector and minimize the distance between the malicious and normal measurements. To compute a malicious measurement z′ for the DA, the following problem should be solved:

where z_1, z_2, ..., z_r are r historical measurements, ∥·∥ denotes the norm operation (the ℓ2 norm is used here), z_i is the measurement collected at time instant i, 1_|N| is an |N|-dimensional vector with all elements equal to 1, ∆p_d is a vector of maliciously injected loads of original loads p_d, σ_i is the ith element of vector σ, ∆f_l is the injected error into the power flow measurement of branch l, δ_l is the overloading threshold of branch l, ∆f is a vector of maliciously injected errors into power flows, ϕ(·) represents the nonlinear relationship between power loads and power flows with the alternating current (AC) model, 𝒩_d is the set of loads that are corrupted by the DA, and N_d is the total number of loads. The constraints also involve the maximum tolerable power flow of branch l and the number of loads that can be corrupted. The first constraint (1a) guarantees the stealthiness of the DA. The second constraint (1b) limits the magnitudes of the modified loads. The third constraint (1c) reflects the impact of the DA on the power system. The fourth constraint (1d) is the stealthiness constraint imposed by the physical laws. The fifth constraint (1e) gives the malicious measurement after the DA. The last constraint (1f) limits the capability of the attacker to modify power loads. Because constraint (1d) is nonlinear and nonconvex, we linearize it using the Jacobian matrix obtained by differentiating ϕ(∆p_d) with respect to p_d. Thus, once the targeted loads are determined, the above problem becomes a convex optimization problem that is easy to solve.
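To make the objective and the linearization step concrete, the following minimal sketch evaluates a candidate malicious measurement against the r historical measurements and applies a first-order approximation of the flow error. It assumes that the objective sums the ℓ2 distances over the historical measurements, which is one reading of the description above and not a confirmed form of problem (1). The function names and the matrix `J` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (l2) distance between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dummy_objective(z_prime, history):
    """Assumed objective: sum of l2 distances between the candidate
    malicious measurement z' and the r historical measurements."""
    return sum(l2_distance(z_prime, z_i) for z_i in history)


def linearized_flow_error(J, delta_p):
    """First-order approximation of the flow error: J times delta_p,
    where J is a (hypothetical) Jacobian given as a list of rows."""
    return [sum(j * dp for j, dp in zip(row, delta_p)) for row in J]


# Example: two historical measurements and one candidate.
history = [[0.0, 0.0], [3.0, 4.0]]
print(dummy_objective([0.0, 0.0], history))
```

Because the linearized constraint is affine in the injected loads, a smaller objective value simply means the candidate sits closer to the recent history, which is the property the DA exploits.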

The DA was proposed because traditional FDIAs can be easily detected using clustering-based and principal component analysis (PCA)-based approaches [9], [10]. The malicious measurements of the DA are deeply hidden in normal measurements because the distances between them and the normal ones are minimized. Besides, the DA can be executed in a one-shot manner, and the power system is knocked down in a short time. Thus, the defender needs to detect the attack as soon as it starts. Therefore, in this letter, our goal is to detect one-shot DAs in an accurate and efficient way.

Main results: In the following, we propose an unsupervised data-driven approach to detect one-shot DAs based on the LOF, which has been widely used to detect outliers [11]. In this letter, the LOF is modified to make it compatible with the detection of one-shot DAs. Since a restricted neighborhood of each measurement point is taken into account, the proposed approach does not require any explicit or implicit notion of clusters.
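For readers unfamiliar with the LOF, the sketch below follows the standard definition (k-distance, reachability distance, local reachability density, and their ratio). It is a plain illustration of the classical LOF and not the modified algorithm of this letter; it assumes Euclidean distance, uses exactly k neighbors with ties broken by index, and guards against duplicate points with a small constant.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return others[:k]


def lof_scores(points, k):
    """Classical LOF score for every point; values well above 1 suggest outliers."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def lrd(i):
        # Reachability distance of i from o is max(k-distance(o), d(i, o)).
        reach = [max(k_dist[j], dist(points[i], points[j])) for j in neigh[i]]
        mean_reach = max(sum(reach) / len(reach), 1e-12)  # avoid division by zero
        return 1.0 / mean_reach

    lrds = [lrd(i) for i in range(n)]
    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrds[j] for j in neigh[i]) / (len(neigh[i]) * lrds[i])
            for i in range(n)]


# Four points on a unit square plus one far-away point.
pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = lof_scores(pts, k=2)
print(scores.index(max(scores)))  # index of the most outlying point
```

The score depends only on each point's k nearest neighbors, which is why no notion of clusters is needed.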

However, there are challenges to directly applying the LOF algorithm to detect one-shot DAs. First, the collected measurements of power systems take the form of a high-velocity data stream. Unlike a static dataset, the stream data might have different patterns during different periods. It is possible to mistake adversarial data for normal data even though it is illegal in the current period. Second, the LOF algorithm is usually computationally intensive since it analyzes the data from a global view. Its computational complexity increases with the size of the tested dataset. However, spending a long time to detect the one-shot DA is not acceptable since the power system changes very fast. Therefore, in this letter, we propose an MLOF algorithm to improve the efficiency of detecting one-shot DAs by utilizing the common power patterns extracted from historical measurements.

For example, the power patterns can be estimated according to load profiles and updated in an adaptive way. Here the power patterns are extracted from the real-world load data of New York State (NYISO load data, 2021. Available: https://www.nyiso.com/load-data). The load profiles on the date 20-01-2021 are plotted in Fig. 1. We can see that there are three patterns (dropping, climbing, steady) for the load change in a day. Although the amount of load changes with time, the load-change patterns remain the same for a certain period. Thus, the load-change patterns can be utilized to reduce the computational complexity of the LOF algorithm.
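As an illustration of how the three load-change patterns might be labeled, the sketch below classifies each step of a load profile by its change relative to the previous sample. The letter does not state the extraction rule at this point, so the thresholding rule, the function name `label_patterns`, and the tolerance `tol` are assumptions made only for this example.

```python
def label_patterns(loads, tol):
    """Label each step of a load profile as 'dropping', 'climbing', or 'steady'.

    loads: consecutive load samples (e.g., in MW).
    tol:   hypothetical tolerance; changes within +/- tol count as 'steady'.
    """
    labels = []
    for prev, cur in zip(loads, loads[1:]):
        change = cur - prev
        if change > tol:
            labels.append("climbing")
        elif change < -tol:
            labels.append("dropping")
        else:
            labels.append("steady")
    return labels


# Made-up samples, not NYISO data.
print(label_patterns([100, 90, 91, 105], tol=5))
```

In practice, consecutive steps with the same label would be merged into periods, and the tolerance would be tuned to the sampling interval of the load data.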

Fig. 1. Load-change patterns of the real-world load profiles.

By integrating the power patterns, the proposed MLOF works as follows. First, the normal measurements are divided into q data chunks (C_1, C_2, ..., C_q) according to q power patterns (p_1, p_2, ..., p_q) (e.g., dropping, climbing, and steady). Then, for each incoming measurement, the time period t (t = 1, 2, ..., q) to which it belongs is first determined, and the k-nearest-neighbor search is performed only within the data chunk C_t. Based on the estimated k nearest neighbors, the LOF value is computed for all measurements in the data chunk C_t. If the incoming measurement has the largest LOF value, then it is recognized as a malicious measurement of the one-shot DA. The pseudo-code of MLOF can be found in Algorithm 1. Note that the q data chunks should be updated after a period of time since the power patterns might change due to the construction of new power facilities, the penetration of new energy resources, etc.
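The chunk-restricted decision rule described above can be sketched as follows. This is not Algorithm 1 itself: the scoring function is passed in as a parameter so that any LOF implementation can be plugged in, and the demo uses a simple mean k-nearest-neighbor distance as a stand-in score, which is explicitly not the LOF. The names `mlof_detect`, `mean_knn_distance`, and the dictionary layout of `chunks` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(points, k):
    """Stand-in outlier score (NOT the LOF): mean distance to the k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / k)
    return scores


def mlof_detect(chunks, pattern, z_new, k, score_fn=mean_knn_distance):
    """Flag z_new if it has the largest score within the chunk of its own pattern.

    chunks:  dict mapping a pattern name to its list of normal measurements.
    pattern: the pattern (time period) that the incoming measurement belongs to.
    """
    candidates = chunks[pattern] + [z_new]  # search restricted to chunk C_t
    scores = score_fn(candidates, k)
    return scores[-1] == max(scores)


# Made-up one-dimensional chunks for two patterns.
chunks = {
    "steady": [[1.0], [1.1], [0.9], [1.05]],
    "climbing": [[5.0], [5.5], [6.0], [6.5]],
}
print(mlof_detect(chunks, "steady", [5.2], k=2))
print(mlof_detect(chunks, "climbing", [5.2], k=2))
```

The same value is flagged in the steady chunk but not in the climbing chunk, which illustrates why a measurement that is abnormal for the current period can look normal when all periods are pooled together. The cost of each call depends only on the chunk size, not on the total amount of collected history.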

The computational complexity of LOF can be formulated as C_LOF = O(f(n)), where n is the size of the tested dataset, f(n) is a monotonically increasing function, and O(·) denotes the computational complexity. As for MLOF, the computational complexity is C_MLOF = O(N), where N is a constant number, i.e., the size of the data chunk, which is independent of the increasing number of collected measurements.

Experimental results: The performance of the proposed approach in terms of detecting one-shot DAs is evaluated with the IEEE 14-bus power system. To make the tested scenario more practical, the real-world load profiles of New York State from 01-01-2021 to 31-03-2021 are incorporated to generate the attack data. The 11 loads from 11 main regions are used as the loads of the IEEE 14-bus power system, while the measurements are created based on the load data. The one-shot DA is constructed by using the latest r measurements to solve problem (1). To detect the one-shot DA, the historical measurements from the beginning to time i are used for LOF, COF [12], LDOF [13], NOF, DBSCAN, and iForest. The measurements on 20-01-2021 are used to extract the power patterns for MLOF. The evaluation metric “TP” means true positive and “FP” stands for false positive. As for LOF, COF, LDOF, NOF, DBSCAN, and iForest, the historical measurements z_1, z_2, ..., z_{i−1} are normal, but the measurement z_i is attacked by the one-shot DA. The goal of LOF, COF, LDOF, NOF, DBSCAN, and iForest is to capture the abnormal measurement z_i among z_1, z_2, ..., z_i. As for MLOF, three data chunks C_1, C_2, and C_3, corresponding to three power patterns (i.e., dropping, climbing, and steady), are extracted based on the measurements on 20-01-2021. The malicious measurement z_i is fed into C_1, C_2, or C_3 to test whether it is a malicious measurement or not. The parameter γ is set the same for all sensor measurements.
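The TP and FP rates can be computed from detector outputs as in the minimal sketch below. It assumes the usual definitions (TP rate as the fraction of attacked measurements that are flagged, FP rate as the fraction of normal measurements that are flagged); the function name and the toy inputs are illustrative only and are not experimental data.

```python
def tp_fp_rates(flags, truth):
    """Return (TP rate, FP rate) for boolean detector outputs.

    flags: True where the detector raised an alarm.
    truth: True where the measurement was actually attacked.
    """
    tp = sum(1 for f, t in zip(flags, truth) if f and t)
    fp = sum(1 for f, t in zip(flags, truth) if f and not t)
    positives = sum(1 for t in truth if t)
    negatives = len(truth) - positives
    tp_rate = tp / positives if positives else 0.0
    fp_rate = fp / negatives if negatives else 0.0
    return tp_rate, fp_rate


# Toy example: two attacked and two normal measurements.
print(tp_fp_rates([True, False, True, False], [True, True, False, False]))
```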

By default, the historical measurements for LOF, COF, LDOF, NOF, DBSCAN, and iForest are collected based on the loads from 01-01-2021 to 21-02-2021, while the historical measurements for MLOF are collected based on the loads on the date 20-01-2021. The malicious measurements are generated based on the measurements on 21-01-2021. The attack parameter r is 30. The transformation parameter γ is 1.1. All malicious measurements are constructed by solving problem (1).

First, we compare the performance of MLOF and LOF in terms of detecting one-shot DAs with different k values. The experimental results are shown in Table 1. We find that the detection rate (i.e., TP rate) of one-shot DAs decreases when the k value increases with both MLOF and LOF. However, the performance of MLOF is much better than that of LOF, which indicates that the LOF treats the abnormal measurements as normal since it uses historical measurements collected over a long term. On the other hand, the FP rates are below 5% with MLOF and below 1% with LOF, which shows that the LOF sacrifices the TP rate to reduce the FP rate.

Table 1. The TP and FP Rates of Detecting One-Shot DAs With MLOF, LOF, COF, and LDOF (γ = 1.1)

We also evaluate the detection performance of one-shot DAs with the variants of LOF. COF is a variant of LOF that addresses the issue that an outlier may have a neighborhood density similar to that of the normal data [12], while LDOF is extended from LOF to address the low-sensitivity issue caused by scattered real-world data [13]. The results are also presented in Table 1. The variants of LOF (i.e., COF and LDOF) perform better than LOF in detecting one-shot DAs but worse than MLOF. The results indicate that the power patterns are critical for detecting one-shot DAs.

Further, we also evaluate the performance of other algorithms beyond LOF to detect one-shot DAs. NOF is a variant of the K-nearest neighbor method that alleviates the difficulty of selecting an appropriate K value. DBSCAN is a common data clustering algorithm that detects outliers in noisy data. Isolation forest (iForest) is a well-known anomaly detection method that identifies outliers from normal data points by measuring the distance between a data point and the remaining data points. The TP rates of detecting one-shot DAs with the evaluated algorithms are presented in Table 2. We find that the TP rates with the NOF, DBSCAN, and iForest algorithms are lower than those with the variants of the LOF algorithm. In our opinion, this is because the evaluated algorithms are developed based on the distances between the outliers and normal data points, while the LOF algorithm is derived based on density estimation.

Table 2. The TP and FP Rates of Detecting One-Shot DAs With NOF, DBSCAN, and iForest (γ = 1.1)

Second, we compare the running time of LOF and MLOF on the same platform. The algorithms are run on a Core i7 laptop, which has a 1.10 GHz CPU and 16.0 GB of memory. As shown in Fig. 2, the running time of LOF increases with the number of tested measurements, but that of MLOF remains almost the same and is always below 100 ms. The running time of the LOF algorithm reaches nearly 30 s when the tested measurements are collected for more than a month. The fitting curve for the running time of LOF is rt = 2.061×10^−7 n^2 − 4.346n + 1.2247, where rt is the running time of LOF and n is the number of tested measurements. The results show that the computational burden of LOF becomes hard to bear when there are too many tested measurements. Therefore, it is more efficient to use MLOF to detect one-shot DAs.

Fig. 2. The running time of LOF and MLOF algorithms.

Third, we evaluate the performance of MLOF when the input measurements are preprocessed with the γ transformation. The results are shown in Fig. 3. We find that the TP rate with the preprocessed measurements is much larger than that with the measurements that are not preprocessed, while the opposite holds for the FP rate. Therefore, the input measurements should be preprocessed with the γ transformation to improve the performance of MLOF in terms of detecting one-shot DAs. The experimental results also show that the MLOF algorithm transfers well across dates, since the performance does not degrade when the malicious measurements are constructed on other dates (not on the date 20-01-2021).
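The exact form of the γ transformation is not spelled out in this part of the letter. As a hedged illustration only, the sketch below uses the common power-law (gamma) transformation applied to min-max normalized measurements; the normalization bounds `lo` and `hi` and the function name are assumptions and may differ from the preprocessing actually used.

```python
def gamma_transform(values, gamma, lo, hi):
    """Power-law transformation of measurements normalized to [0, 1].

    Assumed form: s = ((v - lo) / (hi - lo)) ** gamma, with lo < hi
    taken as hypothetical bounds (e.g., from historical data).
    """
    span = hi - lo
    return [((v - lo) / span) ** gamma for v in values]


# Made-up values; gamma = 2.0 is chosen only to make the effect visible.
print(gamma_transform([0.0, 5.0, 10.0], gamma=2.0, lo=0.0, hi=10.0))
```

Under this assumed form, the transformation is nonlinear, so equal gaps in the raw measurements map to unequal gaps after preprocessing, which changes the neighborhood distances that the outlier score is computed from.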

Finally, we evaluate the performance of MLOF with different transformation parameters and numbers of nearest neighbors. The results are shown in Fig. 4. We find that the TP rate is relatively larger when the γ value is between 0.5 and 1.1 and the k value is between 20 and 30, while the FP rate is relatively larger when γ is beyond 1.2 and the k value is beyond 35. These results show that the transformation parameter and the number of nearest neighbors of MLOF should be properly set to improve its performance in detecting one-shot DAs.

Fig. 3. The attack detection performance of MLOF with and without the γ transformation. The malicious measurements are constructed based on the load profiles on the dates 21-01-2021, 22-01-2021, ..., 08-02-2021. “Transformed-TP” and “Transformed-FP” mean the TP and FP rates when the measurements are preprocessed with the γ transformation.

Fig. 4. The attack detection performance of MLOF with different γ and k values.

Acknowledgments: This work was supported in part by the Guizhou Provincial Science and Technology Projects (ZK[2022]149), the Guizhou Provincial Research Project for Universities ([2022]104), the Special Foundation of Guizhou University ([2021]47), the GZU cultivation project of National Natural Science Foundation of China ([2020]80), Shanghai Engineering Research Center of Big Data Management, and the National Natural Science Foundation of China (62073285, 62061130220).