Cyber Security Intrusion Detection for Agriculture 4.0: Machine Learning-Based Solutions, Datasets, and Future Directions

2022-01-26 Mohamed Amine Ferrag, Lei Shu, Othmane Friha, and Xing Yang

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 3

Mohamed Amine Ferrag, Lei Shu, Othmane Friha, and Xing Yang

Abstract—In this paper,we review and analyze intrusion detection systems for Agriculture 4.0 cyber security.Specifically,we present cyber security threats and evaluation metrics used in the performance evaluation of an intrusion detection system for Agriculture 4.0.Then,we evaluate intrusion detection systems according to emerging technologies,including,Cloud computing,Fog/Edge computing,Network virtualization,Autonomous tractors,Drones,Internet of Things,Industrial agriculture,and Smart Grids.Based on the machine learning technique used,we provide a comprehensive classification of intrusion detection systems in each emerging technology.Furthermore,we present public datasets,and the implementation frameworks applied in the performance evaluation of intrusion detection systems for Agriculture 4.0.Finally,we outline challenges and future research directions in cyber security intrusion detection for Agriculture 4.0.

I.INTRODUCTION

The agricultural revolution, in step with the industrial revolution, has evolved through four generations: Agriculture 1.0, Agriculture 2.0, Agriculture 3.0, and Agriculture 4.0, as depicted in Fig. 1. Agriculture 1.0 refers to agricultural practice from the beginning of human civilization until the end of the 19th century, a period when farmers depended heavily on traditional cultivation tools, such as the plough, to create favorable conditions for seed placement and plant growth. At the beginning of the 20th century, the increase in agricultural production brought about by agricultural machinery (combines, irrigation systems, harvesters, trucks, tractors, aircraft, helicopters, etc.) became known as Agriculture 2.0. From the early 1970s to the present day, Agriculture 3.0 has emerged, based on green renewable energy such as bioenergy, geothermal energy, solar energy, hydropower, and wind power [1].

Fig.1.The development of agricultural revolutions with industrial revolutions and related cyber security threats.

The term “Agriculture 4.0” appeared following “Industry 4.0” [2], [3], and is defined by a combination of emerging technologies such as Blockchain, software-defined networking (SDN), Artificial Intelligence, Internet of Things (IoT), IoT devices, 5G communications, Drones, Fog/Edge computing, Cloud computing, network function virtualization (NFV), Smart Grids, etc. The diagram of Agriculture 4.0 is shown in Fig. 2. In the physical layer, various IoT devices (e.g., sensors and cameras) and drones monitor agricultural environmental conditions by collecting meteorological data, soil moisture, crop images, livestock behavior data, and health monitoring data. Different actuators (e.g., autonomous tractors, insecticidal lamps, feeding machines, and irrigation equipment) are activated when the data meet specific conditions, which promotes the automation of agricultural production and management. In addition, new energy technologies (e.g., solar power and wind power) and the smart grid architecture supply energy to IoT devices in Agriculture 4.0 [4]. In the network layer, intelligent agriculture devices transmit data to the Edge/Fog node and the sink node over the wireless network, which forms different types of networks, e.g., the mesh network (ZigBee based), star network (LoRa based), GSM network (4G/5G based), and SDN (including control plane and data plane) [5]. In the application layer, cloud computing supports decision-making in agricultural production and management by analyzing data stored in a distributed database. Edge/Fog computing is used for tasks with stricter timeliness requirements. Moreover, the supervisory control and data acquisition (SCADA) system is usually used to monitor the operating status of electrical equipment in a Smart Grid.

Fig.2.The diagram of Agriculture 4.0,including 1) Cloud computing-enabled Agriculture 4.0,Edge/Fog computing-enabled Agriculture 4.0 in the application layer,2) SDN/NFV network-enabled Agriculture 4.0 in the network layer,and 3) Smart Grid-enabled Agriculture 4.0,autonomous tractor-enabled Agriculture 4.0,IoT-device enabled Agriculture 4.0,Drone-enabled Agriculture 4.0,industrial Agriculture 4.0 in the physical layer.

These emerging technologies have been widely applied in Industry 4.0, and it is not difficult to transfer their application to agricultural scenarios. Therefore, the major challenge of developing Agriculture 4.0 does not reside in the deployment of the emerging technologies, but primarily in the guarantee of security and privacy, since thousands of IoT-based devices are deployed in an open field. In addition, there are many security and privacy issues associated with each layer of the IoT architecture [6]. For example, an adversary can initiate many cyberattacks, such as distributed denial-of-service (DDoS) attacks to make a service unavailable and then inject false data, which affects food safety, agri-food supply chain efficiency, and agricultural productivity. The cyber security research community proposes the use of intrusion detection systems (IDSs), a network security technology dedicated to continuously observing events inside a computing or networking system and evaluating them for evidence of intrusion [7], [8]. To further protect Agriculture 4.0 from cyber attacks, the IDS can be implemented in conjunction with other security solutions, including encryption techniques, authentication, authorization, and Blockchain [9].

To detect malicious behaviors, IDSs use artificial intelligence-based techniques, such as hybrid machine learning, voting based extreme learning machine, deep learning techniques, hierarchical approaches, reinforcement learning, etc. IDSs based on machine learning techniques have been covered by many surveys, which are presented in Table I. Many surveys focused on IDSs based on deep learning approaches [11], [12], [16], [19], machine learning techniques [10], [13], and support vector machines [20]. Some surveys paid attention to SCADA systems [21], SDN technology [17], [18], and IoT networks [14], [15]. In contrast, this survey proposes eight taxonomies that are related to 1) Cloud computing-enabled Agriculture 4.0, 2) Fog/Edge-enabled Agriculture 4.0, 3) SDN/NFV-enabled Agriculture 4.0, 4) Drone-enabled Agriculture 4.0, 5) Autonomous tractor-enabled Agriculture 4.0, 6) IoT device-enabled Agriculture 4.0, 7) Industrial Agriculture 4.0, and 8) Smart Grid-enabled Agriculture 4.0. It also provides a more comprehensive review by covering novel security topics such as the IDS building process, public datasets, benefits of IDS, and open challenges and future research opportunities for Agriculture 4.0. In addition, some surveys cover different aspects of Agriculture 4.0. As shown in Table II, we classify the related studies by the following factors:

TABLE I RELATED SURVEYS ON THE IDSS BASED ON MACHINE LEARNING TECHNIQUES

TABLE II RELATED SURVEYS ON AGRICULTURE 4.0

1)IDSs:It indicates whether the survey provided a taxonomy of IDSs for Agriculture 4.0.

2) Public Datasets:It specifies if the survey presented the public datasets used in the performance evaluation of IDSs for Agriculture 4.0.

Fig.3.Organization of the content in the rest of this article.

3) Machine Learning and Deep Learning Techniques:It specifies if the survey has provided a comparative study and evaluated machine learning and deep learning approaches for cyber security intrusion detection in Agriculture 4.0.

Most of the surveys on Agriculture 4.0 outline the new technologies [1],[22]–[26],[28] such as Fog/Edge computing,software-defined networking,network function virtualization,and unmanned aerial vehicles,without focusing on the performance of machine learning and deep learning techniques for cyber security.To the best of our knowledge,our survey is the first that thoroughly covers the performance of machine learning techniques for cyber security in Agriculture 4.0.

Our survey differs from the earlier-mentioned works in the following points:

1) We present cyber security threats and evaluation metrics used in the performance evaluation of IDSs for Agriculture 4.0.

2) We provide a comprehensive classification and in-depth analysis of machine learning and deep learning based IDSs for cyber security in Agriculture 4.0.

3) We provide a detailed description of the current best practices,implementation frameworks,and public datasets used in the performance evaluation of IDSs for Agriculture 4.0.

4) We highlight remaining challenges and future research directions in cyber security intrusion detection for Agriculture 4.0.

As shown in Fig. 3, the remainder of this paper is structured as follows. Section II focuses on Agriculture 4.0 cyber security threats and IDS evaluation metrics. Section III presents the IDS solutions for Agriculture 4.0. Section IV describes the IDS building process and public datasets. Section V discusses the future directions. Lastly, Section VI presents the conclusions.

II.CYBER SECURITY THREATS AND EVALUATION METRICS

Although Agriculture 4.0 is envisioned to be the new standard, its acceptance and widespread adoption may be constrained by potential threats. Some of those threats, such as rough weather conditions, have persisted throughout the years. Others are attributed to the broad development of technological solutions, which results in major security gaps and serious attack vectors such as ransomware, supply chain attacks, IoT attacks, and many others [27], [29].

A.Cyber Security Threats in Agriculture 4.0

The U.S.Department of Homeland Security defined three major cyber threat categories for Precision Agriculture,namely: confidentiality-related,integrity-related,and availability-related threats [30].

1) Confidentiality-Related Threats:With a variety of communication devices in intelligent agriculture,data flows through multiple interconnected devices from source to destination.Privacy threats can lead to loss of privacy and violations of data or information [29].As farmers are highly protective of their information,for example,data on yields,farmland prices,and livestock health,it is important that this information be kept confidential.Losing or misusing this data has potentially dramatic financial,emotional and reputational consequences on farmers [30].

2) Integrity-Related Threats:Collecting and using data is an important step in helping farmers make real-time intelligent management decisions.Due to potential unauthorized or inappropriate alterations to the reliability of data or resources,it is possible that information from Intelligent Agricultural Systems can become unreliable or inaccurate,and may result in possible financial abuse [29].

3) Availability-Related Threats: Failing to provide available services to customers can cause business disturbances and possible loss of customer confidence and earnings. For example, if an attacker stopped the activities of an existing Intelligent Agriculture Network, food security would be impacted and the equipment manufacturer would suffer a serious loss of reputation [30].

B.IDS Evaluation Metrics

Several metrics can be used to evaluate the efficiency and effectiveness of an IDS, and most of them fall into one of the following categories: security-based metrics and performance-based metrics [31], [32].

1) Security-Based Metrics: Metrics in this category describe how effectively the IDS distinguishes between intrusive and non-intrusive activities. Being a binary classifier, an IDS can produce one of the following outputs: a) true positive (TP): when an intrusion is correctly classified as an intrusive action; b) true negative (TN): when a legitimate action is properly classified as legitimate; c) false positive (FP): when a legitimate action is erroneously classified as an intrusion; and d) false negative (FN): when an intrusion is erroneously classified as a legitimate action [31]. Well-known metrics within this category include:

●Confusion matrix: This matrix reflects the result of the classification, i.e., it tabulates the true and false results of the classifier. It has 2 × 2 dimensions in the case of binary classification, and N × N dimensions in the case of a multi-class classifier with N different classes. The confusion matrix is not a metric on its own; rather, it is the baseline from which other indicators of effectiveness are quantified.

●Accuracy: This metric is the rate of correct classification of an IDS, whether on a validation set or a test set. Accuracy is obtained with

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{1}$$

●Precision: This metric is the fraction of the actions classified as intrusive by the IDS that are actually intrusive. Precision is obtained with

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{2}$$

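●Recall: This metric is the fraction of intrusive actions that the IDS classifies as intrusive; it is also called the true positive rate (TPR). The recall is obtained with

$$\text{Recall} = \text{TPR} = \frac{TP}{TP + FN}. \tag{3}$$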

●Fβ-score: This metric is a weighted harmonic mean of precision and recall, where β reflects the importance of recall relative to precision. The F-score is also applied when evaluating a multi-class classification. The F1-score (β = 1) is obtained with (4). The final F1-score is obtained by micro-averaging based on class frequency (micro-F1), or by macro-averaging, which gives all classes the same importance (macro-F1) [33]. The F1-score is frequently compared to the G-measure, which is the geometric mean of the precision and recall, and it is used to evaluate binary and multi-class classifiers in cases of class imbalance [34].

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{4}$$

●ROC curve: The receiver operating characteristic (ROC) curve is a robust metric that shows the sensitivity and specificity associated with a continuous variable. It plots the true positive rate (TPR) (3) on the vertical axis against the false positive rate (FPR) (5) on the horizontal axis. The area under the ROC curve, called AUC, is regarded as a key evaluation measure.

$$\text{FPR} = \frac{FP}{FP + TN}. \tag{5}$$
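The security-based metrics listed above can be computed directly from the four confusion counts. The following is a minimal sketch rather than a reference implementation: the function names `ids_metrics` and `roc_auc` are illustrative assumptions, and the AUC is approximated with the trapezoidal rule over the ROC points obtained by sweeping the decision threshold.

```python
import math

def ids_metrics(tp, tn, fp, fn, beta=1.0):
    """Security-based metrics from binary confusion counts (hypothetical helper)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0      # false positive rate
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    g_measure = math.sqrt(precision * recall)     # geometric mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "fpr": fpr, "f_beta": f_beta, "g_measure": g_measure}

def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 (intrusion) or 0 (legitimate)."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    prev_tpr = prev_fpr = auc = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Records sharing the same score cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoid area
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc
```

For example, `ids_metrics(80, 90, 10, 20)` describes an IDS that caught 80 of 100 intrusions and raised 10 false alarms on 100 legitimate actions.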

2) Performance-Based Metrics:

i) Computational cost:the computational cost represents the amount of time required to accomplish an essential task to classify an action as intrusive or legitimate.

ii) Communication overhead: this metric is the volume of data that an IDS can process per second, i.e., its throughput, expressed in gigabits per second (Gbps).

iii) CPU usage:this metric represents the overhead rate on the CPU when adding an IDS to the infrastructure.

iv) Memory usage:this metric represents the memory consumption required by an IDS to do its classification.

v) Energy consumption:this metric represents the extra energy consumed by a device when an IDS is introduced.This measure is essential for hardware-limited appliances such as mobile and IoT devices.
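Several of the performance-based metrics above can be estimated with the Python standard library. The sketch below is an illustration under stated assumptions: `profile_classifier`, `classify`, and `record_bits` are hypothetical names, memory is measured as peak Python allocation via `tracemalloc`, and energy consumption is omitted because it requires platform-specific instrumentation.

```python
import time
import tracemalloc

def profile_classifier(classify, records, record_bits):
    """Estimate cost, throughput, CPU time, and peak memory of an IDS classifier.

    classify    -- callable applied to each record (assumed IDS decision function)
    records     -- non-empty list of records to classify
    record_bits -- assumed size of one record in bits
    """
    tracemalloc.start()
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    for record in records:
        classify(record)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    n = len(records)
    return {
        "seconds_per_record": wall / n,                       # computational cost
        "throughput_gbps": n * record_bits / wall / 1e9 if wall > 0 else float("inf"),
        "cpu_seconds": cpu,                                   # CPU usage
        "peak_memory_bytes": peak,                            # memory usage
    }
```

Running the same function with and without the IDS in the processing path gives a rough estimate of the overhead the IDS introduces.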

III.IDS SOLUTIONS FOR AGRICULTURE 4.0

Agriculture 4.0 uses many emerging technologies such as SDN/NFV,Cloud computing,Fog/Edge computing,Drones,Autonomous tractors,IoT devices,Smart Grids,etc.According to these emerging technologies,we review and analyze the IDSs that use machine learning and deep learning techniques for cyber security in Agriculture 4.0.

A.IDS for Cloud Computing-Enabled Agriculture 4.0

Because Agriculture 4.0 relies on a variety of IoT devices that are highly susceptible to security attacks, many new threats arise in the cloud environment. The IDSs for Cloud computing-enabled Agriculture 4.0 can be categorized into three classes, namely, 1) Game theory-based, 2) Hybrid machine learning, and 3) Voting based extreme learning machine.

1) Game Theory-Based: Although game theory is not a branch of machine learning, this technique has shown very good results in IDSs. Gill et al. [35] developed a game theory model, named GTM-CSec, to provide intelligent detection of attacks in the cloud environment. The GTM-CSec model is composed of two main components, cooperative and non-cooperative games, and is based on three techniques, namely, the signature, anomaly, and honeypot techniques. These techniques are adopted in the following four components: perception, logical analysis, computational analysis, and decisive analysis. The performance evaluation in MATLAB with payoff functions and probabilities showed that the GTM-CSec model can improve electricity usage of the defense mechanism and is very efficient in protecting against attackers.

2) Hybrid Machine Learning: Rabbani et al. [36] designed a hybrid machine learning system that extracts users' behavioral patterns to detect malicious behaviors in the cloud computing environment. To construct a network that is automatically optimized, the proposed system uses particle swarm optimization-based probabilistic neural networks (PSO-PNN). The study used the UNSW-NB15 dataset, where the features are presented in a quantitative (i.e., numerical) and qualitative (i.e., symbolic) format. The experimental results reported that the PSO-PNN approach provides high accuracy in detecting suspicious activities.

3) Voting Based Extreme Learning Machine: Kushwah and Ranga [37] considered a cloud infrastructure with an attached detector, which is based on three components, namely, i) training database component, ii) preprocessor component, and iii) classifier component. The detector uses a voting extreme learning machine to identify DDoS threats in a cloud computing framework, as presented in Fig. 4. The study used two datasets, namely, i) the NSL-KDD dataset and ii) the ISCX dataset. The experimental results reported that the proposed system provides high accuracies of 99.18% and 92.11% with the NSL-KDD and ISCX datasets, respectively. Aldribi et al. [38] designed a hypervisor-based IDS that uses online multivariate statistical change analysis to detect anomalous cloud behavior. The study used the ISOT-CID dataset to validate the proposed cloud intrusion detection framework. The experimental results reported an overall detection rate of 96.23% and a false-positive rate of 7.56%.
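The idea behind a voting extreme learning machine can be sketched as follows: each extreme learning machine (ELM) keeps randomly drawn hidden-layer weights fixed, solves only for its output weights in closed form, and the ensemble decides by majority vote. This is a minimal standard-library sketch, not the implementation of [37]. The class and function names are hypothetical, and a small ridge term is added to the least-squares solution (an assumption made here for numerical stability; the classic ELM uses the Moore-Penrose pseudo-inverse).

```python
import math
import random
from collections import Counter

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

class ELM:
    """Single-hidden-layer ELM with sigmoid units and fixed random input weights."""

    def __init__(self, n_inputs, n_hidden, n_classes, rng, ridge=1e-3):
        self.W = [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b = [rng.gauss(0, 1) for _ in range(n_hidden)]
        self.n_classes = n_classes
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
                for ws, b in zip(self.W, self.b)]

    def fit(self, X, y):
        H = [self._hidden(x) for x in X]
        k = len(self.W)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T T,
        # where T holds one-hot class targets.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        B = [[sum(h[i] for h, label in zip(H, y) if label == c)
              for c in range(self.n_classes)] for i in range(k)]
        self.beta = solve(A, B)
        return self

    def predict(self, x):
        h = self._hidden(x)
        scores = [sum(h[i] * self.beta[i][c] for i in range(len(h)))
                  for c in range(self.n_classes)]
        return max(range(self.n_classes), key=scores.__getitem__)

def vote(models, x):
    """Majority vote over independently initialized ELMs."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]
```

Because every ELM draws different random hidden weights, the members make partly independent errors, which is what the vote exploits; an odd number of members avoids ties in the binary case.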

B.IDS for Fog/Edge-Enabled Agriculture 4.0

Rather than transferring data generated by IoT-connected devices to the Cloud or a Data Center,Fog/Edge Computing in Agriculture 4.0 involves processing data at the edge of the network directly where it is generated [39].The IDSs for Fog/Edge-enabled Agriculture 4.0 can be categorized into three classes,namely,1) Deep learning techniques,2) Fuzzy Gaussian mixture-based,and 3) Hybrid machine learning.

Fig.4.Voting and extreme learning machine-based IDS for Agriculture 4.0.

1) Deep Learning Techniques: To secure multiple web applications in fog computing, Tian et al. [40] proposed a distributed deep learning system using convolutional neural networks. Specifically, the system analyzes URLs to differentiate normal queries from anomalous ones. The study used three datasets, namely, HTTP Dataset CSIC 2010, FWAF, and the HttpParams Dataset, to validate the proposed edge IDS. The experimental results reported an accuracy that fluctuates around 0.955. Almogren [41] proposed a malicious activity detection model for edge-of-things computing, which is based on deep belief networks. The proposed system follows three steps, namely, i) network data collection, ii) feature extraction, and iii) classification. The network data collection consists of collecting data and dividing it into training data and test data. The feature extraction consists of extracting features related to intrusion, while the classification consists of using these features to train a deep belief network. The study used the UNSW-NB15 dataset to validate the proposed edge intrusion detection framework, and the experimental results reported that the deep belief network has the best overall performance compared to a support vector machine and an artificial neural network. Jiang et al. [42] designed a system, named PHYAlert, to detect and prevent identity theft attacks in wireless edge networks. The PHYAlert system uses the authenticity of the 802.11 data frame for spoofing detection (i.e., man-in-the-middle attacks).

Wu et al. [43] developed a system for intrusion detection, called SRDLM, which is based on semantic re-encoding and deep learning. The SRDLM system can be applied to Fog/Edge-based Agriculture 4.0 using three steps, namely, network traffic semantic re-encoding, a deep learning model, and multi-space projection. Ahsan et al. [44] designed an IDS based on a robust adaptive multivariate Hotelling's control chart. The proposed system uses two main steps, namely, i) the data preparation step and ii) the construction of a control chart. The data preparation step consists of data acquisition and searching. The building of the monitoring system is separated into two stages: i) Phase I: building a normal profile and ii) Phase II: detection. The KDD99, NSL-KDD, and UNSW-NB15 datasets are used in the performance evaluation. The findings indicate that the proposed chart can reach an accuracy of about 98% for the KDD99 dataset. Qureshi et al. [45] developed a system for intrusion detection based on a deep sparse auto-encoder and self-taught learning, and used the NSL-KDD dataset, which contains 41 features and attacks grouped into four categories: probing attacks, remote-to-local attacks, user-to-root attacks, and denial-of-service attacks. The study demonstrated that the sparse auto-encoder trained on the enhanced feature space is more robust and stable than the one trained on the original feature space. Furthermore, the study results indicated that the proposed IDS is robust and provides improved prediction accuracy.
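The two-phase control chart logic described for [44] can be illustrated with the classical Hotelling T² statistic. The sketch below is a simplification under stated assumptions: it uses the plain (non-robust, non-adaptive) statistic, only two traffic features so that the covariance matrix can be inverted by hand, and a chi-square control limit; none of these choices is taken from [44], and all names are hypothetical.

```python
import math

def fit_normal_profile(samples):
    """Phase I: mean vector and inverse covariance of two-feature normal traffic."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be non-zero for the inverse to exist
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def t_squared(x, profile):
    """Hotelling T^2 distance of observation x from the normal profile."""
    (mx, my), inv = profile
    dx, dy = x[0] - mx, x[1] - my
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)

# Assumed control limit: 0.99 quantile of a chi-square distribution with
# 2 degrees of freedom, which has the closed form -2 * ln(1 - 0.99).
CONTROL_LIMIT = -2.0 * math.log(0.01)

def detect(x, profile, limit=CONTROL_LIMIT):
    """Phase II: signal an intrusion when T^2 exceeds the control limit."""
    return t_squared(x, profile) > limit
```

In practice the normal profile would be estimated from a much larger sample of attack-free traffic, and the control limit would be tuned to the acceptable false alarm rate.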

2) Fuzzy Gaussian Mixture-Based: Despite the fact that fuzzy Gaussian mixture is not a class of machine learning, the technique has proven to be very successful in IDSs. To identify zero-day attacks in Fog/Edge-based Agriculture 4.0, the FGMC-HADS method proposed by Haider et al. [46], which is based on fuzzy Gaussian mixture-based correntropy models, can be applied. The FGMC-HADS method uses the following steps: i) learning the raw data from the operating system's kernel, ii) generating representative sequences by using the joint feature construction module, and iii) classifying the sequences as normal or abnormal using the Gaussian mixture model. To determine the performance of the FGMC-HADS method, three Linux host datasets are used, namely, the Linux dataset of ToN_IoT [47], KDD-98 [48], and NGIDS-DS [49]. The results demonstrated the superiority of the FGMC-HADS method in terms of accuracy and error compared to machine learning approaches such as the support vector machine and k-nearest neighbor. Naik et al. [50] designed an intrusion identification method, named E-TLBO-FLANN, which is based on a meta-heuristic and a functional link neural network and can be applied to Fog/Edge-based Agriculture 4.0. The proposed E-TLBO-FLANN method incorporates the concept of elitism to enhance the model's outcome. The KDDCup99 dataset is used to assess performance in detecting intrusive behavior, and the results showed that the proposed E-TLBO-FLANN method is better than competing techniques such as those of Toosi and Kahani [51] and Pfahringer [52].

Adversarial attacks can be a threat in Fog/Edge-based Agriculture 4.0 if they are used, for instance, to change the classification of agricultural products or water quality. The IDS model proposed by Pawlicki et al. [53] can be applied to handle adversarial attacks against artificial neural networks. The study used an artificial neural network with three hidden layers, with 40 neurons in the first hidden layer, 40 in the second, and 20 in the third. The results showed that a random forest achieves higher recall and better precision than artificial neural networks. For detecting unknown web attacks using a hybrid IDS, Kaur and Singh [54] designed a deep learning-based system, named D-Sign, which is based on deep recurrent neural networks. The D-Sign system consists of three parts, namely, i) signature generation engine, ii) anomaly detection engine, and iii) misuse detection engine. Both the NSL-KDD and CICIDS 2017 datasets are used in evaluating the performance, and the results demonstrated that the D-Sign system correctly classifies 99.1% and 99.4% of instances for the CICIDS and NSL-KDD datasets, respectively.

3) Hybrid Machine Learning: The hybrid machine learning-based IDS for Agriculture 4.0 is presented in Fig. 5. Hosseini and Zade [55] proposed a hybrid IDS, named MGA-SVM-HGS-PSO-ANN, that can be applied to Fog/Edge-based Agriculture 4.0. The MGA-SVM-HGS-PSO-ANN system is based on two parts, one to select features and one to detect attacks. The feature selection part combines a genetic algorithm and a support vector machine with multi-parent crossover and multi-parent mutation. The attack detection part uses an artificial neural network combined with particle swarm optimization and a hybrid gravitational search. The performance evaluation on the NSL-KDD dataset showed that the MGA-SVM-HGS-PSO-ANN system achieves a detection accuracy of 99.3%. Teng et al. [56] proposed a collaborative and adaptive IDS based on 2-class SVMs and decision trees, which can be applied in the cloud (global agent with a big group) and at the edge (local agent with a small group).

C.IDS for SDN/NFV-Enabled Agriculture 4.0

Both SDN and NFV technologies in Agriculture 4.0 offer a software-based management platform, such as Big Switch Networks or VMware NSX, for controlling standard network hardware. With the diversity of network attacks against Agriculture 4.0, SDN/NFV-based Agriculture 4.0 faces many security issues. The IDSs for SDN/NFV-enabled Agriculture 4.0 can be categorized into five classes, namely, 1) Signature and anomaly-based, 2) Multilayered intrusion detection, 3) Hybrid machine learning, 4) Parallelized intrusion detection, and 5) Deep learning techniques.

Fig.5.Hybrid machine learning-based IDS for Agriculture 4.0.

1) Signature and Anomaly-Based: Although anomaly-based detection has surpassed signature-based detection, in some situations their integration has been very successful in IDSs. Ngo et al. [57] conceptualized an SDN-based architecture for secure forwarding devices using signature and anomaly-based IDSs, which can be applied to Agriculture 4.0, as presented in Fig. 6. Specifically, the proposed architecture integrates two intrusion detection engines, called F-NIDS and F-ANIDS. The F-NIDS engine uses Snort rules for classifying attack packets, while the F-ANIDS engine uses machine learning algorithms. The performance evaluation of both intrusion detection engines is implemented on parallel hardware platforms with two NetFPGA-10G boards (Xilinx xc5vtx240t FPGA device) and one GPU (GTX Geforce 1080 G1). The results showed that both intrusion detection engines are 14× faster using the GPU compared to using the CPU. It is important to note that signature-based detection has two major drawbacks [58]: first, it is difficult to identify events that lead to an actual intrusion into network systems due to the massive amount of data logs, and second, the signature database grows exponentially over time. Ahmadon et al. [58] suggested a Petri net-based methodology [59] to detect intrusion behaviors while reducing the number of alerts, along with an update method that fuses two or more models of similar intrusion behaviors. The experiments performed showed the effectiveness of these methods.
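The way a signature engine and an anomaly engine complement each other can be sketched in a few lines. This is an illustration only: the patterns below are hypothetical substrings and not Snort rule syntax, and a simple z-score on packet size stands in for the machine learning model of an anomaly engine such as F-ANIDS.

```python
import statistics

# Hypothetical signature patterns (plain substrings, not Snort rules).
SIGNATURES = ("' OR 1=1", "../../", "<script>")

def signature_match(payload):
    """Signature engine: flag payloads that contain a known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)

class AnomalyDetector:
    """Anomaly engine stand-in: z-score of packet size against normal traffic."""

    def __init__(self, normal_sizes, z_limit=3.0):
        self.mean = statistics.mean(normal_sizes)
        self.std = statistics.stdev(normal_sizes)
        self.z_limit = z_limit

    def is_anomalous(self, size):
        return abs(size - self.mean) / self.std > self.z_limit

def inspect(payload, size, detector):
    """Check signatures first; fall back to the anomaly engine for unknown attacks."""
    if signature_match(payload):
        return "known attack"
    if detector.is_anomalous(size):
        return "anomaly"
    return "benign"
```

The ordering reflects the usual trade-off: signature matching is precise for known attacks, while the anomaly engine covers traffic that matches no signature at the cost of more false alarms.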

Fig.6.Signature and anomaly-based IDS for SDN/NFV-enabled Agriculture 4.0.

2) Multilayered Intrusion Detection: Abdulqadder et al. [60] developed a multi-layered intrusion detection and prevention system, named MLP-IDP, for the SDN/NFV-enabled cloud of 5G networks. Specifically, the proposed MLP-IDP system is organized in five layers, namely, a switches layer, smart controller layer, domain controllers layer, data acquisition layer, and virtualization layer. In the data acquisition layer, the MLP-IDP system uses the Four-Q-Curve algorithm for authenticating mobile users with a trusted third party. To observe every switch in the data plane, the MLP-IDP system adopts an intrusion detection and prevention system that uses a deep reinforcement learning algorithm. In the domain controllers layer, the MLP-IDP system uses the Shannon entropy function to classify the packets into normal or suspicious classes. In the smart controller layer, the MLP-IDP system uses multiple self-organizing maps to detect a DDoS attack. Compared to Abdulqadder et al.'s scheme [61], the MLP-IDP system is efficient in terms of security between switches and controllers. For visualizing network intrusion detection data, Zong et al. [62] designed an interactive method, which can improve the comprehension of network intrusion detection data in SDN/NFV-based Agriculture 4.0 through a graphic display that shows the relationship among the different categories of network traffic. Mishra et al. [63] provided a defensive mechanism against DDoS attempts, which is based on entropy variations between a DDoS attack and regular traffic and operates with a low false positive rate and slight processing overhead. All the simulations were performed inside a Mininet emulator using the POX controller. The proposed mechanism achieved a detection rate of 98.2% and a false-positive rate of 0.04%.
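Entropy-based DDoS detection, as used in [60] and [63], rests on the observation that the distribution of a header field changes during an attack. The sketch below is a minimal illustration under assumptions that are not taken from either paper: the monitored field is the destination IP address, traffic is examined in fixed windows, and a window is flagged when its entropy drops below a hypothetical threshold (a flood aimed at one victim concentrates the destinations).

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def classify_window(dst_ips, threshold):
    """Label a traffic window by the entropy of its destination addresses."""
    return "suspicious" if shannon_entropy(dst_ips) < threshold else "normal"
```

For instance, a window whose packets are spread evenly over four destinations has an entropy of 2 bits, whereas a window in which nearly all packets target one host has an entropy close to 0. The threshold would normally be calibrated on attack-free traffic.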

Fig.7.Combination of blockchain and machine learning for cyber security intrusion detection for Agriculture 4.0.

3) Hybrid Machine Learning: Derhab et al. [64] proposed an intrusion detection framework, named RSL-KNN, which integrates the Blockchain and the software-defined network technologies. The RSL-KNN framework can be applied to SDN/NFV-based Agriculture 4.0. To defend against forged commands, the RSL-KNN framework uses two machine learning algorithms, namely, Random Subspace Learning and K-Nearest Neighbor. To prevent the misrouting attack in SDN/NFV-based Agriculture 4.0, the proposed framework uses a Blockchain-based integrity checking system. The performance evaluation on the Industrial Control System Cyber Attack Dataset showed that the RSL-KNN framework achieves detection accuracies of 91.07% and 96.73% under multi-class and binary class classification, respectively. The combination of blockchain and machine learning-based IDSs for Agriculture 4.0 cyber security is shown in Fig. 7. Based on feature selection and ensemble classifier techniques, Zhou et al. [65] introduced an IDS that can be applied to SDN/NFV-based Agriculture 4.0. A heuristic algorithm is used for dimensionality reduction. Both C4.5 and Random Forest are used as classifier techniques for attack recognition. The NSL-KDD, AWID, and CIC-IDS2017 datasets are used in the experimental phase with Weka 3.8.3, where the results showed that the proposed system achieves detection accuracies of 98.3% and 99.3% for the C4.5 and Random Forest classifiers, respectively. Lv et al. [66] used an extreme learning machine with a hybrid kernel function to build an intrusion detection approach, named KPCA-DEGSA-HKELM, which can be applied to SDN/NFV-based Agriculture 4.0. To detect attacks, the KPCA-DEGSA-HKELM system uses a hybrid algorithm that combines the gravitational search algorithm and the differential evolution algorithm. The performance evaluation on three datasets, namely, the industrial intrusion detection dataset, the UNSW-NB15 dataset, and the KDD99 dataset, shows that the KPCA-DEGSA-HKELM system can achieve higher computational efficiency, with savings of 82.21%.

To improve the precision of intrusion detection, Velliangiri and Karthikeyan [67] designed a hybrid optimization scheme based on adaptive particle swarm optimization and adaptive artificial bee colony optimization techniques. The hybrid optimization scheme uses four stages: i) choice of the dataset, ii) preprocessing of information, iii) choice of features, and iv) hybrid categorization. The performance evaluation on the NSL-KDD dataset shows that the precision of the hybrid optimization scheme is increased to 94.23% and 97.85% compared to naive Bayes and the support vector machine.

4) Parallelized Intrusion Detection: To ensure real-time anomaly detection, Chellammal and Malarchelvi [68] designed a parallelized intrusion detection architecture, which can be applied for SDN/NFV-based Agriculture 4.0. The proposed architecture is composed of five major components, namely, a model retraining component, a prediction aggregator, an ensemble-based prediction model, a data partitioning component, and a feature reducer. The performance evaluation on three datasets, namely, KDD CUP 99, NSL-KDD, and Kyoto 2006, shows that the proposed architecture achieves an anomaly detection rate between 98% and 99%.
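The interplay between data partitioning, an ensemble of predictors, and a prediction aggregator can be sketched as follows. This is only an illustrative Python outline under simple assumptions, not the architecture of [68]: the threshold-based base learners, the round-robin partitioning, and all names are hypothetical, and the model retraining and feature reduction components are omitted.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_parts):
    """Data partitioning: split records into n_parts round-robin chunks."""
    return [records[i::n_parts] for i in range(n_parts)]

def threshold_model(limit):
    """Hypothetical base learner: flags a record whose value exceeds limit."""
    return lambda record: "anomaly" if record > limit else "normal"

def ensemble_predict(models, record):
    """Prediction aggregator: majority vote over the base models."""
    votes = Counter(model(record) for model in models)
    return votes.most_common(1)[0][0]

def detect_parallel(records, models, n_workers=4):
    """Score each partition in its own worker and merge the results."""
    parts = partition(records, n_workers)

    def score(part):
        return [(r, ensemble_predict(models, r)) for r in part]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(score, parts))
    return dict(pair for part in results for pair in part)

models = [threshold_model(t) for t in (0.5, 0.6, 0.9)]
labels = detect_parallel([0.1, 0.7, 0.95, 0.55], models)
print(labels)
```

Threads are used here only to keep the sketch self-contained; a real deployment would more likely distribute partitions over processes or cluster nodes.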

To improve classification performance and minimize computation time, Khammassi and Krichen [69] proposed an NSGA2-LR wrapper approach, which can be applied for SDN/NFV-based Agriculture 4.0. The proposed wrapper approach is evaluated under two different frameworks: the first one employs a binomial logistic regression with several binary-class datasets, one for every attack type, and the second one employs a multinomial logistic regression with a multi-class dataset. The results of the performance evaluation on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets showed a higher accuracy when using binary-class datasets than when using multi-class datasets.

5) Deep Learning Techniques: The deep learning-based IDS for Agriculture 4.0 is presented in Fig. 8. Based on the combination of a conventional learning classifier system with a convolutional neural network, Bu and Cho [70] proposed a convolutional neural-based learning classifier system for IDSs, which can be applied for SDN/NFV-based Agriculture 4.0. To increase the detection rate of unfamiliar attacks, Yang et al. [71] designed a network intrusion detection model, called SAVAER-DNN, which can detect unknown and known attacks. The SAVAER-DNN model integrates the supervised variational auto-encoder data generation (SAVAER) and the Wasserstein generative adversarial network with gradient penalty adversarial learning for providing a one-hot class vector to the discriminator network. To synthesize low-frequency and unknown attacks, the SAVAER’s decoder is used.

Due to the complexity of an SDN/NFV-based Agriculture 4.0 environment, the network intrusion samples are overwhelmed by a large number of normal samples. To resolve this issue, the work by Jiang et al. [72] can be adopted, in which the authors introduced a network intrusion detection algorithm. The proposed algorithm combines a deep hierarchical network with one-side selection, which is used to reduce the noise samples in the majority category. The performance evaluation showed that the proposed work can achieve accuracies of 83.58% and 77.16% on the NSL-KDD and UNSW-NB15 datasets, respectively. To combat malware spread in IoT-based networks, Guizani and Ghafoor [73] designed a software-based architecture for network function virtualization. For predicting malware attacks, the proposed architecture used a machine-learning recurrent neural network long short-term memory (RNN-LSTM) model. The BoT-IoT dataset is used in the experiments, in which the tests showed that the number of hosts decreased as a result of the RNN-LSTM identification of the malware attack.
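To make the idea of cleaning noisy majority samples more concrete, the sketch below implements only the Tomek-link removal step that one-side selection typically includes; the condensed nearest-neighbor step of full one-side selection is deliberately left out. It is an illustrative Python example with hypothetical one-dimensional data, not the procedure of [72].

```python
def nearest(i, X):
    """Index of the nearest neighbour of X[i] (squared Euclidean distance)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))

def remove_majority_tomek_links(X, y, majority=0):
    """Drop majority samples that form a Tomek link with a minority sample."""
    drop = set()
    for i in range(len(X)):
        j = nearest(i, X)
        # A Tomek link is a pair of mutual nearest neighbours with different labels.
        if y[i] != y[j] and nearest(j, X) == i:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: label 0 = normal traffic (majority), label 1 = intrusion.
X = [[0.0], [0.1], [0.2], [0.95], [1.0], [1.6]]
y = [0, 0, 0, 0, 1, 1]
X_clean, y_clean = remove_majority_tomek_links(X, y)
print(X_clean, y_clean)
```

In this toy set, the normal sample at 0.95 sits next to an intrusion sample and is treated as borderline noise, so it is removed while all minority samples are kept.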

Fig.8.Deep learning-based IDS for Agriculture 4.0.

D.IDS for Drones-Enabled Agriculture 4.0

Due to the integration of 5G systems in the emerging smart city idea, the internet of drones (IoD) has emerged as a new research field of “drone-to-drone communication (D2D)” for Agriculture 4.0 [74]. The use of several UAVs (i.e., a UAV swarm) collaborating to reach a specific goal in agriculture has increased productivity and reduced operational efforts [75], [76]. Nevertheless, these systems are vulnerable to cyber threats, where an attacker can exploit them to cause significant damage, such as taking control over them, disrupting operations, or stealing carried shipments [77]. Therefore, ensuring system security is becoming more and more crucial, especially in dynamic and decentralized drone-to-drone networks. Hence, finding an IDS for Agriculture 4.0 is still highly desirable. The IDSs for Drones-enabled Agriculture 4.0 can be categorized into two classes, namely, 1) Deep learning techniques and 2) Hierarchical approaches.

1) Deep Learning Techniques: Huang and Lei [78] designed an imbalanced generative adversarial network-based IDS, named IGAN-IDS, for ad-hoc networks, which can be applied for drone-based Agriculture 4.0. Specifically, the IGAN-IDS is composed of three different units, namely, a deep neural network, an imbalanced generative adversarial network, and a feature extraction component. To convert raw network properties into feature vector values, the IGAN-IDS adopts a feed-forward neural network, while the imbalanced generative adversarial network is used to generate new samples expressed in the latent space. The deep neural network is used for classifying attacks against drone-based Agriculture 4.0 using convolutional and fully-connected layers. Three datasets, namely, the CICIDS2017 dataset, UNSW-NB15 dataset, and NSL-KDD dataset, are used in the performance evaluation. The results showed that the multilayer perceptron achieved an accuracy of 78.32% on UNSW-NB15 and 78.97% on NSL-KDD.

To select both the feature subset and the hyperparameters in one process, Elmasry et al. [82] introduced a double particle swarm optimization-based algorithm, which can be applied for drone-based Agriculture 4.0. In order to investigate performance differences, the authors used three deep learning models, namely, deep belief networks (DBN), long short-term memory recurrent neural networks, and deep neural networks. Two common IDS datasets, namely, NSL-KDD and CICIDS2017, are used in the performance evaluation, where the results showed that the proposed algorithm can reduce the false alarm rate by 1% to 5% and increase the detection rate by 4% to 6%.
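For readers less familiar with the two metrics quoted above, the following short Python sketch computes the detection rate and false alarm rate from binary labels, using their common definitions (detection rate = TP/(TP+FN), false alarm rate = FP/(FP+TN)). The function name and the sample labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Compute accuracy, detection rate, and false alarm rate (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn),      # share of attacks that are caught
        "false_alarm_rate": fp / (fp + tn),    # share of normal traffic flagged
    }

# Hypothetical ground truth and predictions for ten traffic records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```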

2) Hierarchical Approaches: Al Qurashi et al. [83] proposed an IDS architectural approach for resilient intrusion detection in ad-hoc networks, which can be applied for drone-based Agriculture 4.0. Abhishek et al. [84] designed a hybrid IDS in clustered IoT networks, which is deployed at the trusted node. The proposed system can be applied for drone-based Agriculture 4.0, where an Agriculture 4.0 network consists of one access point and a set of drone devices. To categorize the relay as either malicious or non-malicious, the proposed system performs a binary hypothesis test with two hypotheses, namely, i) the relay is affected and is compromising the packets, and ii) the relay is not affected and is in normal operation. Besides, based on the number of unicast packets dropped by the IoT devices, the proposed system can identify any adversary that affects the downlink unicast packets. By exchanging and sharing data, collaborative IDSs can improve the performance of a single detector in drone-based Agriculture 4.0. Based on the deep learning approach, Villamizar et al. [85] designed an accurate people detection approach, named WatchNet++, for detecting attacks in video surveillance, which can be applied for drone-based Agriculture 4.0. Li et al. [86] investigated the use of disagreement-based semi-supervised learning in collaborative IDSs. The performance evaluation on the DARPA dataset and a real dataset showed that the proposed system could outperform traditional supervised learning in terms of both detection rate and false alarm reduction.
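A binary hypothesis test on relay behavior, as mentioned above for [84], can be illustrated with a simple log-likelihood ratio test on the number of dropped packets. The sketch below is an assumption-laden example rather than the test used in [84]: it models drops as independent events, and the drop probabilities `p_normal` and `p_attack` as well as the decision threshold are hypothetical values.

```python
import math

def relay_is_malicious(n_sent, n_dropped, p_normal=0.02, p_attack=0.30, threshold=0.0):
    """Decide between 'relay compromised' and 'relay normal' from packet drops.

    Uses the log-likelihood ratio of two binomial models; the binomial
    coefficient cancels, so only the drop and delivery terms remain.
    """
    llr = (n_dropped * math.log(p_attack / p_normal)
           + (n_sent - n_dropped) * math.log((1 - p_attack) / (1 - p_normal)))
    return llr > threshold

print(relay_is_malicious(100, 2))    # few drops: consistent with normal operation
print(relay_is_malicious(100, 25))   # many drops: consistent with a compromised relay
```

Raising `threshold` trades missed detections for fewer false alarms, which is the usual tuning knob in such tests.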

E.IDS for Autonomous Tractors-Enabled Agriculture 4.0

In the last few decades, the implementation of autonomous tractors in Agriculture 4.0 has experienced rapid growth [87].

Given that autonomous vehicles operate on large interconnected networks, there is an increased risk to security and privacy [88]. However, the use of IDSs can minimize or mitigate such risks. The IDSs for Autonomous tractors-enabled Agriculture 4.0 can be categorized into four classes, namely, 1) Deep learning techniques, 2) Data analytics and statistical techniques, 3) Flow-based intrusion detection, and 4) Distributed collaborative intrusion detection. IDS for Autonomous tractors in Agriculture 4.0 is presented in Fig. 9.

1) Deep Learning Techniques: Based on a deep convolutional neural network, Song et al. [89] developed an intrusion detection system for in-vehicle networks, which can be applied for autonomous tractors-based Agriculture 4.0 against cyber-attacks, such as denial-of-service and spoofing attacks. van Wyk et al. [90] proposed an approach, named CNN-KF, that combines a convolutional neural network with a well-established anomaly detection method, namely, Kalman filtering with a χ²-detector, for anomaly detection and identification in automated vehicles. The convolutional neural network takes as input a series of “images” from the connected and automated vehicles and classifies these images as anomalous or normal. The numerical experiments demonstrated that the CNN-KF can detect anomalies and identify their sources with a high F1 score, sensitivity, and accuracy.
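The Kalman filtering and chi-square detection part of such a scheme can be sketched for a single scalar sensor as follows. This is a minimal illustration and not the CNN-KF of [90]: it assumes a random-walk state model, the noise variances `q` and `r` are hypothetical, and the threshold 3.84 is the approximate 95% quantile of the chi-square distribution with one degree of freedom.

```python
def kalman_chi2_detector(measurements, q=1e-3, r=0.04, threshold=3.84):
    """Flag measurements whose normalized squared innovation exceeds the threshold."""
    x, p = measurements[0], 1.0        # initial state estimate and its variance
    flags = []
    for z in measurements[1:]:
        p_pred = p + q                 # predict step (random-walk state model)
        innovation = z - x             # measurement residual
        s = p_pred + r                 # innovation variance
        anomalous = innovation ** 2 / s > threshold
        flags.append(anomalous)
        if anomalous:
            p = p_pred                 # skip the update for untrusted readings
        else:
            k = p_pred / s             # Kalman gain
            x = x + k * innovation
            p = (1 - k) * p_pred
    return flags

# Hypothetical speed-sensor readings with one injected spike.
readings = [10.0, 10.1, 9.9, 10.0, 15.0, 10.1]
print(kalman_chi2_detector(readings))
```

Skipping the update for flagged readings keeps a single spoofed value from dragging the state estimate toward the attacker's signal.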

Fig.9.IDS for autonomous tractors-enabled Agriculture 4.0.

2) Data Analytics and Statistical Techniques: To extract more optimized and strongly correlated features, Ieracitano et al. [91] proposed a statistical analysis and autoencoder-driven intelligent IDS, which is based on the combination of data analytics and statistical techniques. The proposed system can be applied for autonomous tractors-based Agriculture 4.0. The performance evaluation using the benchmark NSL-KDD dataset (i.e., binary classification (Normal, Abnormal) and multi-class classification (Normal, DoS, R2L, Probe)) showed that the proposed system is efficient in terms of precision, recall, F1 score, and accuracy. Through payload analysis of network traffic, Vidal et al. [92] proposed an enhanced payload analyzer, named EsPADA, which takes advantage of the Bloom filter data structure and the N-gram methodology. During the training stage, both normal and adversarial models are constructed according to the features extracted by N-gram. Both the DARPA’99 and UCM 2011 datasets are used in the performance evaluation, where the results showed that the EsPADA can resist disguised attacks with a lower accuracy reduction (4.86%).
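The combination of N-grams and a Bloom filter for payload analysis can be illustrated with the short Python sketch below. It is not the EsPADA implementation of [92]: it builds only a model of normal traffic, and the filter size, the number of hash functions, the hashing scheme, and the sample payloads are all assumptions chosen for the example.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter; membership tests may give false positives."""

    def __init__(self, size=1024, n_hashes=3):
        self.size, self.n_hashes = size, n_hashes
        self.bits = bytearray(size)

    def _positions(self, item: bytes):
        # Derive n_hashes positions by salting SHA-256 with a one-byte seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes):
        return all(self.bits[pos] for pos in self._positions(item))

def ngrams(payload: bytes, n=3):
    """All overlapping byte n-grams of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

def unseen_ratio(payload: bytes, model: BloomFilter, n=3):
    """Fraction of the payload's n-grams that the normal model has never seen."""
    grams = ngrams(payload, n)
    return sum(g not in model for g in grams) / len(grams) if grams else 0.0

# Training: record the n-grams of a (hypothetical) legitimate request.
normal_model = BloomFilter()
for gram in ngrams(b"GET /index.html HTTP/1.1"):
    normal_model.add(gram)

print(unseen_ratio(b"GET /index.html HTTP/1.1", normal_model))
print(unseen_ratio(b"\x90\x90\x90\x90\xcc\xcc", normal_model))
```

A payload with a high unseen ratio deviates from the learned normal profile and would be a candidate for an alert.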

To optimize cyber security processes in large distributed systems, Vieira et al. [93] designed an autonomic intrusion detection and response system, which can be applied for autonomous tractor-based Agriculture 4.0. To evaluate the proposed system, the authors used a proof-of-concept implementation in two scenarios: i) virtual machines running on a private cloud, and ii) virtual machines running on the Amazon public cloud. The results showed that the proposed system is effective within acceptable timeframes for both private and public cloud experiments. To detect data integrity attacks in autonomous tractors-based Agriculture 4.0, the work by Benisha and Ratna [94] can be applied, where the authors designed an IDS, named DI-EIDS, which is based on three techniques, namely, the Deviation forest, Grey Wolf Optimization, and the Black forest classifier. To obtain the best training data, the DI-EIDS adopts a Black forest classifier, while Grey Wolf Optimization is implemented for sampling ratio optimization. The UNSW-NB15 dataset is used in the performance evaluation, where the results showed that the DI-EIDS can achieve higher performance in accuracy and false alarm rate (FAR).

3) Flow-Based Intrusion Detection: To improve the detection rate of minority classes, Zhang et al. [95] designed a flow-based intrusion detection model, named SGM-CNN, which combines the synthetic minority over-sampling technique (SMOTE) with under-sampling based on Gaussian mixture model clustering. Both the UNSW-NB15 and CICIDS2017 datasets are used in the performance evaluation, where the experimental results showed that the SGM-CNN model can achieve detection rates of 99.74% and 96.54% on the UNSW-NB15 dataset using binary classification and multi-class classification, respectively. In addition, the SGM-CNN model can achieve a detection rate of 99.85% on the CICIDS2017 dataset for 15-class classification. Liu et al. [96] proposed a web IDS based on the combination of feature analysis and support vector machine optimization to find a kernel function, which can be applied for autonomous tractor-based Agriculture 4.0. The HTTP DATASET CSIC 2010 is used in the experiments, in which the results showed that the proposed system has better detection capability.
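The over-sampling half of such a scheme can be illustrated with a minimal SMOTE-style sketch: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. This Python example is a simplification and does not reproduce SGM-CNN [95]; the Gaussian mixture based under-sampling is omitted, and the function name, parameters, and sample points are hypothetical.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical minority-class flows described by two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
new_samples = smote(minority, 5)
print(new_samples)
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region already occupied by the minority class.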

4) Distributed Collaborative Intrusion Detection: The high mobility and rapid topology changes of autonomous tractors in Agriculture 4.0 make them vulnerable to various malicious attacks. To enable data collection, analysis, and tracking between vehicles, Zhou et al. [97] proposed a distributed collaborative IDS, named DCDIV. To ensure a reliable and stable communication link, the DCDIV system adopts a reputation-based cooperative communication method. The Veins tool is used to simulate the traffic and network environment of vehicle communications. The results showed that the DCDIV system can achieve faster attack detection, a lower false alarm rate, and a higher detection rate. In addition, based on random forest feature selection, Li et al. [98] designed an autoencoder IDS, named AE-IDS, which can be applied for autonomous tractors-based Agriculture 4.0. The AE-IDS architecture is made up of four main connected modules, namely: i) Data Preprocessing, which consists of dividing the dataset into a training set and a test set by proportion, ii) Feature Selection, which consists of selecting the most significant features using the random forest algorithm, iii) Feature Grouping, which consists of applying the affinity propagation clustering algorithm to group the average features and get various feature subsets, and iv) Anomaly Detection, which consists of the combination of auto-encoder and K-means. The CSE-CIC-IDS2018 dataset is used in the experiments, in which the results showed that the AE-IDS can perform better than some popular batch/offline methods.

Fig.10.Flow diagram for deciding how to choose the type of hidden layer activation function and output activation function for a deep learning-based IDS.

F.IDS for IoT Devices-Enabled Agriculture 4.0

The IDSs for IoT devices-enabled Agriculture 4.0 can be categorized into four classes, namely, 1) Hierarchical approaches, 2) Deep learning techniques, 3) Hybrid machine learning, and 4) Artificial bee colony (ABC) model-based.

1) Hierarchical Approaches: To secure IoT devices and detect cyberattacks in the IoT environment, Ferrag et al. [99] developed an intrusion detection system, named RDTIDS, which is based on rules and decision tree algorithms. The RDTIDS uses two steps: training and testing. The training step consists of training the three classifiers under a hierarchical model, while the testing step consists of classifying data as either benign or a specific type of attack. The RDTIDS is located in the fog computing layer of a three-tier fog computing architecture. Both the CICIDS 2017 dataset and the Bot-IoT dataset are used for the experiments, in which the results showed that the RDTIDS can achieve the highest true negative rate, at 98.855%.

2) Deep Learning Techniques: The activation functions in the hidden layer as well as in the output layer are an essential feature of the design of a deep learning-based IDS. Specifically, the selection of the activation function in the hidden layer determines the ability of the network model to learn the training dataset, while in the output layer, it determines the specific type of predictions that the model can perform [100]. Fig. 10 presents a procedure that could be applied as an initial check for deciding how to choose the type of hidden layer activation function as well as the output activation function for a deep learning-based IDS. Almiani et al. [101] introduced an artificially full-automated IDS for IoT devices, which can be applied for IoT devices-based agriculture. To catch specific types of attacks in IoT environments, the proposed system uses an architecture of cascaded filtering stages and adopts deep multi-layered recursive neural networks. The proposed training algorithm uses the following stages: Feedforward computation, Backpropagation to the output layer, Backpropagation to the hidden layer, and Weights update. The NSL-KDD dataset is used as a training and testing benchmark dataset, in which the results showed detection rates of 98.27%, 97.35%, 64.93%, and 77.25% for DoS attacks, Probe attacks, R2L attacks, and U2R attacks, respectively. Li et al. [102] focused on the use of a convolutional neural network to overcome the problems of image security detection, adapt it to the present diversity of media types, and build a high-speed, high-precision imaging type recognition system. The proposed scheme achieves more than 93.75% classification accuracy across the general image library.
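The role of the hidden-layer and output-layer activation functions can be illustrated with a few lines of Python. The helper `output_activation` encodes a common rule of thumb (sigmoid for binary attack/normal decisions, softmax for multi-class attack classification); it is an assumption made for this example and does not necessarily match every branch of the procedure in Fig. 10.

```python
import math

def relu(z):
    """Common hidden-layer activation: passes positive values, zeroes the rest."""
    return max(0.0, z)

def sigmoid(z):
    """Output activation for binary classification: a single attack probability."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Output activation for multi-class classification: one probability per class."""
    m = max(zs)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def output_activation(n_classes):
    """Rule of thumb (assumed here): sigmoid for two classes, softmax otherwise."""
    return "sigmoid" if n_classes == 2 else "softmax"

print(output_activation(2), sigmoid(0.0))
print(output_activation(5), softmax([2.0, 1.0, 0.1]))
```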

3) Hybrid Machine Learning: To perform feature clustering for efficient detection of intrusions, Aljawarneh and Vangipuram [103] introduced a Gaussian dissimilarity measure, named GARUDA, for anomaly detection in IoT networks. The performance evaluation is conducted on both the KDD and NSL-KDD datasets, in which the results showed detection accuracies of 85.82%, 19.23%, 94.83%, 99.07%, and 98.38% for the DoS, Probe, R2L, U2R, and Normal classes, respectively. Based on long short-term memory recurrent neural networks, Jiang et al. [104] proposed a multi-channel intelligent attack detection system. The proposed system uses the following steps: a Data preprocessing step, a Multi-channel processing step, and a Voting step. The Data preprocessing step consists of providing high-quality data and extracting different types of features from the processed data. The Multi-channel processing step consists of generating classifiers by training neural networks. The Voting step consists of deciding whether the input data is an attack or not. The NSL-KDD dataset is used to train and test the multi-channel intelligent attack detection system, in which the results showed an accuracy of 98.94%.

Botnets are the most serious threats to cyber Agriculture 4.0, where a large number of hosts are remotely controlled by attackers with zombie programs. To detect botnets, Wang et al. [105] proposed an automated botnet detection system, named BotMark, which can be applied in complex environments such as Agriculture 4.0. The BotMark system does not require any previous knowledge of botnets and utilizes a hybrid analysis of flow-based and graph-based traffic patterns. To characterize the behaviors of botnets, the BotMark system extracts 15 flow-based features and 3 types of graph-based features from network traffic. The experimental results showed that the BotMark system can reach a detection accuracy of 99.94%. Shafiq et al. [106] presented a feature selection metric approach called CorrAUC, together with a feature selection algorithm called Corrauc, that can be used in IDSs for securing IoT networks. The authors assessed the proposed approach by utilizing the Bot-IoT dataset and four separate ML algorithms. The results showed that the suggested method can achieve results greater than 96% on average. Al Shorman et al. [107] proposed an unsupervised evolutionary IoT botnet detection method, named GWO-OCSVM, based on the grey wolf optimization algorithm (GWO) and one-class support vector machine (OCSVM). The GWO algorithm is used to optimize the hyperparameters of the OCSVM. The experimental results on the N-BaIoT dataset showed that the GWO-OCSVM system performs well in terms of the true positive rate and false positive rate. To efficiently detect network intrusions, Hassan et al. [108] proposed a hybrid deep learning model using a long short-term memory network and a convolutional neural network. The UNSW-NB15 dataset is used as a publicly available big dataset for the performance evaluation of the hybrid deep learning model, which shows good performance compared to traditional approaches, achieving 97.1% accuracy.

4) Artificial Bee Colony (ABC) Model-Based: The Sybil attack can create serious threats to cyber Agriculture 4.0, where an adversary asserts various illegal identities by constructing or damaging the IoT nodes. To detect the Sybil attack in the IoT environment, Murali and Jamalipour [109] proposed a lightweight IDS based on the ABC model. The ABC model is an optimization technique that simulates the foraging behavior of honey bees. The simulation results showed that the average accuracy rate of the proposed IDS is 96.8%, 95.2%, and 94.8% for type 1, type 2, and type 3 attacks, respectively. The type 1 attack consists of malicious nodes that target one fixed region. The type 2 attack consists of malicious nodes that are scattered among the legitimate nodes, while the type 3 attack consists of Sybil nodes that are mobile and distributed across the network. To address intrusion detection in supervised problems, the work by Lopez-Martin et al. [110] can be applied, in which the authors apply reinforcement learning to network intrusion detection using two datasets, namely, the NSL-KDD and AWID datasets. The study evaluated the IDS model using the following four deep reinforcement learning methods: actor-critic (AC), policy gradient (PG), double deep Q-network (DDQN), and deep Q-network (DQN). The DDQN algorithm showed good results compared to the other deep reinforcement learning algorithms.
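The difference between the DQN and DDQN learning targets mentioned above can be shown in a few lines. The sketch below uses the standard textbook definitions of both targets and is not taken from [110]; the interpretation of actions as predicted traffic labels with a reward for correct predictions, as well as the Q-values and the discount factor, are assumptions made for the example.

```python
def dqn_target(reward, next_q_target, gamma=0.9, done=False):
    """DQN target: the target network both selects and evaluates the next action."""
    return reward if done else reward + gamma * max(next_q_target)

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Double DQN target: the online network selects the next action and the
    target network evaluates it, which reduces over-estimation of Q-values."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]

# Hypothetical Q-values for two actions: 0 = predict "normal", 1 = predict "attack".
q_online_next = [0.2, 0.8]
q_target_next = [0.5, 0.3]
print(dqn_target(1.0, q_target_next))
print(ddqn_target(1.0, q_online_next, q_target_next))
```

In this toy case the DQN target uses the larger of the two target-network values, whereas the DDQN target uses the value of the action preferred by the online network, giving a more conservative estimate.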

G.IDS for Industrial Agriculture 4.0

Critical infrastructure is essential for the good governance of society. This is why the exposure of such infrastructures to extreme events has a significant impact on the resilience of society [111]. Industrial Agriculture 4.0 is complex and diverse, which results in low real-time performance against impersonation attacks. The IDSs for Industrial Agriculture 4.0 can be categorized into two classes, namely, 1) Hybrid machine learning and 2) Hierarchical approaches.

1) Hybrid Machine Learning: Liang et al. [112] proposed an industrial network IDS that can be applied for Agriculture 4.0. The proposed system uses a multi-feature data clustering optimization model to diagnose, restore, and rebuild. The performance evaluation using both the NSL-KDD and KDDCUP’99 datasets showed that the proposed system can reach an accuracy of 97.8%. For secure smart factory-based ambient intelligence, Park et al. [113] introduced a machine learning and context-aware IDS. The proposed system consists of three phases, namely, i) Data capture and parsing phase, ii) Model building and inference phase, and iii) Threat visualization phase. For reconstruction and compensation of cyber attacks launched on industrial IoT systems, Farivar et al. [114] designed a hybrid intelligent-classic control approach. The proposed approach uses a neural network as an intelligent estimator for attack estimation. To ensure system stability during attacks, the proposed approach adopts nonlinear control theory.

2) Hierarchical Approaches: Industrial cyber-physical systems (ICPSs), combining advanced communication, computing, and industrial process monitoring, are considered a core technology for Industry 4.0. Liu et al. [115] introduced a hierarchically distributed intrusion detection scheme to achieve all-round security protection of ICPSs. An adversary can launch a stealthy attack to secretly manipulate the sensor readings based on knowledge of the physical model used by a traditional IDS. To detect stealthy attacks on an industrial control system, Hu et al. [116] designed an intrusion detection approach using permutation entropy. To study the proposal’s efficiency in detecting stealthy attacks, a Matlab-Simulink environment is used, in which the results showed a good detection ability of the proposed permutation entropy-based intrusion detection against stealthy attacks on industrial control systems. To estimate the attack signal waveforms in an industrial control system, Miao et al. [117] proposed the following two types of estimators: the nonlinear attack signal estimator (NASE) and the linear attack signal estimator (LASE). Overall, the IDSs proposed for industrial control systems can be classified into system model-based systems and traditional information technology-based systems.
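Permutation entropy, the quantity underlying the approach of [116], measures how varied the ordinal patterns in a time series are. The following Python sketch computes the normalized permutation entropy of a sequence of sensor readings using the standard definition; how [116] windows the data and sets alarm thresholds is not reproduced here, and the sample series are hypothetical.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized permutation entropy in [0, 1] for the given embedding order."""
    # Each window is mapped to the index permutation that sorts its values.
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in patterns.values())
    return abs(entropy) / math.log2(math.factorial(order))

steady_ramp = [1, 2, 3, 4, 5, 6]          # a single ordinal pattern
irregular = [4, 7, 9, 10, 6, 11, 3]       # several different ordinal patterns
print(permutation_entropy(steady_ramp))
print(permutation_entropy(irregular))
```

A detector built on this idea would compare the entropy of recent sensor windows against the range observed during normal operation and raise an alert when it drifts outside that range.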

H.IDS for Smart Grid-Enabled Agriculture 4.0

The smart grid-based Agriculture 4.0 is composed of a set of controllers, automation, and standard communication protocols, which are interconnected over the Internet to control the production and distribution of energy to IoT devices in smart agriculture [21]. The IDSs for Smart Grid-enabled Agriculture 4.0 can be categorized into two classes, namely, 1) Reinforcement learning and 2) Collaborative intrusion detection.

1) Reinforcement Learning: Kurt et al. [118] designed a reinforcement learning approach for online cyber-attack detection, which can be applied for smart grid-based Agriculture 4.0. The proposed approach uses a direct mapping from observations to actions in two phases, namely, the training phase and the test phase. The training phase consists of training the defender with low-magnitude attacks, while the test phase consists of detecting slight deviations of meter measurements. The IEEE-14 bus power system is used as a simulation environment, in which the results showed a high potential of reinforcement learning approaches for solving complex cyber security problems such as those of smart grid-based Agriculture 4.0.

2) Collaborative Intrusion Detection: To provide the best possible protection of Smart Grid ecosystems, Patel et al. [119] proposed a collaborative IDS, named IDPS, which can identify an attack based on three structural forms, namely, Centralized, Hierarchical, and Fully distributed. The IDPS uses three advanced components: a fuzzy logic risk manager, a knowledge manager, and an autonomic manager. To classify binary-class, triple-class, and multi-class cyber-attacks in the smart grid, Haghnegahdar and Wang [120] proposed an intrusion detection system using a whale optimization algorithm and an artificial neural network. The proposed system uses the artificial neural network to achieve the minimum mean square error, while the whale optimization algorithm is applied to initialize and adjust the weight vector.

Tables III and IV present a summary of IDSs for Agriculture 4.0. The classification of IDS solutions for Agriculture 4.0 is presented in Fig. 11.

IV.IDS BUILDING PROCESS AND PUBLIC DATASETS

The food industry has experienced a shift from disconnected, stand-alone, independent operations to heavily interconnected, dependent, and integrated operations, to improve the sector’s efficiency [87]. As a result, network organizations find themselves in a highly efficient production system, with growing complexity, and increased exposure to potential risks. Connectivity in the agri-food chain involves the control of information assets, the transport of physical goods and services, and other intangible assets, as shown in Fig. 12. Increasingly, this control has become ubiquitous and pervasive throughout Agriculture 4.0, making it even more difficult to secure all of the sector’s resources. In this section, we provide the IDS building process for Agriculture 4.0.

A.Agriculture 4.0 Data Sources

Embedding emerging technologies provides smarter management of agri-food supply chains, as it can combine diverse patterns of independent data analysis, historical data repositories, and real-time data traffic [127]. Both real-time data and automated data processing tools offer new ways to respond more quickly to changing conditions in Agriculture 4.0. The activities associated with each agricultural component are automatically integrated into the food chain through emerging technologies, from farm to fork, as shown in Fig. 12. Some of these components have separate data sources that need to be used. These data not only must be present, but they must also work in balance across all systems. Some of the core components in Agriculture 4.0 are:

1) Smart Farming Systems: These systems are designed by integrating advanced technologies into existing farming operations, such as intelligent crop/livestock monitoring, smart water management, disease management, smart harvesting, etc., to improve the quality and efficiency of agricultural production. They include different types of sensors, actuators, unmanned aerial/ground vehicles, smart agricultural machinery, and so on, and focus on linking objects in the IoT-based smart farm while monitoring, performing agricultural tasks, and processing farm-related data via deployed intelligent devices.

2) Transportation Services: These services are in charge of the flow of agricultural products from one location to another, starting at the beginning of the supply chain and ending at the customer’s kitchen. They include different types of smart sensors, GPS kits, and Internet of vehicles (IoV) communications, where vehicles communicate among themselves and with public networks through vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) interfaces. They permit the real-time collection and sharing of sensitive information on road conditions and agricultural payload status.

3) Storage Entities: These entities are in charge of storage operations. Cold storage systems are equipped with monitoring systems that can track changes over time in the status of the stored agricultural products, warning and alerting management as soon as something seems to be wrong. They comprise several types of smart sensors, such as temperature and humidity sensors.

TABLE III SUMMARY OF IDSS FOR AGRICULTURE 4.0

TABLE IV SUMMARY OF IDSS FOR AGRICULTURE 4.0 (CONTINUED)

4) Food Processors: These systems both prepare fresh food for the market and manufacture prepared agricultural products. This component is composed of a relatively large and diverse group of companies that make a product. They also use agricultural raw materials or sub-assemblies manufactured by different producers to develop their products. Using IoT-enabled equipment, it is possible to manage a wide range of quality control operations. For example, manufacturers can monitor production volumes and temperatures for different commodities, pressure conditions, and product labeling.

5) Distributors: These services usually consist of an entity that acquires large inventories of products, which it purchases from producers and sells to consumers. Distributors fulfill the “Time and Place” requirement for the customer by delivering the products whenever and wherever the customer desires.

6) Retailers: These entities hold stocks in smaller quantities for sale to the public. They also keep track of the customers’ preferences and demands.

B.Public Datasets

In this sub-section, we examine 12 selected datasets made available since 1999. Table V presents a summary of public datasets and selected IDSs that can be used in Agriculture 4.0.

Fig.11.Classification of IDS Solutions for Agriculture 4.0.

1) KDD99 Dataset: This dataset was the most widely used dataset for the evaluation of IDSs [48]. It was created by Stolfo et al. [142] using data collected in the DARPA’98 IDS evaluation program [143], which comprises approximately 4 gigabytes of compressed raw Tcpdump data from 7 weeks of network traffic that can be processed into about 5 million connection records [129]. KDD99 consists of approximately 4 900 000 vectors with 23 types of attacks divided into four major attack categories, namely, remote to local (R2L), probing (PRB), denial of service (DoS), and user to root (U2R).

2) UNSW-NB15 Dataset: This dataset was created using the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) [128], in order to provide a hybrid of real modern normal activities and synthetic attack patterns. The Tcpdump tool was used to capture 100 GB of raw traffic. The dataset contains 2 540 044 data instances with 49 features and 9 types of attack in total, including Worms, Reconnaissance, Shellcode, Generic, Exploits, and DoS.

3) NSL-KDD Dataset: This dataset was proposed in [129] to resolve certain problems inherent in the KDD’99 dataset, such as the lack of a precise definition of the attacks. It contains two subsets, KDDTrain+ (125 973 records) and KDDTest+ (22 544 records), which are generated from the KDD’99 data set. There are four major attack categories, namely, R2L, PRB, DoS, and U2R.

4) ISCX Dataset: This dataset, created by the Information Security Centre of Excellence (ISCX) [130], consists of 7 days of network activity and covers four types of attacks, namely, Distributed Denial of Service, Brute Force SSH, HTTP Denial of Service, and Infiltrating the network.

5) ISOT Cloud Intrusion Dataset: This dataset consists of an aggregation of more than 8 terabytes collected synchronously from an OpenStack cloud production environment [131]. The dataset contains web vulnerability scanning, dictionary/brute force login, directory/path traversal, cross-site scripting, SQL injection, fuzzers, and HTTP flood DoS attack types.

6) HTTP Dataset CSIC: This dataset was created by the Information Security Institute of CSIC (Spanish Research National Council) [132]. It holds traffic that was generated for an e-commerce web application, and contains more than 36 000 normal requests and 25 000 anomalous requests with several types of attacks, such as parameter tampering, CRLF injection, XSS (cross-site scripting), SQL injection, etc.

7) Kyoto 2006 Dataset: The raw traffic data was collected by honeypot systems deployed at Kyoto University [133]. It contains 14 features extracted based on the KDD Cup 99 data set and 10 additional features.

8) CICIDS 2017 Dataset: This dataset was created by the Canadian Institute for Cybersecurity [134]. It has 80 features associated with every Netflow record, written in CSV format, which facilitates its import into a machine learning package. It contains the most common attacks based on the 2016 McAfee report, such as Scan, Web based, Brute force, DDoS, Heartbleed, Infiltration, and Bot.

9) Industrial Control System Cyber Attack Dataset: This collection comprises five datasets: i) Power System Datasets, ii) Gas Pipeline Datasets, iii) Gas Pipeline and Water Storage Tank, iv) New Gas Pipeline, and v) Energy Management System Data [135].

10) AWID Dataset: For data collection, the authors of [136] created an actual laboratory that realistically reproduces a typical SOHO infrastructure. Several mobile and stationary STAs were used as valid clients of the network, while a single mobile attacker triggered various attacks. Each record contains a vector of 155 attributes, and each attribute has numeric or nominal values. There are three types of attacks, namely, Injection, Flooding, and Impersonation.

Fig. 12. IDS-secured smart supply chain in Agriculture 4.0.

11) CSE-CIC-IDS2018 Dataset: This dataset was created by the Canadian Institute for Cybersecurity [134]. It contains a collection of web application attacks, brute force attacks, and other up-to-date attacks, such as DDoS+PortScan, Botnet attacks, Infiltration attacks, Web attacks, DoS attacks, and Bruteforce attacks.

12) Bot-IoT Dataset: This dataset was generated by creating a realistic network environment in the Cyber Range Lab at the UNSW Canberra Cyber [137]. It contains more than 72 000 000 records with the following attacks: OS and Service Scan, DDoS, DoS, Data exfiltration attacks, and Keylogging.

13) TON_IOT Dataset: For the purpose of collecting and analysing heterogeneous data sources from the IoT and industrial IoT (IIoT), the IoT Lab of the UNSW Canberra Cyber, the School of Engineering and Information Technology (SEIT), UNSW Canberra at the Australian Defence Force Academy (ADFA) created a dataset named TON_IOT [138]. The testbed was implemented using multiple virtual machines comprising multiple operating systems to accommodate the interconnectivity among the three layers of IIoT, Cloud, and Edge/Fog systems. A variety of AI-based cyber security applications can be validated and tested with this data set, such as IDSs, threat intelligence, malware detection, fraud detection, and others.

TABLE V PUBLIC DATASETS FOR CYBER SECURITY INTRUSION DETECTION IN AGRICULTURE 4.0

14) InSDN Dataset: This dataset was introduced in [139] to overcome the limitations of existing datasets in the context of SDNs. It includes more than 80 features in CSV format, with a total of 343 939 instances for both normal and attack traffic. The InSDN dataset incorporates diverse classes of attacks such as DoS, DDoS, Web, Password-Guessing, and Botnets. Furthermore, the normal traffic in InSDN includes various popular application services such as HTTPS, HTTP, and DNS.

C.Building Process

A great deal of research has been conducted to develop intelligent intrusion detection techniques to improve the security of networks [65], [144]–[146]. The main steps of each IDS building process are 1) data collection, 2) data preprocessing, 3) classifier training, and 4) attack recognition. We provide a brief description of each step, as well as some of the latest techniques that can be implemented in Agriculture 4.0, as illustrated in Fig. 12.

1) Data Collection: Information gathering represents the first, and critical, step in intrusion detection. The type of data source and the point at which the data is collected are two key factors in the design and performance of an IDS [144]. Sadikin et al. [145] provided a new, efficient, and reliable method for large-scale IDS data collection, which is applicable to Agriculture 4.0. The authors used Zigbee Diagnostic Reports to ensure that IDS data collection can be done reliably and efficiently in a resource-limited Zigbee IoT environment.

2) Data Pre-Processing: Once the data is obtained at the data collection stage, it is first processed in order to generate basic features [144]. The feature selection technique, used as a pre-processing step in ML algorithms, attempts to reduce the complexity of the calculations by removing unnecessary features while preserving or even improving the performance of the IDS [65]. The classifier to be trained requires that each record in the input data be expressed as a vector of real numbers. Therefore, every symbolic feature in the dataset must first be converted to a numeric value, a technique called data transferring [144]. Data normalization refers to the process of scaling the value of each attribute into a well-proportioned range, thereby removing the bias towards features with larger values, which can significantly increase the accuracy of the classification algorithm [144]. Khan et al. [147] proposed an approach called HMLIDS to address the challenge of building an intrusion detection framework from unbalanced intrusion datasets. It is specifically designed for industrial control systems (ICS) and is suitable for application in Agriculture 4.0. The approach includes a feature extraction technique derived from data normalization with data feature retrieval (DFR) and uses a modified nearest-neighbor rule algorithm to balance the dataset, which enhances the accuracy of classifiers. Experimental results obtained from a large-scale real dataset created using a SCADA system showed a 97% accuracy rate.
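As a minimal sketch of the data transferring and normalization steps described in this stage, the following Python code converts symbolic features to numeric codes and then applies min-max scaling to each column. The function names and the sample records are hypothetical and only serve as an illustration; they are not taken from [144] or from any particular dataset.

```python
from typing import Dict, List, Sequence, Union

Record = Sequence[Union[str, float]]


def encode_symbolic(records: List[Record]) -> List[List[float]]:
    """Replace each symbolic (string) value with an integer code, per column."""
    codebooks: Dict[int, Dict[str, int]] = {}
    encoded: List[List[float]] = []
    for rec in records:
        row: List[float] = []
        for col, value in enumerate(rec):
            if isinstance(value, str):
                book = codebooks.setdefault(col, {})
                # A symbol gets the next unused integer the first time it appears.
                row.append(float(book.setdefault(value, len(book))))
            else:
                row.append(float(value))
        encoded.append(row)
    return encoded


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column into [0, 1]; constant columns are mapped to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical records: (protocol, service, bytes sent).
    raw = [("tcp", "http", 10.0), ("udp", "dns", 30.0), ("tcp", "dns", 20.0)]
    print(min_max_normalize(encode_symbolic(raw)))
```

Integer codes impose an artificial order on the symbols; one-hot encoding is a common alternative when the classifier is sensitive to that order.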

3) Classifier Training: After the ideal subset of features is selected, it is passed to the classifier training stage [144]. To enhance the accuracy of IDS, Zhou et al. [65] trained three separate classifiers as base learners using the C4.5, random forest (RF), and forest by penalizing attributes (Forest PA) algorithms, then constructed an ensemble classifier from them, which can be adapted to Agriculture 4.0. The experimental results showed a classification accuracy of 99.81%, a DR of 99.8%, and a FAR of 0.08% with a subset of 10 features for the NSL-KDD dataset. In addition, the results obtained for the AWID dataset showed an accuracy of 99.52% with a FAR of 0.15% using a subset of only 8 features.
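To illustrate how an ensemble of base learners can be combined and scored, the sketch below uses simple majority voting and computes accuracy, detection rate (DR), and false alarm rate (FAR) from the confusion counts. It assumes the standard definitions DR = TP/(TP+FN) and FAR = FP/(FP+TN). The three threshold rules are hypothetical stand-ins for trained models, and majority voting is an assumed combination rule; this is not the voting scheme of [65].

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A classifier maps a feature vector to 1 (attack) or 0 (normal).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(learners: List[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by most base learners."""
    votes = Counter(clf(x) for clf in learners)
    return votes.most_common(1)[0][0]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, detection rate (DR) and false alarm rate (FAR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "DR": tp / (tp + fn) if tp + fn else 0.0,
        "FAR": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical base learners: simple threshold rules on two features.
LEARNERS: List[Classifier] = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

if __name__ == "__main__":
    X = [(0.9, 0.8), (0.1, 0.2), (0.9, 0.2), (0.2, 0.9)]
    y = [1, 0, 1, 0]
    predictions = [majority_vote(LEARNERS, x) for x in X]
    print(evaluate(y, predictions))
```

With an odd number of binary base learners, ties cannot occur, which keeps the voting rule unambiguous.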

4) Attack Recognition: Having completed all the above steps, it is possible to identify both normal and intrusion traffic via the trained and saved classifier: the test data is passed to the saved model for intrusion detection [144]. Ren et al. [146] proposed a data optimization approach to build IDS, referred to as DO_IDS, which has two main parts: data sampling and feature selection. In data sampling, the authors used iForest to sample the data, while a combination of the genetic algorithm (GA) and RF is applied to optimize the sampling ratio. In feature selection, the GA and RF combination is used again, but this time to select the ideal subset of features. DO_IDS achieved better results than the RF classifier, in particular for the detection of anomalies such as DoS, analysis, backdoor, and worms. Nevertheless, some enhancements remain possible, such as reducing the time cost associated with the data optimization step and supporting online processing.
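The feature selection idea mentioned above, a genetic algorithm searching over subsets of features, can be sketched roughly as follows. This is a generic GA over binary feature masks and not the DO_IDS implementation of [146]: the fitness function here is a hypothetical toy that stands in for an RF-based evaluation, and all parameter values are assumptions.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Mask:
    """Search for a high-fitness feature mask (needs n_features >= 2, pop_size >= 4)."""
    rng = random.Random(seed)
    # Start from the full feature set plus random masks.
    population: List[Mask] = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children: List[Mask] = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability (mutation).
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)


def toy_fitness(mask: Mask) -> float:
    """Hypothetical score: reward features 0 and 2, penalize every kept feature."""
    informative = (0, 2)
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)


if __name__ == "__main__":
    print(ga_select_features(6, toy_fitness))
```

Because the better half of each generation is carried over unchanged, the best mask found never scores worse than the full feature set that seeds the population. In practice, the fitness call would train and validate a classifier on the selected columns, which is where the time cost noted above arises.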

D.IDS Potentials for Agriculture 4.0

The integration of intelligent agricultural systems makes both intelligent objects more effective and agricultural production more efficient [27]. Nevertheless, these systems are susceptible to a variety of security attacks, which can cause considerable damage to agricultural services and applications such as smart crop/livestock monitoring, food supply chain traceability, and unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) autonomous tasks. An IDS implemented for the Agriculture 4.0 environment is expected to deliver real-time packet analysis and feedback, support different network layers with different protocol stacks, and support different technologies such as IoT, Fog/Cloud computing, Blockchain, and SDN/NFV. Furthermore, it is expected to operate within tight constraints of limited processing resources, high response speed, and high data volume processing. Table VI presents some of the benefits of IDS for Agriculture 4.0, in terms of the technologies and applications applied in Agriculture 4.0, the scenarios of either the absence or presence of IDS, and the type of IDS appropriate for each scenario.

E.Software Implementations

Table VII highlights key features offered by the most popular software packages for implementing deep learning in IDS-secured Agriculture 4.0. Supported techniques include common deep learning architectures such as convolutional neural networks (CNN), recurrent neural networks (RNN), restricted Boltzmann machines (RBM), and deep belief networks (DBN). Parallel processing techniques are used to boost performance through hardware acceleration such as GPUs; well-known examples are CUDA, OpenMP, and OpenCL. The pre-trained models feature indicates whether the framework accepts already trained models as a starting point. Cloud support states whether the framework enables cloud-based services to accelerate the training process.

TABLE VI IDS’S BENEFITS TO AGRICULTURE 4.0

V.FUTURE DIRECTIONS

To complete our study, we outline both open challenges and future research opportunities that could improve the capabilities and effectiveness of machine learning and deep learning techniques for cyber security in Agriculture 4.0, as shown in Fig. 13. They are summarized in the following suggestions:

A.Audio Recognition and Computer Vision for Cyber Security Intrusion Detection

The results in our study show that deep learning techniques can provide better performance in cyber security intrusion detection for Agriculture 4.0 compared to traditional machine learning techniques such as decision trees, random forests, naive Bayes, and logistic regression. Therefore, several characteristics should be taken into account when developing more efficient cyber security intrusion detection for Agriculture 4.0, such as the number of features in flow-based IDS datasets: to train deep learning techniques successfully, this number should be adequate. Potential future research directions on this topic could involve developing a mechanism to transform a group of network traffic flows into large encoded data, a signal, or an image, for example through Internet of multimedia things (IoMT) approaches [158]. Such a mechanism could help train deep learning techniques more effectively and thereby enhance the performance of cyber security intrusion detection in Agriculture 4.0. The use of audio recognition and computer vision to build a network intrusion detector based on deep learning techniques is one significant research challenge.
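One simple way to picture the transformation of a group of flows into an image is to treat each flow as one row of a grayscale matrix, with every feature scaled to a pixel intensity between 0 and 255. The sketch below is only one assumed encoding among many possible ones; the function name, the per-feature bounds, and the sample values are hypothetical.

```python
from typing import List, Sequence


def flows_to_image(
    flows: Sequence[Sequence[float]],
    lows: Sequence[float],
    highs: Sequence[float],
) -> List[List[int]]:
    """Encode flows as a grayscale image: one row per flow, one pixel per feature.

    lows/highs give the assumed minimum and maximum of each feature;
    values outside that range are clipped before scaling to 0..255.
    """
    image: List[List[int]] = []
    for flow in flows:
        row: List[int] = []
        for value, lo, hi in zip(flow, lows, highs):
            if hi <= lo:
                row.append(0)  # degenerate range: no information to encode
                continue
            clipped = min(max(value, lo), hi)
            row.append(round(255 * (clipped - lo) / (hi - lo)))
        image.append(row)
    return image


if __name__ == "__main__":
    # Two hypothetical flows with two features each, both bounded by [0, 10].
    print(flows_to_image([[0.0, 10.0], [2.0, 4.0]], [0.0, 0.0], [10.0, 10.0]))
```

The resulting matrix can be fed to a convolutional network in the same way as any single-channel image; whether this layout preserves enough structure for detection is exactly the open question raised above.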

TABLE VII DEEP LEARNING FRAMEWORKS FOR IDS-SECURED AGRICULTURE 4.0

Fig. 13. Observations and challenges of machine learning and deep learning for cyber security in Agriculture 4.0.

B.Combining Blockchain Technology With Machine Learning and Deep Learning Techniques for Cyber Security

Blockchain is a transparent and secure technology for the storage and transmission of information that operates without a central control unit [159], [160]. Blockchain can be applied in Agriculture 4.0 to improve the speed, safety, and reliability of transactions, and to reduce the transaction or centralization costs of traditional agriculture systems [87]. However, several challenges remain for the practical realization of Blockchain in Agriculture 4.0 due to cyber security threats. For example, an adversary can launch attacks, misuses, and compromises to affect the functionality of the Blockchain ledger, which can disrupt food safety, agrifood supply chain efficiency, and agricultural productivity. Hence, schemes that combine Blockchain with an IDS need to be designed and implemented to further protect Agriculture 4.0 from cyber attacks.
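As a rough illustration of how IDS output could be coupled with a ledger, the sketch below stores alerts in a hash-chained log, so that any later modification of a recorded alert is detectable. It is a deliberately simplified assumption: there is no consensus protocol, network, or smart contract, and the alert fields are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index: int, prev_hash: str, alert: Dict[str, Any]) -> str:
    """Hash the block content with SHA-256 over a canonical JSON encoding."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "alert": alert}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_alert(chain: List[Dict[str, Any]], alert: Dict[str, Any]) -> None:
    """Append an IDS alert, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "alert": alert,
        "hash": block_hash(index, prev_hash, alert),
    })


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Return True only if every block is intact and correctly linked."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["alert"]):
            return False
        prev_hash = block["hash"]
    return True


if __name__ == "__main__":
    ledger: List[Dict[str, Any]] = []
    append_alert(ledger, {"source": "sensor-12", "type": "DoS"})
    append_alert(ledger, {"source": "drone-3", "type": "Scan"})
    print(verify_chain(ledger))
```

A deployed system would replicate such a ledger across several nodes and agree on new blocks through a consensus mechanism, which is where most of the design effort lies.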

C.Deep Reinforcement Learning for Cyber Security

Machine learning and deep learning techniques are useful tools to detect anomalies in Agriculture 4.0. The successful implementation of these techniques depends on the following characteristics: 1) neural network opacity, 2) identification of data anomalies and errors in the dataset, 3) achieving the lowest false alarm rate, and 4) providing the highest possible accuracy. Meanwhile, reinforcement learning has had success in scaling to decision-making problems [161], [162]. To solve complex problems in cyber security, the combination of deep learning and reinforcement learning, called deep reinforcement learning, can be used to detect anomalies before they do any damage in Agriculture 4.0, with the lowest false alarm rate and the highest possible accuracy. Hence, cyber security intrusion detection based on deep reinforcement learning should be carefully designed to find the best anomaly detection system.
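To make the reinforcement learning formulation concrete, the sketch below trains a tabular Q-learning agent that decides whether to allow traffic or raise an alert, and is rewarded for correct decisions. It is a strong simplification and an assumption throughout: the environment is a toy in which the state is a discretized anomaly score, episodes last one step, and a table replaces the neural network that a deep reinforcement learning approach would use.

```python
import random
from typing import List

N_STATES = 4      # hypothetical anomaly-score buckets, 0 = lowest score
ACTIONS = (0, 1)  # 0 = allow the traffic, 1 = raise an alert


def reward(state: int, action: int) -> float:
    """Toy ground truth: buckets 2 and 3 are attacks, buckets 0 and 1 are normal."""
    is_attack = state >= 2
    return 1.0 if action == int(is_attack) else -1.0


def train(steps: int = 2000, alpha: float = 0.1,
          epsilon: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Learn action values with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(steps):
        state = rng.randrange(N_STATES)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
        # One-step episode: the update target is the immediate reward only.
        q[state][action] += alpha * (reward(state, action) - q[state][action])
    return q


def policy(q: List[List[float]]) -> List[int]:
    """Pick the highest-valued action for every state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    print(policy(train()))
```

The reward function encodes the trade-off discussed above: penalizing wrong alerts pushes the false alarm rate down, while penalizing missed attacks pushes detection accuracy up.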

D.Managing Physical Safety and Security of the IDS

Firewalls and anti-malware software are not sufficient to provide security in Agriculture 4.0 against cyber attacks. The use of an IDS, operating in conjunction with authorization access control, privacy, and authentication tools, is vital to defend against intrusion. This makes it possible to monitor resource access events more accurately and to make sure that authorized users are granted access to information resources under certain specified conditions [163]. Since Agriculture 4.0 uses many emerging technologies (e.g., Cloud computing, Fog/Edge computing, SDN/NFV, Drones, etc.), the successful placement and implementation of an IDS depends on the following constraints: 1) lowering the operating costs of the implementation at each layer (i.e., Cloud layer, SDN layer, Edge layer, etc.), 2) network bandwidth constraints, 3) latency constraints, and 4) managing physical safety and security. A possible research direction on this topic could be related to managing the physical safety and security of the IDS in Agriculture 4.0. In particular, ultrasound and infrasound may be a potential physical attack source if they have an impact on the health of livestock or plants. In addition, deciding where IDSs should be deployed within Agriculture 4.0 remains a very challenging issue.

E.Scalability in the Age of Next-Generation Networks

Next-generation networks are set to power the future of an intelligent, ubiquitous, connected, and data-rich Internet of everything (IoE), and to transform the way wireless technologies evolve from “connected objects” to “connected intelligence”, with much higher throughput. To improve performance, in many cases scientists invest more computational resources into the implementation. Still, in the world of AI, no successful large-scale implementations of machine learning exist [164]. Given that Agriculture 4.0 data is heterogeneous, complex, and massive, a possible research direction would involve the development of a scalable IDS for Agriculture 4.0, with zero false positives and real-time detection.

F.Protections Against Data Poisoning Attacks

One of the most significant challenges of machine learning algorithms is their vulnerability to poisoning attacks, especially in cyber security-related applications. It is possible for an adversary to inject malicious points into the training dataset to affect the learning process [165]. This type of attack can render the IDS useless, which opens the door to many different attacks, potentially resulting in a disastrous impact on the Agriculture 4.0 industry. A future research direction should address this issue with non-computationally intensive methods to determine which regions of the underlying data collection distribution are potentially more vulnerable to data poisoning.
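The effect of injecting malicious points into the training data can be seen on a very small example. The sketch below trains a one-dimensional nearest-centroid detector, then adds attack-like samples that are mislabeled as normal and shows how the decision threshold shifts. The classifier, the feature values, and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label) with 1 = attack, 0 = normal


def fit_threshold(samples: List[Sample]) -> float:
    """Nearest-centroid rule in one dimension: midpoint between class means."""
    normal = [value for value, label in samples if label == 0]
    attack = [value for value, label in samples if label == 1]
    return (mean(normal) + mean(attack)) / 2


def predict(threshold: float, value: float) -> int:
    """Flag a value as an attack when it lies above the threshold."""
    return int(value > threshold)


CLEAN: List[Sample] = [(1.0, 0), (2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]

# Poisoning: attack-like values injected with the label "normal".
POISON: List[Sample] = [(9.0, 0), (10.0, 0), (11.0, 0)]

if __name__ == "__main__":
    clean_threshold = fit_threshold(CLEAN)
    poisoned_threshold = fit_threshold(CLEAN + POISON)
    probe = 7.0  # a moderately anomalous observation
    print(clean_threshold, predict(clean_threshold, probe))
    print(poisoned_threshold, predict(poisoned_threshold, probe))
```

In this toy setting, three injected points are enough to move the threshold so that a previously detected value passes as normal, which is why identifying the vulnerable regions of the training distribution matters.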

G.Moving Forward With Quantum Enhanced Machine Learning for IDS

Quantum computers are designed to achieve quantum supremacy by leveraging quantum physics to accomplish computational tasks that exceed the capabilities of the most powerful conventional computers [166]. A possible research direction would be the study of quantum-assisted machine learning (QAML) in IDS design and implementation for Agriculture 4.0.

VI.CONCLUSION

In this paper, we reviewed and analyzed IDSs for cyber security in Agriculture 4.0. First, we presented the cyber security threats and the evaluation metrics employed in evaluating the performance of an IDS for Agriculture 4.0. Next, we evaluated IDSs in relation to emerging technologies. In addition, we provided a comprehensive classification of IDSs in every emerging technology. Then, we presented the public datasets and implementation frameworks applicable to IDS performance evaluation for Agriculture 4.0. Finally, we highlighted the challenges and future research directions in cyber security intrusion detection for Agriculture 4.0.