Topology and Semantic Information Fusion Classification Network Based on Hyperspectral Images of Chinese Herbs

2023-11-14

Journal of Beijing Institute of Technology, 2023, Issue 5

Boyu Zhao, Yuxiang Zhang, Zhengqi Guo, Mengmeng Zhang, Wei Li

Abstract: Most methods for classifying hyperspectral data consider only the local spatial relationships among samples and ignore the important non-local topological relationships, even though the latter better represent the structure of hyperspectral data. This paper proposes a deep learning model called the topology and semantic information fusion classification network (TSFnet), which incorporates a topology structure and semantic information transmission network to accurately classify traditional Chinese medicine in hyperspectral images. TSFnet uses a convolutional neural network (CNN) to extract features and a graph convolutional network (GCN) to capture potential topological relationships among different types of Chinese herbal medicines. The results show that TSFnet outperforms other state-of-the-art deep learning classification algorithms on herbal medicine datasets from two different scenarios. Additionally, the proposed TSFnet model is lightweight and can be easily deployed for mobile herbal medicine classification.

Keywords: Chinese herbs; hyperspectral image; deep learning; non-local topological relationships; convolutional neural network (CNN); graph convolutional network (GCN); lightweight

1 Introduction

Since its outbreak in 2020, coronavirus disease (COVID-19) has become a worldwide pandemic, causing a large number of casualties and an extensive impact on the global economy and daily life [1]. In response to the changing situation, the National Health Commission has proposed the “Diagnosis and Treatment Protocol for novel coronavirus pneumonia (Trial Version 9)” for COVID-19 caused by the Omicron strain, which includes traditional Chinese medicine (TCM) treatment. TCM can play an important role in the treatment of COVID-19 in all three stages of its development, which demonstrates that TCM can still play a vital role in the new era and make more significant contributions to human medicine.

As the role of Chinese herbal medicine in the treatment of COVID-19 gradually emerges, the demand for Chinese herbal medicine in China is expected to remain high for the foreseeable future, which may cause a supply-demand imbalance in the Chinese herbal medicine market. Investigating the planting area of Chinese herbal medicine, predicting the future production of key Chinese herbal medicines, and reasonably planning the types and areas of planting can help balance the supply and demand of urgently needed Chinese herbal medicine and avoid losses to the industry caused by excessively low prices.

In recent years, with the increase in demand for Chinese medicinal materials, statistical analysis of their planting situation has become a new research hotspot. Hyperspectral technology [2, 3] is an advanced imaging technology that acquires both images and spectral information. It reflects the spectral absorption differences generated by molecular vibrations and electronic transitions inside different substances. It is an important means of remote sensing and earth observation, and an indispensable part of spatial information. With the continuous development and maturation of hyperspectral imaging, the technology has been widely used in fields related to the national economy and people’s livelihoods, such as medicine, military detection, environmental monitoring, and urban planning [4, 5]. However, research on the application of spectral imaging in the field of traditional Chinese medicine is still limited, focusing mainly on the identification, non-destructive testing, and content determination of Chinese medicine [6]. Research on material identification and distribution analysis of Chinese medicine in large scenes is still insufficient. Hyperspectral imaging has broad application prospects in the field of traditional Chinese medicine and can effectively promote its development [7, 8].

In the field of hyperspectral imaging, various machine learning methods have proven effective, such as support vector machines (SVM) [9], which classify hyperspectral data through a nonlinear kernel function. However, traditional machine learning methods have limited representation ability and generalize poorly because they rely on manually designed features, which lowers interpretation accuracy for the rich vegetation categories of Chinese herbal medicine. Compared with traditional machine learning methods, deep learning-based methods can extract deep-level features through convolutional neural networks (CNN), greatly improving classification accuracy. For example, the end-to-end convolutional recurrent neural network (CRNN) [10] uses adaptive feature sequence extraction to classify hyperspectral data; the two-branch convolutional neural network (two-branch CNN) [11] enhances the network’s discriminative power for hyperspectral data through multi-type feature fusion; and the three-dimensional convolutional neural network (3D-CNN) [12] improves the classification of hyperspectral data with complex spatial distributions by making better use of spectral information. Although these methods perform well in interpretation tasks based on hyperspectral data, they only extract statistical features from the input data using CNN architectures, ignoring the potential topological correlation between different land cover categories.

Against this background, this study aims to classify and identify traditional Chinese medicine using hyperspectral remote sensing data from two regions, Honghu and Hanchuan, in Hubei Province, China. A lightweight deep learning model suitable for hyperspectral classification of traditional Chinese medicine is designed, and different deep learning methods are compared and analyzed based on their test results.

Graph convolutional networks (GCN) provide more useful information for feature extraction and classification because they can capture the relationships among different pixels or regions in a hyperspectral image (HSI), and these relationships, expressed through similarity, imply the structural information of the data. Therefore, GCN has gradually been applied to HSI pixel classification. Danel et al. [13] observed that GCN models capture the associations between nodes but ignore the spatial adjacency of pixels in the image. They therefore proposed the spatial graph convolution network (SGCN), which selects all neighbor nodes within the spatial neighborhood of a pixel when constructing the adjacency matrix. Qin et al. [14] considered that both the spectral features of a pixel and its spatial neighborhood information in the image are important, and proposed the spectral-spatial graph convolution network (S2GCN) model.

Traditional CNNs lack the exploration of non-local topological relations [15] because they only convolve fixed rectangular regions. Meanwhile, most GCN-based methods rely too heavily on the quality of the superpixels obtained from segmentation algorithms, which seriously affects the accuracy of graph construction. Therefore, neither CNN nor GCN alone can effectively solve the task of fine classification of Chinese herbs in hyperspectral images. Based on the above analysis, we designed the topology and semantic information fusion classification network (TSFnet), which integrates the probability distributions of CNN and GCN using consistency constraints instead of traditional simple feature fusion, enabling the two to work together efficiently.

In this paper, we combine convolutional neural networks with graph convolutional networks to improve the spatial perception ability of the network, construct a more robust deep learning classification framework, and design a dynamic graph updating mechanism to achieve more accurate feature embedding and graph construction, further improving the discriminative power of the graph features extracted by GCN. The key innovations are summarized as follows.

1) This paper combines CNN and GCN to improve the network’s spatial perception ability. The graph is constructed from deep features extracted by CNN, which enhances the fusion of semantic information and topological structure information in images. Additionally, the network extracts short-range (local) spatial information with CNN, and medium-range and long-range spatial information with GCN. Combining these three types of spatial information improves the network’s perception of spatial characteristics.

2) The deep-learning-based dynamic updating of the graph in TSFnet ensures more accurate feature embedding and graph construction, which helps to improve the discriminative power of the graph features extracted by GCN.

2 Proposed Method

The overall architecture of TSFnet, the classification network proposed in this paper that integrates topological structure and semantic information, is shown in Fig.1. Training samples are randomly selected in patch form and fed into a CNN to extract features with rich semantic information. The features extracted by CNN are denoted as $Z$. These features are then used to construct subgraphs as inputs to the GCN branch. Graph sample and aggregate (GraphSAGE), a graph sampling aggregation network [16], is used as the convolutional layer in the GCN branch to extract graph features, denoted as $G$. Finally, $Z$ and $G$ are used for probability prediction, and a distribution consistency constraint is imposed to enable cooperative training of CNN and GCN.

2.1 Lightweight Convolutional Neural Network

Fig.1 Flowchart of TSFnet, including the CNN, GraphSAGE, and the consistency constraint, where different colors indicate different classes

As deep learning has developed, classic deep models such as the visual geometry group network (VGG) [17], the residual neural network (ResNet) [18], and Inception [19] have served as the base models for most methods. However, these models are all designed for three-band optical image data and lack adaptability to hyperspectral image data with hundreds of bands. Moreover, their large number of parameters makes real-time, efficient inference difficult and hinders deployment on embedded devices. Therefore, we designed a lightweight CNN model and combined it with GCN to achieve high-precision classification and recognition of traditional Chinese medicinal materials through cooperative learning.

The specific parameter settings of TSFnet are shown in Tab.1. We built a four-layer cascaded semantic feature extractor using the Conv2d-BN2d-LeakyReLU structure (Conv2d: 2D convolution; BN2d: 2D batch normalization). The CNN branch predicts the output probability and calculates the corresponding error against the sample label using the cross-entropy loss. The cross-entropy loss of sample $x_i$ is defined as

where $y_i$ represents the one-hot encoding of the label information corresponding to $x_i$, $p_i$ represents the predicted probability output obtained by the softmax function, and $C$ represents the number of classes. Therefore, the classification loss based on CNN can be defined as

Tab.1 The structure diagram of TSFnet

where $S(\cdot)$ represents the softmax function, and $z_i$ represents the features extracted by CNN.
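A minimal sketch of the CNN classification loss described above, in plain Python. It computes the per-sample cross-entropy between the one-hot label $y_i$ and the softmax output $S(z_i)$ over $C$ classes. The function names are hypothetical, $z_i$ is treated as a vector of $C$ class scores, and averaging over the samples is an assumption of this sketch rather than something the text specifies.

```python
import math


def softmax(scores):
    """Numerically stable softmax S(.) over one sample's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_onehot, probs, eps=1e-12):
    """Cross-entropy -sum_c y_c * log(p_c) of one sample over C classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))


def cnn_classification_loss(labels_onehot, cnn_scores):
    """Average cross-entropy between labels y_i and predictions S(z_i).

    Averaging (rather than summing) over samples is an assumption.
    """
    losses = [cross_entropy(y, softmax(z))
              for y, z in zip(labels_onehot, cnn_scores)]
    return sum(losses) / len(losses)
```

With uninformative scores (all equal), the loss equals the logarithm of the number of classes, which is a convenient sanity check before training.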

2.2 Graph Convolutional Network

In recent years, by modeling the relationships among samples (or vertices), GCN has been successfully applied to the representation and analysis of non-Euclidean data, namely graph-structured data. The structure of GCN is shown in Fig.2. Compared with CNN, which only extracts local spatial information from hyperspectral images, GCN constructs mid-range and long-range graph data for land cover categories, fully considering the inherent topological relationships between land cover classes. We combine CNN with GCN, using the deep features extracted by CNN to construct the graph, in order to combine the rich semantic information and topological structure in the image. In addition, CNN extracts short-range (local) spatial information, while GCN extracts mid-range and long-range spatial information. This combination improves the network’s spatial perception ability.
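One way to picture building a graph from deep features is a k-nearest-neighbor graph over the CNN feature vectors, sketched below in plain Python. The text does not state which construction rule TSFnet uses, so the cosine similarity measure, the parameter `k`, and the function names are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_knn_graph(features, k):
    """Return an adjacency list linking each node to its k most similar nodes.

    `features` is a list of CNN feature vectors, one per sample (node).
    Ties are broken by node index so the result is deterministic.
    """
    graph = {}
    for i, feat_i in enumerate(features):
        scored = [(cosine_similarity(feat_i, feat_j), j)
                  for j, feat_j in enumerate(features) if j != i]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[i] = [j for _, j in scored[:k]]
    return graph
```

Because neighbors are chosen by feature similarity rather than by position in the image, two samples far apart spatially can still be connected, which is how such a graph can carry mid-range and long-range relationships. Rebuilding the graph as the CNN features change during training would correspond to a dynamic graph update.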

Fig.2 Schematic of GCN and the multi-range topological relationships

In the GCN branch, GraphSAGE is used as the graph convolutional layer. It is an aggregation-based GCN method that defines the graph convolution operator from a spatial perspective. The method randomly samples adjacent nodes so that the number of adjacent nodes used for each node does not exceed the given sample size. Fig.3 shows the forward propagation principle of GraphSAGE, where nodes $v_1$ and $v_2$ are the central nodes, and only the sampled nodes are treated as first-order neighbors ($k=1$). GraphSAGE then aggregates the adjacent nodes and uses the aggregation results to update the features of the central nodes $v_1$ and $v_2$, thereby propagating information. Fig.3 shows that, during propagation, node information extends to $k$-order neighbors after $k$ aggregations. For example, node $v_1$ in the third layer aggregates its first-order and second-order information. In the topology structure extractor, GraphSAGE uses a mean aggregation function, which can be represented by

Fig.3 GraphSAGE forward propagation schematic

where $v$ and $u$ are the central node and an adjacent node, respectively; $N(v)$ is the neighborhood function used in GraphSAGE [16], specifically $N: v \to 2^{V}$, which controls the selection order of adjacent nodes; $\sigma(\cdot)$ is the non-linear activation function; $W^{\ell-1}$ is the weight matrix of layer $\ell-1$; and Mean averages the features of the central node and the selected adjacent nodes. In the topology structure extractor, we use two layers of GraphSAGE based on the mean aggregation function. The classification loss of GCN is defined as follows

where the features of the input image extracted by GraphSAGE are denoted as $g_i$. As the network trains, the graph features are adjusted dynamically. This dynamic updating of the graph leads to more accurate feature embedding and graph construction, which improves the graph features extracted by GCN and further corrects the bias in the topological information of the graph.
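The neighbor sampling and mean aggregation described in this subsection can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the activation $\sigma(\cdot)$ is assumed to be a ReLU, and the weight matrix is given as a list of rows.

```python
import random


def sample_neighbors(graph, v, sample_size, rng):
    """Randomly keep at most `sample_size` neighbors of node v."""
    neighbors = graph[v]
    if len(neighbors) <= sample_size:
        return list(neighbors)
    return rng.sample(neighbors, sample_size)


def mean_aggregate(h, v, sampled):
    """Element-wise mean over the central node v and its sampled neighbors."""
    nodes = [v] + sampled
    dim = len(h[v])
    return [sum(h[u][d] for u in nodes) / len(nodes) for d in range(dim)]


def graphsage_mean_layer(h, graph, weight, sample_size, rng=None):
    """One layer: h_v <- relu(W . Mean({h_v} U {h_u : u in sampled N(v)})).

    ReLU stands in for the unspecified activation sigma (an assumption).
    """
    rng = rng or random.Random(0)
    new_h = {}
    for v in graph:
        agg = mean_aggregate(h, v, sample_neighbors(graph, v, sample_size, rng))
        new_h[v] = [max(0.0, sum(w * a for w, a in zip(row, agg)))
                    for row in weight]
    return new_h
```

Applying the layer twice, as in a two-layer extractor, lets a node receive information from its second-order neighbors: on a path graph 0-1-2 where only node 2 carries a non-zero feature, node 0 is still zero after one layer and becomes non-zero after the second.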

2.3 Consistency Constraint

How the features extracted by CNN and GCN are fused is crucial, as it determines whether the two can collaborate and achieve good classification performance during training. Instead of simple feature fusion, TSFnet uses a consistency constraint on the predicted probability distributions so that CNN and GCN work together better during training. The cross-entropy function measures the difference between the learned model distribution and the training distribution, and can therefore be used to evaluate the consistency of two distributions. The consistency constraint is defined as follows
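As an illustration of a cross-entropy-based consistency term, the sketch below compares the softmax prediction of the CNN branch with that of the GCN branch for each sample. The direction of the cross-entropy (CNN distribution as reference, GCN distribution as estimate), the averaging over samples, and the function names are assumptions of this sketch; the exact form used in TSFnet may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one sample's class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(cnn_scores, gcn_scores, eps=1e-12):
    """Average cross-entropy H(p, q) = -sum_c p_c * log(q_c) per sample.

    p = softmax(z_i) from the CNN branch, q = softmax(g_i) from the GCN
    branch. Using p as the reference distribution is an assumption.
    """
    total = 0.0
    for z, g in zip(cnn_scores, gcn_scores):
        p = softmax(z)
        q = softmax(g)
        total -= sum(pc * math.log(max(qc, eps)) for pc, qc in zip(p, q))
    return total / len(cnn_scores)
```

For a fixed CNN prediction, this term is smallest when the GCN branch predicts the same distribution and grows as the two predictions diverge, which is what pushes the branches toward agreement.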

3 Experimental Results and Discussions

3.1 Dataset Description

The data used in this experiment come from the Remote Sensing Intelligent Data Extraction, Analysis and Application (RSIDEA) team at Wuhan University, and were collected using a Headwall Nano-hyperspectral imager mounted on an unmanned aerial vehicle (UAV). Hyperspectral data of multiple types of traditional Chinese medicine in Honghu and Hanchuan, Hubei Province, China, were collected in the spectral range of 400–1000 nm [20, 21]. The data had been preprocessed using the HyperSpec software provided by the instrument manufacturer, including radiometric calibration and geometric correction. The radiometric calibration converted the raw digital values to radiance values using the laboratory calibration parameters of the sensor.

The study area of the Hanchuan dataset is a combined urban-rural region that includes six types of plants that can be used as medicine: strawberries, cowpeas, soybeans, sorghum, water spinach, and watermelon seeds, as shown in Fig.4 (a). The UAV flew at a height of 250 m, the image size was 1217×303 pixels, there were 274 bands in the spectral range of 400–1000 nm, and the spatial resolution was about 0.109 m. The study area of the Honghu dataset is complex and includes 12 types of plants that can be used as medicine: cotton, rapeseed, cabbage, bok choy, kale, mustard, Chinese flowering cabbage, spinach, baby bok choy, lettuce, celery, and film-covered lettuce, as shown in Fig.4 (b). The UAV flew at a height of 100 m, the image size was 940×475 pixels, there were 270 bands in the spectral range of 400–1000 nm, and the spatial resolution was about 0.043 m.
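As a quick arithmetic check on these figures, the short sketch below derives the pixel count and the approximate ground extent of each scene from the image size and spatial resolution quoted above. It assumes square pixels and a uniform resolution across the image; the dictionary layout and function names are hypothetical.

```python
# Values quoted in the dataset description.
HANCHUAN = {"size_px": (1217, 303), "bands": 274, "resolution_m": 0.109}
HONGHU = {"size_px": (940, 475), "bands": 270, "resolution_m": 0.043}


def pixel_count(scene):
    """Total number of pixels in the scene."""
    side_a, side_b = scene["size_px"]
    return side_a * side_b


def ground_extent_m(scene):
    """Approximate ground extent in metres, assuming square pixels."""
    return tuple(n * scene["resolution_m"] for n in scene["size_px"])


def training_pixels(labelled_pixels, fraction=0.05):
    """Number of labelled pixels used for training at a given fraction."""
    return int(labelled_pixels * fraction)
```

Under these assumptions the Hanchuan scene covers roughly 133 m × 33 m and the Honghu scene roughly 40 m × 20 m, so both are small, densely sampled plots rather than wide-area surveys.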

Fig.4 Visualization of samples from different origins: (a) Hanchuan dataset; (b) Honghu dataset

3.2 Settings of Experiments

For the classification task on the two datasets, six plant categories that can be used as medicine were selected from the Hanchuan dataset for six-class classification, and twelve such categories were selected from the Honghu dataset for twelve-class classification.

In this experiment, SVM [9], CRNN [10], Two-branch CNN [11], Context CNN [22], and 3D-CNN [12] were used as comparison algorithms. SVM served as the traditional machine learning baseline. 3D-CNN and the other frameworks are deep learning networks that have shown good classification performance in recent years, and were used to demonstrate the advantages of TSFnet. To ensure fairness, the number of training epochs for each network was set to 500. In the experiment, 10-fold cross-validation was used for model parameter selection. After the optimal parameters were selected, the training set and test set were randomly drawn, with the training set accounting for 5% and the test set for 95% of the samples. The random selection was repeated 10 times, and the average classification accuracy was obtained. The overall classification performance was evaluated using the average classification accuracy (AA), overall classification accuracy (OA), and Kappa coefficient (KC).
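For reference, the three evaluation metrics can be computed from a confusion matrix as sketched below, using their standard definitions: OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, and KC is Cohen's kappa. The function names are hypothetical and class labels are assumed to be integers starting at 0.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """OA: correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def average_accuracy(cm):
    """AA: mean of per-class accuracies, skipping classes with no samples."""
    per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(per_class) / len(per_class)


def kappa_coefficient(cm):
    """KC: Cohen's kappa, agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                   for i in range(n)) / (total * total)
    if expected == 1.0:
        return 1.0  # degenerate case: only one class present
    return (observed - expected) / (1.0 - expected)
```

Reporting AA alongside OA matters when class sizes are unbalanced, since OA can stay high even if a small class is classified poorly.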

In addition, portability experiments were designed to address the need for lightweight models. The designed TSFnet algorithm was deployed on embedded devices to verify that it can be used in practical applications, such as field classification of Chinese medicinal herbs.

3.3 Classification Performance

In the plant category classification tasks on the two datasets, the numbers of training and testing samples selected from the Hanchuan and Honghu datasets are shown in Tab.2 and Tab.3. After the optimal parameters of each algorithm were selected through cross-validation and the experiments were repeated ten times, the resulting OA, AA, and KC are shown in Tab.4 and Tab.5. Tab.4 and Tab.5 show that the proposed TSFnet is significantly better than the compared algorithms under the same training sample conditions and performs well on both the Honghu and Hanchuan datasets. The classification accuracy of every medicinal plant category in the Hanchuan dataset is above 90%, and that of every category in the Honghu dataset is above 85%, indicating high consistency. The proposed model performs stably and has a certain generalization ability, which also indicates that the categories of medicinal plants can be well distinguished under the algorithm framework proposed in this paper.

Tab.2 Number of training and testing samples for herb classification in the Hanchuan dataset

Tab.3 Number of training and testing samples for herb classification in the Honghu dataset

Tab.4 Classification accuracy for the Hanchuan dataset

Tab.5 Classification accuracy for the Honghu dataset

The above quantitative results show that the combination of CNN and GCN in TSFnet captures richer semantic information and topological structure information from medicinal plant data in different datasets. CNN extracts local spatial information, while GCN extracts non-local spatial information. Combining the two yields a classification model with more sensitive spatial perception, allowing TSFnet to perform best across the different categories of medicinal plants in both datasets.

To better demonstrate the classification performance of TSFnet, we conducted visualization experiments on the Hanchuan and Honghu datasets. Fig.5 shows the prediction map and ground truth map of the Hanchuan dataset, while Fig.6 shows the prediction map and ground truth map of the Honghu dataset. Different colors represent different Chinese medicinal herb categories. The ground truth and prediction maps of the Hanchuan dataset show that TSFnet performs very well in identifying strawberries, cowpeas, soybeans, sorghum, and water spinach, with only a small number of classification errors for watermelon seeds. Similarly, the Honghu maps show that TSFnet classifies the 12 types of plants that can be used as medicine well. This is because TSFnet combines feature extraction by CNN with the extraction of topological information by GCN, which gives it strong classification ability for hyperspectral remote sensing data of Chinese medicinal herbs.

Fig.5 Visualization of samples from Hanchuan dataset: (a) Hanchuan label; (b) Hanchuan prediction; (c) the species represented by the color

Fig.6 Visualization of samples from Honghu dataset: (a) Honghu label; (b) Honghu prediction; (c) the species represented by the color

3.4 Lightweight Performance

In practical applications, TSFnet can be ported to embedded devices for classifying traditional Chinese medicine data in different regions, greatly enhancing the portability of the model. Tab.6 shows that TSFnet achieved the best performance on all lightweight parameters. As shown in Fig.7, the left side is the embedded device, and the display screen on the right shows the classification results. The lightweight experiment was conducted on an NVIDIA Jetson AGX Xavier embedded device using PyTorch in Python to classify traditional Chinese medicine in the Hanchuan and Honghu datasets. The results show that TSFnet can be deployed on embedded devices and achieve good classification performance.

Tab.6 Comparisons with other networks on Honghu dataset

Fig.7 Lightweighting experiments with different datasets: (a) TSFnet lightweight experiment on Honghu dataset; (b) TSFnet lightweight experiment on Hanchuan dataset

4 Conclusion

In this paper, a high-efficiency TSFnet is proposed for the classification of Chinese herbal medicine hyperspectral data. This method applies both CNN and GCN to the classification task, dynamically constructing a subgraph with rich semantic information from deep features. Furthermore, a consistency constraint is designed to achieve better fusion of CNN statistical distribution features and GCN topological structure features, improving the model’s spatial awareness and promoting better use of the spectral information of Chinese herbal medicines in hyperspectral data. Experimental analysis shows that the proposed framework exhibits excellent classification performance in Chinese herbal medicine classification tasks and performs stably and robustly on different Chinese herbal medicine hyperspectral datasets. In addition, the experiments demonstrate that the trained model can be deployed on embedded devices to classify Chinese herbal medicine hyperspectral data, confirming the model’s portability and light weight and expanding its future application scope and scenarios.
