Development of a Predictive Model for Rapid Detection of Sulfur Content in Honeysuckle Based on Hyperspectral Imaging Technology
2019-01-07
FENG Jie, LIU Yunhong1,2,*, SHI Xiaowei, WANG Qingqing, XU Qian
(1. College of Food and Bioengineering, Henan University of Science and Technology, Luoyang 471023, China; 2. Henan Engineering Technology Research Center of Food Materials, Luoyang 471023, China)
Abstract: For rapid and non-destructive detection of sulfur content in honeysuckle, the flowers of Lonicera japonica Thunb., hyperspectral imaging technology combined with chemometrics was applied to develop a predictive model for identifying sulfur-fumigated honeysuckle with different sulfur concentrations. Hyperspectral images of non-fumigated and sulfur-fumigated honeysuckle samples at four sulfur levels (0%, 0.5%, 1% and 1.5%, on a fresh mass basis) were collected and preprocessed by Savitzky-Golay smoothing (S_G filter), multiplicative scatter correction (MSC) or standard normal variate transformation (SNV). The S_G filter was selected as the optimal pretreatment method. The processed spectral data were then used to establish models by either Fisher discriminant analysis (FDA) or kernel Fisher discriminant analysis (KFDA), and KFDA gave the better discrimination accuracy of 98.2%. Because full-range spectral data contain a great deal of redundancy, characteristic wavelengths were extracted by three methods: regression coefficients (RC), the Wilks criterion and RC-Wilks. Accordingly, the discriminant models RC-KFDA, Wilks-KFDA and RC-Wilks-KFDA were developed. Among these models, RC-Wilks-KFDA performed best, with a discrimination accuracy of 100%, good classification efficiency and a short running time of 0.69 s. Therefore, the S_G-RC-Wilks-KFDA model allows fast, effective and non-destructive detection of sulfur content in honeysuckle.
Keywords: honeysuckle; hyperspectral imaging; sulfur content; rapid detection
Honeysuckle (Lonicerae japonicae Flos) is a traditional Chinese herb and a popular ingredient of herbal cooling teas. It contains many nutrient components, including phenols, flavonoids, sugars, triterpenes, essential oils and minerals, and has multiple health effects, including antibacterial, anti-inflammatory, heat-clearing and detoxifying activities[1-3]. In recent years, honeysuckle has been widely used in herbal teas, beverages, granules and other food products[4]. With the growing demand for honeysuckle, quality requirements have become increasingly strict.
Because honeysuckle is rich in phenolic compounds and polyphenol oxidase, it is prone to enzymatic browning during picking, drying and storage[5-6], which results in serious quality degradation. Therefore, pretreatment methods such as steam blanching, liquid soaking and sulfur fumigation are usually applied to inhibit enzymatic browning in honeysuckle[1,5]. Among them, sulfur fumigation is a traditional pretreatment method for crops. Sulfur dioxide generated by combustion reacts with water on the surface of the material to form sulfurous acid, which inhibits enzymatic browning through its antioxidant, enzyme-inhibiting and sterilizing properties[7]. However, this approach can easily lead to excessive sulfur content in agricultural products, triggering chemical transformations and reducing nutritional value. Therefore, the government has put forward strict requirements on sulfur fumigation and on residual sulfur content in agricultural products[8-9]. Nevertheless, in pursuit of excessive profits, some unscrupulous traders and producers secretly use sulfur fumigation to inhibit the enzymatic browning of honeysuckle and maintain product color, in disregard of the national standard for sulfur content. Such practices can easily cause excessive sulfur residues, reducing the nutritional value of honeysuckle and harming consumer health. Moreover, honeysuckle products with and without sulfur fumigation look very similar, so their sulfur content cannot be judged effectively by sensory evaluation. Therefore, to protect the quality of honeysuckle products and maintain a normal honeysuckle market, it is essential to establish an effective and practical method for detecting sulfur content in honeysuckle.
Traditional methods for detecting and identifying sulfur content include high performance liquid chromatography (HPLC)[10], acid distillation titration[11] and colorimetry[12]. Although these methods achieve high accuracy, they have several drawbacks, such as demanding operation, tedious steps and long test periods. Hyperspectral imaging is an emerging detection technique that combines imaging and spectroscopy to obtain spectral and spatial information on internal and external quality characteristics simultaneously. It enables rapid, non-destructive and accurate detection[13] and is therefore expected to overcome the complicated operation and long test periods of conventional methods for determining sulfur content. Hyperspectral technology is now widely used for the identification and rapid non-destructive quality evaluation of agricultural products[14-16]. However, studies on rapid and non-destructive detection of sulfur fumigation and sulfur residues in agricultural products by hyperspectral imaging have rarely been reported.
Therefore, in this study, a rapid and non-destructive method for detecting honeysuckle samples with different sulfur contents was explored based on hyperspectral imaging technology. First, spectral information of honeysuckle with different sulfur fumigation levels was collected with a hyperspectral line-scan imaging system. Then, different methods for noise removal and characteristic wavelength extraction were applied and compared. Finally, a hyperspectral detection model was developed based on kernel Fisher discriminant analysis (KFDA) combined with the Wilks criterion. This research could provide a practical reference for rapid, non-destructive and accurate identification of different sulfur residues in honeysuckle.
1 Materials and Methods
1.1 Materials and preparation
Fresh honeysuckle samples were collected from the Liugou honeysuckle planting base in Luoyang, Henan Province. Samples of similar size and green color, free of pests and disease, were selected for the experiments. In each experiment, four groups of fresh honeysuckle samples, (30 ± 0.2) g each, were placed on four mesh trays, which were then placed into four sealed glass tanks (30 cm in diameter and 30 cm in height). Industrial sulfur amounting to 0.5%, 1.0% and 1.5% of the fresh honeysuckle mass was placed at the bottom of three of the tanks, respectively. The sulfur was ignited and the samples were fumigated for 1 h; then both the sulfur-fumigated samples and the samples without sulfur fumigation were taken out and dried in a hot air dryer (101-3ES, Beijing Yongguangming Co.) at 50 ℃ for 15 h. The sulfur residues in the dried honeysuckle samples were determined according to GB/T 5009.34—2016 The determination of sulfur dioxide in food[17]. Each experiment was conducted in triplicate.
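As a quick arithmetic check of the fumigation doses, the short sketch below converts each fumigation level into the mass of sulfur placed in a tank. It assumes the nominal 30 g fresh mass per group (the ±0.2 g tolerance is ignored), and the function name is hypothetical.

```python
# Sulfur mass per tank, assuming the percentages refer to the nominal
# 30 g fresh mass of each honeysuckle group (tolerance ignored).
FRESH_MASS_G = 30.0
LEVELS_PERCENT = [0.0, 0.5, 1.0, 1.5]


def sulfur_dose_g(fresh_mass_g: float, level_percent: float) -> float:
    """Mass of industrial sulfur (g) for a given fumigation level (% of fresh mass)."""
    return fresh_mass_g * level_percent / 100.0


for level in LEVELS_PERCENT:
    print(f"{level:.1f}% -> {sulfur_dose_g(FRESH_MASS_G, level):.2f} g sulfur")
```

Under this reading, the three fumigated tanks receive about 0.15 g, 0.30 g and 0.45 g of sulfur.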
1.2 Methods
1.2.1 Hyperspectral imaging system
The hyperspectral imaging system used in this study consists of a spectrograph-based hyperspectral camera, an imaging spectrometer (Inno-Spec IST50-3810, Germany), a lens, four 500 W optical fiber halogen lamps (ESYLUX 90000420108, Germany), a conveyor belt driven by a high-precision motor, a computer and a dark box[18-19]. The imaging spectrometer covers 371–1 028 nm with 1 288 bands and a spectral resolution of 2.8 nm.
1.2.2 Hyperspectral image acquisition
To guarantee image resolution and avoid image distortion, the objective lens height, the exposure time of the hyperspectral camera and the moving speed of the conveyor must be determined before image acquisition. After repeated adjustments, the objective lens height, the CCD camera exposure time for each hyperspectral image and the movement speed of the carrier platform were set to 300 mm, 90 ms and 1.00 mm/s, respectively. Images were acquired with the acquisition software of the hyperspectral imaging system (Wuling Optical, Taiwan, China), and image data were processed with ENVI 5.1 (ITT Visual Information Solutions) and MATLAB R2014a (The MathWorks, Natick, USA).
Because of the non-uniform distribution of light intensity across wavelengths and the dark current of the sensor, the raw images usually contain considerable noise in wavelength bands with weak light intensity. To solve this problem, the hyperspectral images must be calibrated. Under the same system conditions as those used for sample acquisition, a standard white calibration board with a reflectance of 99% was scanned to obtain an all-white calibration image. Then, the camera shutter was closed to acquire a completely black calibration image. Finally, the acquired absolute images were converted into relative images according to equation (1)[20].
$$R = \frac{I - B}{W - B} \qquad (1)$$
where R is the calibrated hyperspectral image; I is the original hyperspectral image; B is the all-black image; and W is the all-white image.
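The black/white correction R = (I − B)/(W − B) can be sketched with NumPy as follows. The array shapes (a cube of lines × samples × bands with single-line reference frames that broadcast over it) and the small `eps` guard against division by zero are assumptions for illustration; the sketch does not rescale by the 99% nominal reflectance of the white board, since the paper does not say whether that was done.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, eps=1e-9):
    """Black/white reference correction R = (I - B) / (W - B).

    raw:   sample cube, e.g. shape (lines, samples, bands)
    white: white-reference frame, broadcastable to raw (e.g. (1, samples, bands))
    dark:  dark-reference frame, broadcastable to raw
    eps:   lower bound on (W - B) to avoid division by zero (an assumption)
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom


# Toy example: dark level 100, white level 4000, sample signal 2050 counts
dark = np.full((1, 4, 3), 100.0)
white = np.full((1, 4, 3), 4000.0)
raw = np.full((5, 4, 3), 2050.0)
R = calibrate_reflectance(raw, white, dark)
print(R.shape, R[0, 0, 0])  # (5, 4, 3) 0.5
```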
The whole honeysuckle sample in each calibrated hyperspectral image was taken as the region of interest (ROI), and the average spectrum of all pixels in the ROI was used as the spectrum of the corresponding sample. The ROIs of all calibrated images were extracted using ENVI 5.1.
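Averaging the ROI pixels into one spectrum per sample reduces to a masked mean over the spatial axes. The sketch below assumes a boolean ROI mask is already available (in the paper the ROIs were drawn in ENVI 5.1; how the mask is produced is outside this sketch), and the function name is hypothetical.

```python
import numpy as np


def roi_mean_spectrum(cube, mask):
    """Mean spectrum over the pixels selected by a boolean ROI mask.

    cube: calibrated image, shape (rows, cols, bands)
    mask: boolean array, shape (rows, cols), True inside the sample ROI
    """
    cube = np.asarray(cube, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("ROI mask selects no pixels")
    # Boolean indexing on the first two axes gives shape (n_pixels, bands)
    return cube[mask].mean(axis=0)


cube = np.zeros((2, 2, 3))
cube[0, 0] = [1.0, 2.0, 3.0]
cube[1, 1] = [3.0, 4.0, 5.0]
mask = np.eye(2, dtype=bool)  # ROI = the two diagonal pixels
print(roi_mean_spectrum(cube, mask))  # [2. 3. 4.]
```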
1.2.3 Data processing
Fisher discriminant analysis (FDA) seeks the projection direction in the original sample space along which samples of different classes are most easily separated. The basic idea of FDA is to minimize the within-class distance and maximize the between-class distance of the projected samples[21-23]. This method can effectively integrate the class information of the training samples and extract features according to their discriminative ability.
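The FDA criterion can be illustrated with a short NumPy sketch: the projection directions are the leading eigenvectors of S_w⁻¹S_b, where S_w and S_b are the within-class and between-class scatter matrices. The small ridge term `reg` and the nearest-centroid rule used to assign classes in the projected space are assumptions; the paper does not state which decision rule was applied to the projected features. The four-group data are synthetic.

```python
import numpy as np


def fisher_lda(X, y, n_components=None, reg=1e-6):
    """Multi-class FDA: leading eigenvectors of S_w^-1 S_b.

    `reg` adds a small ridge to S_w so it stays invertible when there are
    more bands than samples (an assumption, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)          # between-class scatter
    if n_components is None:
        n_components = len(classes) - 1
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real


def nearest_centroid(Z_train, y_train, Z_test):
    """Assign each projected sample to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]


rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [5, 0, 0, 0, 0],
                  [0, 5, 0, 0, 0], [0, 0, 5, 0, 0]], dtype=float)
X = np.vstack([m + 0.3 * rng.standard_normal((30, 5)) for m in means])
y = np.repeat(np.arange(4), 30)

W = fisher_lda(X, y)
Z = X @ W
accuracy = float((nearest_centroid(Z, y, Z) == y).mean())
print(W.shape, accuracy)
```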
KFDA is a nonlinear feature extraction method that combines the idea of kernel learning with FDA. The data in the original space are first mapped into a high-dimensional feature space by a nonlinear mapping, and linear FDA is then performed in that space; the kernel function allows this to be done without computing the mapping explicitly[24-26]. The resulting sample features are therefore nonlinear with respect to the original space, and such nonlinear features can be more conducive to classification. A Gaussian function was used as the kernel function in this research.
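A minimal multi-class KFDA can be written with NumPy following the standard kernel Fisher discriminant formulation: the projection is expanded over the training samples, and the expansion coefficients α are the leading eigenvectors of (N + reg·I)⁻¹M, where M and N are the between-class and within-class scatter expressed through the Gram matrix. The Gaussian-kernel parameterisation exp(−‖a − b‖²/(2σ²)) is an assumption (the paper reports a kernel parameter of 0.964 but not how it enters the kernel), as are the regulariser `reg`, the nearest-centroid rule and the synthetic ring data.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); this parameterisation is an assumption
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kfda_fit(X, y, sigma=1.0, reg=1e-3, n_components=None):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)          # (n, n) Gram matrix of training samples
    m_all = K.mean(axis=1)                    # mean of all mapped samples, via K
    M = np.zeros((n, n))                      # between-class scatter in alpha-space
    N = np.zeros((n, n))                      # within-class scatter in alpha-space
    for c in classes:
        Kc = K[:, y == c]                     # (n, n_c) kernel columns of class c
        nc = Kc.shape[1]
        d = (Kc.mean(axis=1) - m_all)[:, None]
        M += nc * (d @ d.T)
        centering = np.eye(nc) - np.full((nc, nc), 1.0 / nc)
        N += Kc @ centering @ Kc.T
    if n_components is None:
        n_components = len(classes) - 1
    # Leading eigenvectors of (N + reg*I)^-1 M are the expansion coefficients alpha
    evals, evecs = np.linalg.eig(np.linalg.solve(N + reg * np.eye(n), M))
    order = np.argsort(evals.real)[::-1][:n_components]
    return {"X": X, "sigma": sigma, "alpha": evecs[:, order].real}


def kfda_transform(model, X_new):
    K_new = gaussian_kernel(np.asarray(X_new, dtype=float), model["X"], model["sigma"])
    return K_new @ model["alpha"]             # (n_new, n_components) projected features


def nearest_centroid(Z_train, y_train, Z_test):
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]


# Two concentric rings: not linearly separable, but separable with a Gaussian kernel
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
radius = np.where(np.arange(100) < 50, 1.0, 3.0) + 0.1 * rng.standard_normal(100)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
y = (np.arange(100) >= 50).astype(int)

model = kfda_fit(X, y, sigma=1.0)
Z = kfda_transform(model, X)
accuracy = float((nearest_centroid(Z, y, Z) == y).mean())
print(accuracy)
```

The ring example is chosen because a linear FDA projection cannot separate concentric classes, which is the situation where the nonlinear kernel mapping is expected to help.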
Hyperspectral data are high-dimensional and voluminous, which makes data processing time-consuming and inefficient. To address this, selecting characteristic wavelengths is essential for improving the performance of the prediction model. The regression coefficient (RC) describes the change in spectral signal caused by a unit change in the concentration of the tested component[27]. A larger absolute value of the regression coefficient indicates a stronger linear relationship between absorbance and concentration and higher sensitivity. Wavelengths with large weighted regression coefficients located at peaks or troughs of the coefficient curve were considered to carry more discriminating power and were therefore preferred as optimal wavelengths, while those with small weighted regression coefficients were regarded as irrelevant or detrimental and were discarded. Because a prediction model based on the full-band spectrum is relatively complex, the dimension of the full band should be reduced by eliminating wavelengths carrying irrelevant information and noise, and the characteristic wavelengths with the most useful information and high signal-to-noise ratios should be selected.
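The RC selection rule described above (keep wavelengths where the coefficient curve has a large-magnitude peak or trough) can be sketched as follows. The coefficient vector is taken as given, for example from a PLSR model fitted on the spectra, which is an assumption about how it was obtained; the rule "keep the k largest local maxima of |coef|" is likewise a hypothetical concrete reading of the text, with k = 10 matching the number of RC bands reported in Section 2.6.

```python
import numpy as np


def select_rc_wavelengths(coef, k=10):
    """Indices of the k largest peaks/troughs of a regression-coefficient curve.

    A peak (coef > 0) or trough (coef < 0) of the coefficient curve is a local
    maximum of |coef|. Returned indices are sorted in wavelength order.
    """
    a = np.abs(np.asarray(coef, dtype=float))
    is_extremum = np.zeros_like(a, dtype=bool)
    # interior points at least as large as both neighbours
    is_extremum[1:-1] = (a[1:-1] >= a[:-2]) & (a[1:-1] >= a[2:])
    candidates = np.flatnonzero(is_extremum)
    top = candidates[np.argsort(a[candidates])[::-1][:k]]
    return np.sort(top)


coef = np.array([0.0, 0.5, 0.1, -0.9, -0.2, 0.3, 0.05, 0.7, 0.0])
print(select_rc_wavelengths(coef, k=3))  # [1 3 7]
```

The selected indices can then be mapped to wavelengths (or band numbers) and used as the reduced input of the discriminant model.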
The Wilks criterion can be used to examine the discriminant ability of individual variables across multiple sample groups and the resulting classification effect. The Wilks value reflects the contribution of a given principal component to classification, so the valid information most conducive to sample classification can be extracted to improve discrimination performance[28-30]. The Wilks criterion is calculated as follows (equations (2) and (3)).
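Let m represent the number of selected eigenvectors; then the elements of the within-group deviation matrix A = (a_ij) and the total deviation matrix T = (t_ij) are:
$$a_{ij} = \sum_{g=1}^{c}\sum_{k=1}^{N_g}\left(X_{igk}-\mu_{ig}\right)\left(X_{jgk}-\mu_{jg}\right), \quad i, j = 1, 2, \ldots, m \qquad (2)$$
$$t_{ij} = \sum_{g=1}^{c}\sum_{k=1}^{N_g}\left(X_{igk}-\mu_{i}\right)\left(X_{jgk}-\mu_{j}\right), \quad i, j = 1, 2, \ldots, m \qquad (3)$$
where c represents the number of sample categories; N_g represents the number of samples in category g; X_igk and X_jgk represent the elements in the principal component matrix of the sample data array; μ_ig and μ_jg are the mean values corresponding to X_igk and X_jgk, respectively; and μ_i and μ_j represent the overall mean values of the principal component elements corresponding to the ith and jth eigenvectors, respectively.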
The Wilks criterion defines the ratio of the determinants of matrices A and T as the deviation ratio, denoted Λ, as shown in equation (4):
$$\Lambda = \frac{|A|}{|T|} \qquad (4)$$
where T is the total deviation matrix of the columns of the principal component matrix obtained when calculating the characteristic parameters of honeysuckle, and A is the within-group deviation matrix of the same columns. When A is small and T is large, Λ is small, indicating that the principal components are conducive to classification.
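The Wilks statistic Λ = |A|/|T| and a ranking of principal components by their individual Λ values can be sketched as below. Here `scores` stands in for a principal component score matrix (samples × components); in practice it would come from a PCA of the pretreated spectra, which this sketch does not perform. Ranking components by ascending single-component Λ is one plausible reading of how the contribution order was obtained, not a confirmed detail of the paper.

```python
import numpy as np


def wilks_lambda(S, y):
    """Wilks' Lambda = det(A) / det(T) for a score matrix S (n_samples, m).

    A: pooled within-group sums of squares and cross-products
    T: total sums of squares and cross-products about the overall mean
    """
    S = np.asarray(S, dtype=float)
    if S.ndim == 1:
        S = S[:, None]
    y = np.asarray(y)
    centered = S - S.mean(axis=0)
    T = centered.T @ centered
    A = np.zeros_like(T)
    for g in np.unique(y):
        Dg = S[y == g] - S[y == g].mean(axis=0)
        A += Dg.T @ Dg
    return float(np.linalg.det(A) / np.linalg.det(T))


def rank_components(S, y):
    """Column indices from most to least discriminative (ascending Lambda)."""
    lam = np.array([wilks_lambda(S[:, j], y) for j in range(S.shape[1])])
    return np.argsort(lam), lam


rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 20)
# Column 0 tracks the group label; column 1 is pure noise
scores = np.column_stack([y + 0.1 * rng.standard_normal(60), rng.standard_normal(60)])
order, lam = rank_components(scores, y)
print(order, np.round(lam, 3))
```

Keeping the first k entries of `order` (k = 130 in this study) gives the retained components.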
During training and prediction, model performance is usually evaluated by the coefficient of determination (R²) and the root mean square error (RMSE) of the calibration (training) and prediction sets. Generally, a good model should have a higher R², a lower RMSE and a shorter running time. All data processing in this paper was performed in MATLAB.
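For reference, the two evaluation metrics can be computed as follows. The sketch uses the common definition R² = 1 − SS_res/SS_tot; some chemometrics software instead reports the squared correlation coefficient, and the paper does not say which convention its MATLAB code used.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_true, y_pred), 4), round(rmse(y_true, y_pred), 4))  # 0.98 0.1581
```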
2 Results and Analysis
2.1 Main quality indicators of honeysuckle with different sulfur fumigation levels
In this experiment, there were four groups of samples: non-fumigated honeysuckle and three groups of sulfur-fumigated honeysuckle with different fumigation levels. The determination was conducted with reference to the procedure in the literature[17]. The average residual sulfur contents of honeysuckle at the 0%, 0.5%, 1% and 1.5% fumigation levels were (1.2 ± 0.2), (316.8 ± 3), (572.5 ± 8) and (965.8 ± 6) mg/kg, respectively. The Chinese Pharmacopoeia classifies samples as non-fumigated, slightly sulfur-fumigated, excessively sulfur-fumigated and seriously excessive samples, and requires the sulfur residue of sulfur-fumigated samples to be lower than 400 mg/kg[9]. The first three groups were light green with only minor color differences, so they were hard to tell apart. The fourth group was yellow-green, and the samples would turn bright yellow if the sulfur dosage were increased further. The first three groups retained the characteristic flavor of honeysuckle, whereas the fourth group began to show a slight sulfurous odor. Thus, the fourth group, whose sulfur content exceeded the Pharmacopoeia standard, could be distinguished from the other three groups by sensory evaluation. Accurate discrimination among the first three groups, however, was difficult because of their similar appearance and flavor. The sulfur content of the third group also exceeded the Pharmacopoeia standard, yet this group could not be effectively distinguished from the low-sulfur and non-fumigated samples. Therefore, a more effective method is needed to overcome the limitations of sensory judgment.
2.2 Original spectral curves of honeysuckle
The samples of each group were scanned with the hyperspectral system, and 1 288 continuous bands over the spectral range of 371–1 028 nm were acquired. Because the signal-to-noise ratio was low at both ends of the spectrum, these regions were removed, and a total of 1 029 bands (bands 90–1 118), i.e. the spectral range of 416–940 nm, were retained for further analysis. The average spectral reflectance of each group is shown in Fig. 1. The four groups showed the same general trend in average spectral reflectance and the same characteristic absorptions, and their reflectance values nearly coincided in some band ranges. Therefore, it was difficult to distinguish the four groups of samples directly from the original spectra.
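The band window can be cross-checked against the wavelength range. The sketch below assumes the 1 288 bands are equally spaced over the 371–1 028 nm range given in Section 1.2.1; real spectrographs are not perfectly linear, so small deviations from the reported 416–940 nm are to be expected. The function name is hypothetical.

```python
N_BANDS = 1288
LAMBDA_MIN_NM, LAMBDA_MAX_NM = 371.0, 1028.0  # range quoted in Section 1.2.1


def band_to_wavelength(band: int) -> float:
    """Approximate wavelength (nm) of 1-indexed `band`, assuming equal spacing."""
    step = (LAMBDA_MAX_NM - LAMBDA_MIN_NM) / (N_BANDS - 1)
    return LAMBDA_MIN_NM + (band - 1) * step


first, last = 90, 1118
print(last - first + 1)                     # 1029 bands retained
print(round(band_to_wavelength(first), 1))  # 416.4
print(round(band_to_wavelength(last), 1))   # 941.2
```

Under this linear approximation, bands 90–1 118 span roughly 416–941 nm, in line with the 416–940 nm window used for analysis.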
Fig. 1 Mean spectra of four groups of honeysuckle samples
2.3 Division of training set and prediction set
To improve the prediction accuracy of the model, class labels were assigned to the four groups of samples, and the 360 tested honeysuckle samples were randomly divided into a training set and a prediction set at a ratio of 2:1. As a result, 240 training samples and 120 prediction samples were obtained. The division of the training and prediction sets is shown in Table 1.
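A random 2:1 division that preserves the class proportions can be sketched as follows. Performing the split within each class (stratification) and the balanced design of 90 samples per group are assumptions made for illustration; the paper states only that a random method was used and that 240 training and 120 prediction samples resulted.

```python
import numpy as np


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly split indices class by class so each class keeps the same ratio."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.sort(train_idx), np.sort(test_idx)


labels = np.repeat([1, 2, 3, 4], 90)  # hypothetical: 90 samples per group
train, test = stratified_split(labels)
print(len(train), len(test))  # 240 120
```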
Table 1 Class assignment and division of training and prediction sets of honeysuckle
2.4 Comparison of different spectral pretreatments
To improve discrimination performance and remove noise, the original spectral data obtained from the hyperspectral images must be pretreated before model establishment. Therefore, for a total of 360 randomly selected honeysuckle samples, three pretreatments, namely multiplicative scatter correction (MSC), standard normal variate transformation (SNV) and the Savitzky-Golay (S_G) filter (with a window width of 6 points and a quadratic polynomial in this paper), were applied before developing partial least squares regression (PLSR) models. The results are shown in Table 2. R² and RMSE are commonly used indicators of model quality; in general, R² values close to 1 and RMSE values close to 0 indicate good predictive capacity. As indicated in Table 2, the PLSR model based on the S_G filter yielded the best results, with R² values of 0.989 3 and 0.895 0 (the latter for the prediction set), an RMSECV of 0.115 6 and an RMSEP of 0.362 2. These results indicate that the S_G filter had the most pronounced noise-removal effect in this research and can be considered the best pretreatment method for the discrimination model of sulfur content in honeysuckle.
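The two scatter-correction pretreatments compared above can be sketched in NumPy as follows, with spectra stored as rows. Using the mean spectrum as the default MSC reference and the sample standard deviation (ddof=1) in SNV are common conventions assumed here, not details stated in the paper. The S_G step itself can be applied with SciPy's `scipy.signal.savgol_filter` using `polyorder=2`; because Savitzky-Golay windows are conventionally an odd number of points, how the paper's 6-point window maps onto a window length is not specified, so it is left out of this sketch.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    X = np.asarray(spectra, dtype=float)
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is fitted as x ~ a + b * r (r = reference, by default the
    mean spectrum) and corrected to (x - a) / b.
    """
    X = np.asarray(spectra, dtype=float)
    r = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(r, x, deg=1)  # slope, intercept
        out[i] = (x - a) / b
    return out


# Three spectra that differ only by additive and multiplicative scatter effects
r = np.linspace(0.2, 0.6, 50)
X = np.vstack([r, 0.1 + 1.5 * r, -0.05 + 0.8 * r])
X_msc = msc(X, reference=r)
X_snv = snv(X)
print(np.allclose(X_msc, r), np.allclose(X_snv[0], X_snv[1]))  # True True
```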
Table 2 Prediction results of PLS models with different preprocessing methods based on full-range spectra
2.5 Modeling results of full spectrum based on FDA and KFDA
Fig. 2 illustrates the results of the identification models based on FDA and KFDA using the full spectrum after S_G filter pretreatment. The FDA results are shown in Fig. 2a: the four sample groups were widely dispersed, and the boundaries between groups were unclear, with partial overlap. This suggests that, although the S_G filter removed most of the spectral noise, it could not eliminate all of it. The correct classification rate and running time of FDA were 82.6% and 3.11 s, respectively. Fig. 2b shows the identification results of KFDA with a kernel parameter of 0.964. The samples of groups 1 and 2 were clearly separated from the other samples. Although the within-class mapping of group 4 was relatively concentrated, it partially overlapped with group 3, and the boundary between these two groups was not sufficiently clear. The classification accuracy and running time of KFDA were 98.2% and 1.17 s, respectively. Therefore, KFDA was selected for further analysis.
Fig. 2 Discrimination of honeysuckle samples by FDA and KFDA with S_G pretreatment
2.6 The results of KFDA at characteristic wavelengths with or without Wilks criterion
In this study, RC and the Wilks criterion were applied to extract characteristic wavelengths from the whole spectrum. The RC algorithm yielded 10 characteristic bands, namely bands 69, 104, 192, 217, 239, 336, 382, 441, 543 and 604, corresponding to 451.9, 469.8, 514.9, 527.7, 539.0, 588.7, 612.2, 642.4, 694.5 and 725.6 nm. For the Wilks criterion, the components were sorted in descending order of the contribution rate of their band information, and the first 130 principal components were selected.
Three discriminant models, RC-KFDA, Wilks-KFDA and RC-Wilks-KFDA, were established based on S_G filter pretreatment. As illustrated in Fig. 3, with each method of extracting characteristic wavelengths, the four groups of samples were well separated with clear dividing lines, and all correct classification rates reached 100%. Fig. 3a shows that with RC-KFDA the within-class distribution of samples was more dispersed, with a running time of 0.43 s. Fig. 3b and Fig. 3c show the results of Wilks-KFDA and RC-Wilks-KFDA, respectively, in which the four groups of honeysuckle samples were concentrated, with running times of 1.40 s and 0.69 s, respectively.
Comparing Fig. 3a with Fig. 2b shows that the characteristic wavelengths extracted by RC preserved a significant part of the band information while removing much of the noise, thereby improving the classification accuracy. Fig. 3a and Fig. 3c show that the ordering of the contribution rates of the selected characteristic wavelengths had a certain influence on the KFDA classification effect. In terms of running time, the three modeling methods in Fig. 3 ranked as follows: Wilks-KFDA > RC-Wilks-KFDA > RC-KFDA. RC-Wilks-KFDA was 0.26 s slower than RC-KFDA and 0.71 s faster than Wilks-KFDA. Although RC-Wilks-KFDA ran slightly slower than RC-KFDA, its classification performance was the best of the three models. Yin Yong et al.[28] introduced the Wilks criterion into the principal component analysis (PCA) of wine identification and showed that applying the Wilks criterion could solve the problem of over-elimination of correlations between variables that occurs with PCA alone. Yu Huichun et al.[31] introduced a method of screening principal components based on the Wilks Λ statistic, and their results showed that KFDA based on Wilks Λ achieved better classification. In this study, RC-Wilks-KFDA likewise gave the best classification results, indicating that applying the Wilks criterion can increase the discrimination performance of the identification model. Therefore, fast, non-destructive and accurate identification of different sulfur residues in honeysuckle can be achieved by establishing a KFDA discriminant model on characteristic spectra extracted by RC-Wilks.
Fig. 3 Results of KFDA identification with characteristic wavelengths
3 Conclusion
Hyperspectral information of four groups of samples, including non-fumigated honeysuckle and sulfur-fumigated honeysuckle at three fumigation levels, was obtained with hyperspectral imaging, and a rapid, non-destructive method for determining different sulfur contents in honeysuckle was studied. Comparison of three pretreatment methods (S_G filter, SNV and MSC) showed that the S_G filter was the best, with R² and RMSE values for the prediction set of 0.895 0 and 0.362 2, respectively. Subsequently, FDA and KFDA models were established using the full spectral information with S_G filter pretreatment, and the KFDA model performed better in classifying honeysuckle with different sulfur contents. A comparison of three methods of extracting characteristic wavelengths showed that the ordering of the contribution rates of the characteristic wavelengths had a certain effect on KFDA classification; all three models reached a discrimination accuracy of 100%, clearly superior to the full-wavelength model. Among the three models based on characteristic wavelengths, RC-Wilks-KFDA reduced the running time by 0.71 s compared with Wilks-KFDA, and compared with RC-KFDA its within-class mapping was more concentrated and practical despite a slightly longer running time. Therefore, the S_G-RC-Wilks-KFDA model based on hyperspectral imaging can achieve rapid, non-destructive and accurate detection of honeysuckle samples with different degrees of sulfur fumigation.