
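Rapid prediction of dry matter and weight of wheat grain by near-infrared spectroscopy combined with linear regression algorithms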

2022-03-06 He Hongju, Ouyang Juan, Ou Xingqi, Guo Jingli, Wang Yuling, Li Xinhua

Science and Technology of Food Industry, 2022, Issue 4
Keywords: Xinxiang; grain; Henan

Chen Yan, He Hongju, Ouyang Juan, Ou Xingqi, Guo Jingli, Wang Yuling, Qiao Hong, Li Xinhua

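(1. School of Food Science, Henan Institute of Science and Technology, Xinxiang, Henan 453003; 2. Xinxiang Nongle Seed Industry Co., Ltd., Xinxiang, Henan 453003; 3. School of Life Science and Technology, Henan Institute of Science and Technology, Xinxiang, Henan 453003; 4. Henan Xinlianxin Chemical Industry Group Co., Ltd., Xinxiang, Henan 453004)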

1 Introduction

As one of the main daily food sources, wheat grain is rich in starch and protein, which not only provide energy for the human body but also enhance immunity against disease[1]. With the continuous improvement of living standards, the demand for high-quality wheat flour is increasing. At present, the common indexes for evaluating wheat grain quality include dry matter, bulk density, 1000-kernel weight, hardness, moisture content and protein content[2-4]. Among these indexes, dry matter plays an important role: its content directly affects the storage period of wheat[5] and therefore the processing technology of wheat flour. Weight is another quality index and is commonly used for assessing sales profit, as wheat weight is positively correlated with flour weight. Hence, it is necessary to monitor the dry matter and weight of wheat grains[6]. However, traditional methods for measuring dry matter and weight are time-consuming, laborious, inefficient and destructive[7-8]. Better techniques are needed to meet the increasing demand from the wheat industry and consumers for rapid and nondestructive determination.

Near-infrared (NIR) spectroscopy, a nondestructive technique, has proven promising and efficient[9-11]. NIR detection is based on the absorption arising from vibrational overtones of hydrogen-containing groups (e.g. O-H, C-H, N-H) in organic molecules, combined with prediction models built by chemometrics. NIR technology has been widely studied and applied in food research, with advantages such as no pollution, simultaneous detection of multiple components and low cost[12-14]. For cereals, NIR technology has been used to identify geographical origin[15] and to determine the moisture, protein and hardness of wheat grains; however, there are few reports on the determination of dry matter in wheat grains[16-18]. The determination of wheat grain weight by NIR has also not been reported.

Given this, we attempted to explore the potential of NIR for rapid prediction of dry matter and weight in wheat grain. The specific objectives of this study were: a. to obtain spectral data with an NIR system covering the 900~1700 nm region; b. to select optimal wavelengths associated with dry matter and weight by different chemometric methods; c. to establish simplified PLS models based on the selected optimal wavelengths for predicting dry matter and weight, respectively, in a rapid and nondestructive way. This study explores the feasibility of constructing prediction models for wheat grain dry matter and weight by NIR spectroscopy, and provides a theoretical basis for the future development of portable detection instruments.

2 Materials and methods

2.1 Materials and reagents

The wheat grain used in the experiment comprised 35 wheat varieties from the Huanghuai wheat area. The 35 varieties were obtained from different regions and differed in characteristics and quality. The diversity of sample varieties and regions is an important factor affecting the universality and practicability of the prediction model[19]. The thirty-five wheat varieties used were Bainong 207, Bainong 365, Bainong 219, Bainong 607, Bainong 201, Bainong 321, Bainong 418, Bainong 307, Bainong 4199, Xinmai 21, Xinmai 26, Xinmai 208, Yanshi 4110, Cunmai 5, Huayu 198, Huayu 166, Shangdumai 168, Xinong 20, Xinong 979, Zhengmai 366, Zhengmai 9023, Zhengmai 7698, Zhou 18, Zhou 36, Zhou 22, Xinong 511, Gaozhuan 6, Hengguan 35, Taimai 198, Tengle 176, AK 58, Baihan 207, Dingyan 161, Fumai 2008 and Wanmai 19. All the wheat grains were provided by Xinxiang Nongle Seed Industry Co., Ltd., China. To ensure the quality of the raw materials, the collected wheat grains were sealed in sample bags by variety and stored in dry and dark conditions until use.

2.2 Instruments and equipment

The NIR system (Isuzu Optics Corp, Taiwan, China) used in this study was composed of a spectrograph, a halogen illumination source, a lens, a high-borosilicate glass plate (diameter 60 mm, height 100 mm) and a computer equipped with spectral acquisition software. A drying oven (DHG-9240A, Shanghai Yiheng Scientific Instrument Ltd.), weighing bottles (70 mm×35 mm) and an electronic scale (Mettler Toledo Instrument Ltd., China, accurate to 0.01) were also used.

2.3 Methods

2.3.1 Spectral data collection and preprocessing

An appropriate amount of wheat grain of each variety was taken from its sample bag and evenly spread in a transparent glass culture dish with a thickness of 10 cm until it was flush with the edge of the glass plate, and then scanned by the NIR device 5 times. The exposure time was 0.63 s. The spectral information in the range of 900~1700 nm was collected and averaged.

Spectral preprocessing is generally necessary because spectral information is easily affected by noise and the external environment during data collection[20-21]. In this study, Gaussian filtering smoothing (GFS), normalization (N) and baseline correction (BC) were each applied to preprocess the collected raw spectra. GFS is commonly used to eliminate the influence of dense noise points on spectral information[22]. Normalization is adopted to eliminate scattering effects and thereby improve the spectral signal-to-noise ratio[23]. BC can be used to correct a sloping baseline[24].
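The three pretreatments can be illustrated with a minimal Python sketch. The paper does not state the kernel width, the normalization formula or the baseline model used, so the truncated Gaussian kernel (sigma, radius), the min-max scaling and the straight-line baseline below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def gaussian_smooth(spectrum, sigma=1.0, radius=3):
    """Smooth one spectrum with a truncated Gaussian kernel (assumed sigma and radius)."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    n = len(spectrum)
    smoothed = []
    for i in range(n):
        acc, weight_sum = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize the kernel at the spectrum edges
                acc += w * spectrum[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed

def min_max_normalize(spectrum):
    """Scale one spectrum to [0, 1] (min-max scaling is an assumed choice)."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]

def linear_baseline_correct(spectrum):
    """Subtract the straight line joining the first and last points (assumed baseline model)."""
    n = len(spectrum)
    if n < 2:
        return list(spectrum)
    slope = (spectrum[-1] - spectrum[0]) / (n - 1)
    return [v - (spectrum[0] + slope * i) for i, v in enumerate(spectrum)]
```

Each function takes one averaged reflectance spectrum as a list of floats, so the three pretreatments can be applied separately to the raw spectra, as in the study.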

2.3.2 Measurement of dry matter and weight

The dry matter and weight of wheat grain were measured by the oven-drying method: the sample was placed in a drying oven at 105 ℃ and dried to constant weight[25-26].

2.3.3 Modeling between spectral data and reference values

The spectral reflectance information was mined using the partial least squares (PLS) algorithm and related to the measured values of dry matter and weight of wheat grain, respectively. PLS is an effective approach for multivariate data analysis, and is well suited to modeling when the number of wavelengths is larger than the number of reference values (e.g. dry matter and weight) and there is high collinearity among the wavelengths and reference values[27-28]. The established PLS models were assessed by the correlation coefficients of calibration (rC), cross-validation (rCV) and prediction (rP), and by the root mean square errors of calibration (RMSEC), cross-validation (RMSECV) and prediction (RMSEP). In addition, the residual predictive deviation (RPD) is an important indicator of the predictive ability of a model. When 1.5<RPD<2, the model can monitor the detection indicators, but the prediction performance is not good. When 2<RPD<2.5, the prediction performance of the model is good, and the actual detection of indicators is feasible[29]. In general, a PLS model with good performance should have a high correlation coefficient (r), a low root mean square error (RMSE)[30] and a high RPD. A low absolute difference between RMSECV and RMSEP (ΔE) also indicates good robustness of the PLS model.
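The evaluation indicators above can be written out as a short Python sketch. The paper does not give explicit formulas, so the definitions below are assumptions based on common chemometric usage: r is the Pearson correlation between measured and predicted values, RPD is the sample standard deviation of the measured values divided by the RMSE of the same set, and ΔE is |RMSECV - RMSEP|. The function names are hypothetical.

```python
import math

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def rpd(measured, predicted):
    """Residual predictive deviation: SD of measured values / RMSE (assumed definition)."""
    n = len(measured)
    mean_m = sum(measured) / n
    sd = math.sqrt(sum((m - mean_m) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)

def delta_e(rmsecv, rmsep):
    """Absolute difference between RMSECV and RMSEP."""
    return abs(rmsecv - rmsep)
```

Applied to a prediction set, `pearson_r` and `rmse` correspond to rP and RMSEP; applied to calibration or cross-validation results, they correspond to rC/RMSEC and rCV/RMSECV.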

2.3.4 Optimal wavelength selection and model optimization

Optimal wavelength selection is necessary to retain the wavelengths carrying the most useful information for PLS model optimization, because the hundreds of spectral variables often contain redundant information, which increases the computational load and reduces modeling efficiency. In addition, irrelevant spectral information in the wavelengths weakens PLS model performance. In this study, regression coefficients (RC) and the successive projections algorithm (SPA) were applied to select optimal wavelengths[31-32].

2.4 Statistical analysis

The PLS models, the MLR models and the RC method were implemented using Unscrambler 9.7 software (CAMO, Oslo, Norway). The SPA process was implemented in Matlab R2010b (The Mathworks, Inc., Natick, MA, USA).

3 Results and discussion

3.1 Reference values of dry matter and weight

The reference values of dry matter (145 samples) and weight (130 samples) were measured and are shown in Table 1. For the 145 dry matter samples, the reference values were sorted from smallest to largest, and one of every three samples was assigned to the prediction set, with the remaining 2/3 used for calibration, resulting in 97 samples for model calibration and 48 samples for model prediction. Similarly, of the 130 weight samples, 87 were used for model calibration and the remaining 43 for prediction.
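The splitting rule described above can be sketched in a few lines of Python. The paper does not say which sample in each group of three goes to the prediction set; taking the third one is an assumption, and `split_every_third` is a hypothetical name. With this choice the sketch reproduces the reported set sizes (97/48 for 145 samples and 87/43 for 130 samples).

```python
def split_every_third(reference_values):
    """Sort samples by reference value; send every third one to the prediction set.

    Returns (calibration_indices, prediction_indices) into the original list.
    Picking the third sample of each triple is an assumed convention.
    """
    order = sorted(range(len(reference_values)), key=lambda i: reference_values[i])
    calibration = [idx for rank, idx in enumerate(order) if rank % 3 != 2]
    prediction = [idx for rank, idx in enumerate(order) if rank % 3 == 2]
    return calibration, prediction
```

Sorting before splitting spreads the prediction samples across the whole range of reference values, so the prediction set stays within the range covered by the calibration set.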

Table 1 Reference values of dry matter and weight in wheat grain

3.2 Spectral profiles of wheat grain

The NIR spectral reflectance features of the wheat grain samples used for dry matter and weight in the 900~1700 nm region are shown in Fig.1 and Fig.2, respectively. Panels a, b and c of each figure show the spectra after the three pretreatments (GFS, N and BC, respectively), while panel d shows the raw spectra. In the four plots, the 145 spectra showed the same trends, and several apparent absorption peaks emerged in the range of 900~1700 nm. Two strong absorption peaks were observed at about 980 nm and 1200 nm, which are attributed to O-H stretching and C-H stretching, respectively[33]. Another weak absorption peak appeared at about 1450 nm and is related to O-H stretching[34]. Differences in hydrogen-containing groups among wheat varieties and in total weight directly affect the vibrational absorption of the tested sample, leading to different spectra. It is therefore possible to mine the spectral information with appropriate chemometric algorithms to predict the dry matter and weight of wheat grain.

Fig.1 NIR reflectance characteristics of wheat grain samples for dry matter

Fig.2 NIR reflectance characteristics of wheat grain samples for weight

3.3 PLS modeling based on full wavelength

The PLS algorithm was run in Unscrambler 9.7 (CAMO, Oslo, Norway) on a matrix of 145 samples×400 wavelengths, with 97 samples×400 wavelengths used for calibration and 48 samples×400 wavelengths used for prediction. Four PLS models (GFS-PLS, N-PLS, BC-PLS, RAW-PLS) were constructed based on GFS, N, BC and RAW spectra, respectively, for predicting dry matter and weight of wheat grain. The performance of the four PLS models is shown in Table 2.

As indicated, the four PLS models for dry matter showed similarly good performance, with r over 0.90 and RMSE below 0.03%. Besides, all four PLS models had the same ΔE and RPD values (ΔE=0.01, RPD=2.33), indicating equally good robustness in predicting dry matter of wheat grain. Similarly, the four PLS models for weight showed similar performance and robustness (rP=0.73±0.01, RMSEP=0.50±0.10 g, ΔE=0.05±0.10 g). The RPD values of the PLS models built on the four spectra (GFS, N, BC and raw) were 1.49, 1.63, 1.43 and 1.60, respectively. Because the RPD values differed only slightly, the model performance needed further evaluation. He et al. (2019)[35] determined the dry matter content of wheat. Compared with those results, the root mean square error and ΔE of the full-band PLS model in this study were smaller, indicating that the accuracy and robustness of the model were improved by the larger number of varieties and samples. RPD values were also reported in this study to better evaluate the prediction model.

3.4 Selection of optimal wavelengths

Two methods, RC and SPA, were applied for optimal wavelength selection for dry matter prediction, and the results are shown in Table 3. The minimum number of optimal wavelengths was 5 and the maximum was 25. With both methods the number of wavelengths was reduced by over 90%, with a minimum reduction of 94% and a maximum of 99%. Compared with the results of He et al. (2019)[35], the wavelength reduction was larger, which may be because the larger sample size was more conducive to revealing the characteristic wavelengths. By and large, the RC method selected more optimal wavelengths than the SPA method, so the wavelength reduction was larger with SPA than with RC.
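The reduction percentages follow directly from the number of retained wavelengths. The short sketch below assumes the 400 full-spectrum wavelengths stated in Section 3.3 as the denominator; `wavelength_reduction` is a hypothetical helper name.

```python
def wavelength_reduction(n_selected, n_total=400):
    """Percentage of wavelengths removed when n_selected of n_total are kept."""
    return 100.0 * (n_total - n_selected) / n_total

# 5 retained wavelengths give 98.75% (reported as 99%),
# 25 retained wavelengths give 93.75% (reported as 94%).
for n in (5, 25):
    print(n, wavelength_reduction(n))
```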

Similarly, optimal wavelengths were selected for weight prediction and the results are shown in Table 4. Specifically, the number of optimal wavelengths was between 8 and 27, and the wavelength reduction reached 95%~98%. The number of optimal wavelengths selected by the RC method was larger than that of the SPA method, while the wavelength reduction of the RC method was less than that of the SPA method, for both RAW and preprocessed spectra. This may be related to the difference in how the two methods screen characteristic wavelengths. The SPA method uses an iterative procedure to find the wavelengths that reflect the maximum information of the sample with the least collinearity, while the RC method directly selects the characteristic wavelengths highly correlated with the content of the measured component.
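The projection idea behind SPA can be sketched in plain Python as follows. This is only an illustration of the successive-projection step, not the Matlab routine used in the study: it assumes a fixed starting wavelength and a fixed number of wavelengths to select, whereas a full SPA implementation also chooses both by validation error. `spa_select` is a hypothetical name.

```python
def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X with minimal collinearity.

    X is a list of rows (samples); each column is one wavelength.
    At each step every remaining column is projected onto the subspace
    orthogonal to the last selected column, and the column with the
    largest remaining norm is selected next.
    """
    n_rows, n_cols = len(X), len(X[0])
    if not 1 <= n_select <= n_cols:
        raise ValueError("n_select must be between 1 and the number of columns")
    cols = [[float(X[r][c]) for r in range(n_rows)] for c in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = sum(v * v for v in ref)
        best, best_norm2 = None, -1.0
        for c in range(n_cols):
            if c in selected:
                continue
            if ref_norm2 > 0.0:
                # remove the component of column c that lies along ref
                coef = sum(a * b for a, b in zip(cols[c], ref)) / ref_norm2
                cols[c] = [a - coef * b for a, b in zip(cols[c], ref)]
            norm2 = sum(v * v for v in cols[c])
            if norm2 > best_norm2:
                best, best_norm2 = c, norm2
        selected.append(best)
    return selected
```

Because a column that is nearly a multiple of an already selected one has almost no norm left after projection, it is unlikely to be picked, which is consistent with SPA returning fewer wavelengths than RC in this study.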

Table 2 Dry matter and weight prediction by PLS models based on full NIR wavelength

Table 3 Results of optimal wavelengths selected by RC and SPA methods for dry matter prediction

Table 4 Results of optimal wavelengths selected by RC and SPA methods for weight prediction

3.5 Modeling by PLS based on optimal wavelengths

With the selected optimal wavelengths, the full-wavelength PLS models were optimized and the results are shown in Table 5. For dry matter prediction, the eight optimized PLS models differed in capability. Compared with the full-band models, the correlation coefficients and root mean square errors fluctuated within a reasonable range, which may be because only characteristic wavelengths with a higher contribution to the spectral information were selected. Among them, the RC-RAW-PLS and RC-GFS-PLS models, established with optimal wavelengths selected by the RC method from RAW spectra (20 wavelengths) and GFS spectra (24 wavelengths), respectively, showed the best performance, both with rP of 0.93, RMSEP of 0.03% and RPD of 2.33. However, the RC-RAW-PLS model used fewer wavelengths than the RC-GFS-PLS model (20 vs. 24). On this basis, the RC-RAW-PLS model was considered better for predicting dry matter of wheat grain. Besides, compared with the RAW-PLS model, the RC-RAW-PLS model kept almost the same ability in predicting dry matter, indicating the effectiveness and contribution of the 20 optimal wavelengths in PLS model optimization.

Table 5 Dry matter and weight prediction by PLS models based on optimal wavelengths

For weight, the eight optimized PLS models also differed in prediction performance. The SPA method outperformed the RC method in wavelength reduction, and also in improving the RPD and ΔE values. The SPA-N-PLS model, built with eight optimal wavelengths selected from N spectra by the SPA method, performed best among the eight optimized PLS models, with rP of 0.84 and RMSEP of 0.36 g. The SPA-N-PLS model had the fewest wavelengths (8) but good robustness in predicting weight of wheat grain. Compared with the N-PLS model using full N spectra, the ability of the SPA-N-PLS model to predict weight of wheat grain was improved, with rP and RPD increased and RMSEP and ΔE decreased (RPD=1.86, ΔE=0.00). Overall, N spectra could be mined by the PLS algorithm for predicting weight of wheat grain.

3.6 MLR modeling based on optimal wavelengths

Besides the PLS algorithm, MLR can also be used for model construction when the number of wavelengths is smaller than the number of samples. In this study, based on the selected optimal wavelengths, MLR models for dry matter and weight prediction were established and the results are shown in Table 6.
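An MLR model on a few selected wavelengths amounts to an ordinary least-squares fit with an intercept. The sketch below is a minimal pure-Python illustration that solves the normal equations by Gauss-Jordan elimination; it is not the Unscrambler implementation used in the study, and `fit_mlr` and `predict_mlr` are hypothetical names. It raises an error when the normal equations are singular, which is what happens when there are more wavelengths than samples.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # augmented normal-equation matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        # partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: too few or collinear samples")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for k in range(p):
            if k != col:
                f = M[k][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [M[i][p] for i in range(p)]

def predict_mlr(coef, X):
    """Apply fitted coefficients to new rows of selected-wavelength reflectances."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
```

In this setting each row of `X` would hold the reflectance values at the selected optimal wavelengths for one sample, and `y` the measured dry matter or weight.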

For dry matter, the eight MLR models differed in prediction performance; the RC-RAW-MLR model performed best, with rP of 0.91, RMSEP of 0.03% and RPD of 2.33, similar to the RC-RAW-PLS model. For weight prediction, the SPA-RAW-MLR model showed the best performance among the eight MLR models, with rP of 0.89, RMSEP of 0.32 g and RPD of 2.09, which was better than the SPA-N-PLS model. In other words, the MLR model with 12 optimal wavelengths selected from RAW spectra was more suitable for wheat weight prediction.

3.7 Comparison of parameters of PLS and MLR models for dry matter and weight

In summary, the best PLS and MLR models for predicting dry matter content in wheat grains were RC-RAW-PLS and RC-RAW-MLR, respectively. The best PLS and MLR prediction models for weight were SPA-N-PLS and SPA-RAW-MLR, respectively. To further select the best quantitative prediction model for dry matter and weight of wheat grains, the best PLS and MLR models were compared, as shown in Table 7.

Table 6 Dry matter and weight prediction by MLR models based on optimal wavelengths

Table 7 Comparison of optimal prediction results of dry matter and weight of wheat grain by PLS and MLR models

For dry matter, the best modeling spectrum was the raw spectrum, and the optimal wavelength selection method for both the PLS and MLR models was RC, with 20 wavelengths. As can be seen from Table 7, the RC-RAW-PLS model for dry matter showed higher correlation and lower error than the RC-RAW-MLR model. For weight, the best spectra for the PLS and MLR models were N and RAW spectra, respectively. The best characteristic wavelength selection method for both N and RAW spectra was SPA, with the number of wavelengths reduced by 98.00% and 97.00%, respectively. Although the SPA-RAW-MLR model used more wavelengths than the SPA-N-PLS model, its wavelength reduction was still considerable and its performance was better.

4 Conclusions

In this study, spectra of wheat grain in the range of 900~1700 nm were acquired and mined by the PLS and MLR algorithms to establish a rapid and nondestructive method for predicting wheat dry matter and weight. After three different spectral pretreatments (GFS, N, BC), four PLS models with full GFS, N, BC and RAW spectra were built, and their performance was similar in predicting either dry matter or weight. Two methods, RC and SPA, were applied for optimal wavelength selection to optimize the PLS models, and four RC-PLS models (RC-GFS-PLS, RC-N-PLS, RC-BC-PLS, RC-RAW-PLS) and four SPA-PLS models (SPA-GFS-PLS, SPA-N-PLS, SPA-BC-PLS, SPA-RAW-PLS) were constructed for dry matter and weight prediction, respectively. Likewise, four RC-MLR models (RC-GFS-MLR, RC-N-MLR, RC-BC-MLR, RC-RAW-MLR) and four SPA-MLR models (SPA-GFS-MLR, SPA-N-MLR, SPA-BC-MLR, SPA-RAW-MLR) were built for predicting wheat dry matter and weight. Among them, the RC-RAW-PLS model based on 20 optimal wavelengths selected from RAW spectra by the RC method showed better performance in dry matter prediction (rP=0.93, RMSEP=0.03%, RPD=2.33), while the SPA-RAW-MLR model with 12 optimal wavelengths selected from RAW spectra by the SPA method showed better performance in weight prediction (rP=0.89, RMSEP=0.32 g, RPD=2.09). It was concluded that NIR spectra can be mined by appropriate algorithms to predict dry matter and weight of wheat grain. The accuracy of the NIR prediction models for wheat grain dry matter and weight still needs to be improved, which in the future can be done by enriching the variety and number of samples. Further improvement of the accuracy and stability of the models will be conducive to the later debugging and practical application of the instrument.
