
A novel NIRS modelling method with OPLS-SPA and MIX-PLS for timber evaluation

2022-02-26 Jinhao Chen, Huilig Yu, Dapeng Jiang, Yizhuo Zhang, Keqi Wang

Journal of Forestry Research, 2022, Issue 1

Jinhao Chen · Huilig Yu · Dapeng Jiang · Yizhuo Zhang · Keqi Wang

Abstract The identification of timber properties is important for safe application. Near-infrared spectroscopy (NIRS) is widely used because of its simplicity, efficiency, and positive environmental attributes. However, in practice, weak signals must be extracted from complex, overlapping and changing information. This study focused on the stability of NIR modeling. Orthogonal Partial Least Squares (OPLS) and the Successive Projections Algorithm (SPA) were used to eliminate noise and extract effective spectra, and an ensemble learning method, MIX-PLS, was applied to establish the model. The elastic modulus of timber was taken as an example: 201 wood samples of three species, Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula, were divided into three groups to investigate modelling performance. The results show that OPLS can preprocess the near-infrared spectra according to the target property in the presence of systematic error and reduce errors to a minimum. SPA selected 13 spectral bands, which simplified the NIR spectral data and improved model accuracy. For the Mix Partial Least Squares (MIX-PLS) model, the Pearson correlation coefficients of calibration (Rc) and prediction (Rp) were 0.95 and 0.90, and the root mean square errors of calibration (RMSEC) and prediction (RMSEP) were 2.075 and 6.001, respectively, which shows that the model has good generalization ability.

Keywords NIR prediction · Orthogonal partial least squares (OPLS) · Successive projections algorithm (SPA) · Mix partial least squares (MIX-PLS) · Modulus of elasticity

Introduction

Wood is a major material in buildings, furniture, composite materials, wood-based panels and other products. Accurate, rapid and non-destructive measurement of its mechanical properties is a basic requirement of many forest processing and grading industries. Near-infrared spectroscopy (NIRS) is considered one of the most beneficial non-destructive techniques for wood analysis and has many unique advantages (Li et al. 2018). Compared with X-ray technology, NIRS is not harmful and ensures the safety of inspection personnel (Jacquin et al. 2017). Compared with ultrasonic testing, it does not require a coupling agent during detection and is easy to carry out in the field (Cavalcanti et al. 2018). Further, in comparison with nuclear magnetic resonance techniques, NIRS instrumentation is small and easy to carry (Zupanc et al. 2019), and compared with mechanical stress waves, it is easy to operate. Its measurement accuracy is also high (Feng et al. 2016).

In near-infrared spectral analysis, weak signals must be extracted for modeling from spectra that normally possess broad, overlapping NIR absorption bands (Magalhães et al. 2018). The core technologies include preprocessing, wavelength extraction and the establishment of an identification model. Preprocessing can correct the baseline drift caused by scattering from solid particles (Fitamo et al. 2017; Rivard and Sánchez-Azofeita 2018). Wang et al. (2020) monitored the nutritional status of plants by exploring the relationship between hyperspectral imaging and phosphorus and potassium. They compared the performance of PLS calibration models based on standard normal variate (SNV), multiplicative scatter correction (MSC), first derivative (1-Der) and second derivative (2-Der) preprocessing. The results show that the PLS model with SNV preprocessing had the highest accuracy. Nascimento et al. (2016) established a near-infrared spectral calibration model to predict external parameters such as maturity, hardness and color of 'Aurora-1' peach. Four preprocessing methods were compared: SNV, MSC, SNV with detrending, and Savitzky-Golay (SG) convolution smoothing. The experimental results show that the PLS model based on SNV with detrending had the best performance. Although these preprocessing methods filter out some noise and suppress baseline drift, they share a common problem: they lack a relatively objective reference frame with which to evaluate and separate the characteristic and baseline-drift components of the spectrum. Therefore, under environmental changes caused by the spectrometer light source and baseline drift caused by light scattered from solid particles, these preprocessing methods are not effective.

The multi-collinearity of the original spectrum without dimension reduction is serious, so an appropriate feature selection method is needed to extract bands from the original spectrum (Houby et al. 2017; Yu et al. 2019; Zhang et al. 2019b). Wavelength extraction techniques include UVE, SPA, PCA and other optimization algorithms. Huang et al. (2014) proposed using an ant colony algorithm combined with interval partial least squares (iPLS) to optimize the characteristic sub-interval corresponding to anthocyanin content in spectral data of Camellia oleifera C. Abel., on which a near-infrared calibration model was established. Zhang et al. (2019a) optimized the near-infrared spectrum of silage corn raw materials using a discrete particle swarm optimization model and established a near-infrared spectral model for moisture content detection. The common mechanism of random optimization algorithms, such as the ant colony algorithm, particle swarm optimization and the genetic algorithm (GA), is to select the spectral variables that minimize a fitness function. This fitness function is usually a model evaluation index such as the correlation coefficient; it optimizes only the final model result and ignores the number of spectral features. Therefore, redundancy often occurs in the optimized spectral subset.

The prediction method is the core of near-infrared spectroscopy modeling and analysis; preprocessing and feature selection exist to support a more accurate model. Xian et al. (2016) used interval partial least squares (iPLS), synergy interval partial least squares (SiPLS) and backward interval partial least squares (BiPLS) to establish a prediction model for the adulteration level of virgin olive oil based on different contents and types of fried oil. The results showed that the correlation coefficient and root mean square error of prediction of the SiPLS and BiPLS models were lower than those of the iPLS model. Using PLS, iPLS and GA-PLS models, De Assis et al. (2018) established models for the pH, high titratable acidity and low soluble solid content of Dover fruit. The results showed that the PLS model was the most accurate for predicting low soluble solids content and the GA-PLS model the most accurate for predicting pH and high titratable acidity.

Although PLS, iPLS and BiPLS calibration models can handle small-sample modeling, their generalization ability is not strong (Sun et al. 2016). OPLS correction is robust in complex external environments: it can suppress baseline drift, random noise and background changes in the original spectral matrix and filter out the noise (Yin et al. 2015). The SPA algorithm eliminates collinearity by retaining the sub-vector with the maximum projection (Yuan et al. 2016; Zhu et al. 2017; Mesquita et al. 2018; Krepper et al. 2018). Compared with the ant colony algorithm, particle swarm optimization, GA and other random optimization algorithms, SPA is faster and can minimize redundancy (Yuan et al. 2016). MIX-PLS is a multi-model ensemble learning method that improves generalization ability by building multiple simple models for accurate modelling (Souza and Araújo 2014).

In this study, the elastic modulus of Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono and Betula pendula was determined. A NIRQuest512 spectrometer was used to collect the original spectral data of the samples. The OPLS-SG composite preprocessing method was used to process the data, and the SPA method was used to extract the characteristic bands. A MIX-PLS model of wood elastic modulus was then established.

Materials and methods

Materials

Three wood species were selected: Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula. The wood was from the Qingshuihe Forest Farm in Yichun City, Heilongjiang Province, at 128°01′ E, 42°30′ N, and an altitude of 600-700 m. Each specimen was numbered and its modulus of elasticity in static bending was measured according to the "Method for Determination of the Modulus of Elasticity in Static Bending of Wood" (GB 1936.2-2009).

The spectrometer was equipped with an optical fiber probe covering wavelengths of 900-1700 nm, with a 3.0 nm resolution. SPEC View 7.1 collected and recorded the spectra. The laboratory temperature was maintained at 22 ± 2 °C and the relative humidity at 50%. A ring gasket was installed outside the optical fiber probe, and the distance between the probe and the sample was kept at 2 mm. Eight groups of near-infrared spectral data were collected by moving the spectrometer probe over the surface of the sample at a uniform speed. The schematic of spectrum acquisition is shown in Fig. 1.

Fig. 1 Schematic of spectrum acquisition

The near-infrared spectrometer was used to collect spectra from 210 experimental specimens. To compare the stability of the spectral preprocessing methods under changes in the external environment, three spectral data groups were collected and labelled A, B and C. The spectrometer was reset before each measurement, and the three groups were measured at three randomly selected time points by different personnel.


Preprocessing based on OPLS

In OPLS preprocessing, the spectral matrix and the elastic modulus of wood are denoted X and y, respectively, and can be expressed as in Eqs. 1 and 2:

where E and f are the residual matrices, U is the score matrix of y, C is the prediction component weight matrix of y, T is the prediction score matrix, and T_o is the orthogonal score matrix.
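Written out under the usual OPLS decomposition, Eqs. 1 and 2 would plausibly take the following form. This is a sketch consistent with the symbols defined above, not a verified copy of the published equations: W^T is the predictive loading matrix named later in this section, and P_o is an assumed symbol for the orthogonal component loading matrix, which the text mentions but does not name.

```latex
\begin{align}
X &= T W^{\mathsf{T}} + T_o P_o^{\mathsf{T}} + E \tag{1} \\
y &= U C^{\mathsf{T}} + f \tag{2}
\end{align}
```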

In OPLS processing, the variation in X that is orthogonal to the elastic modulus y is eliminated first, giving the filtered matrix X_P, that is, X with the product of the orthogonal component score matrix T_o and the orthogonal component loading matrix removed. Second, a partial least squares analysis is performed on X_P and the elastic modulus y to obtain X_P = T W^T + E, in which the product T W^T of the predicted component score matrix T and the predicted component loading matrix W^T is the final output X_opls. The OPLS algorithm thus corrects the original NIR spectral matrix X according to the elastic modulus y: it removes the part of X that is orthogonal to, and therefore independent of, y and outputs X_opls.
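The orthogonal-filtering step can be illustrated with a minimal NumPy sketch of the standard single-response OPLS filter (Trygg and Wold style). It is an illustration under stated assumptions rather than the authors' implementation: the function name `opls_filter`, the argument `n_orth` and the mean-centering of X and y are choices made here, and the subsequent PLS fit on the filtered matrix is not shown.

```python
import numpy as np

def opls_filter(X, y, n_orth=1):
    """Remove from X the components that are orthogonal to y.

    Returns the filtered (mean-centered) matrix together with the
    orthogonal scores T_o and orthogonal loadings P_o that were removed.
    """
    Xc = X - X.mean(axis=0)          # assumption: column-centered spectra
    yc = y - y.mean()                # assumption: centered response
    w = Xc.T @ yc                    # predictive weight, proportional to X^T y
    w /= np.linalg.norm(w)
    T_o, P_o = [], []
    for _ in range(n_orth):
        t = Xc @ w                   # predictive score
        p = Xc.T @ t / (t @ t)       # predictive loading
        w_o = p - (w @ p) * w        # part of the loading orthogonal to w
        norm = np.linalg.norm(w_o)
        if norm < 1e-12:             # nothing orthogonal left to remove
            break
        w_o /= norm
        t_o = Xc @ w_o               # orthogonal score (uncorrelated with y)
        p_o = Xc.T @ t_o / (t_o @ t_o)
        Xc = Xc - np.outer(t_o, p_o) # subtract the orthogonal component
        T_o.append(t_o)
        P_o.append(p_o)
    return Xc, np.array(T_o).T, np.array(P_o).T
```

Because each orthogonal score is built from a weight vector orthogonal to X^T y, subtracting it leaves the covariance between the spectra and the response unchanged, which is the property that lets the filter discard drift and scatter without discarding information about the elastic modulus.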

S-G convolution smoothing was found to be effective for smoothing the spectra after OPLS correction (Eq. 3):

where x is the absorbance, λ the wavelength, i and j are serial numbers in the range of wavelength points, Δλ the wavelength interval, k! the factorial of the derivative order, and a_k the weight coefficient.
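As an illustration of S-G convolution smoothing, the sketch below derives the smoothing weights from a local least-squares polynomial fit and applies them as a moving convolution. The window of 9 points matches the value reported in the results; the polynomial order of 2 and the edge padding are assumptions made here, and the function names are hypothetical.

```python
import numpy as np

def savgol_coeffs(window=9, polyorder=2):
    """Smoothing weights: the fitted polynomial's value at the window center."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Design matrix of the local polynomial fit (columns 1, k, k^2, ...).
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps window samples to the constant term.
    return np.linalg.pinv(A)[0]

def savgol_smooth(x, window=9, polyorder=2):
    """Smooth a 1-D spectrum; edges are padded by repeating end values."""
    c = savgol_coeffs(window, polyorder)
    half = window // 2
    padded = np.pad(x, half, mode="edge")    # assumption: edge padding
    # Reversing the kernel turns np.convolve into a sliding dot product.
    return np.convolve(padded, c[::-1], mode="valid")
```

A smoother of this kind reproduces any polynomial of degree up to `polyorder` exactly away from the edges, so broad absorption features are preserved while point-to-point noise is averaged out.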

Spectral selection based on SPA

When SPA was applied to the preprocessed spectral matrix X_opls, the projections between pairs of sub-bands x_i and x_j of X_opls were calculated, and the index i giving the largest projection was recorded in the selected-wavelength dictionary. Each remaining band was then replaced by its projection, x_j = P x_j, where P denotes the projection operator onto the subspace orthogonal to the selected band. The index i that maximized the projection P x_j was calculated and recorded in the dictionary. When the number of wavelengths in the dictionary reached a predetermined value, the program terminated.
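The selection loop can be sketched as follows. This is a minimal version of the standard successive projections algorithm, assuming a user-chosen starting band; the function name `spa_select` and the `start` argument are hypothetical, and the validation step that usually picks the best starting band and subset size is omitted.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X with minimal collinearity."""
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the
        # most recently selected band.
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0       # never re-select a chosen band
        selected.append(int(np.argmax(norms)))
    return selected
```

A band that is nearly a multiple of an already selected band has almost no component left after projection, so it is passed over in favour of bands that carry new information.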

NIR nonlinear modeling based on MIX-PLS

The MIX-PLS model is derived from the mixture-of-experts model. The real probability distribution is obtained by superposing the output values of the individual PLS models (Fig. 2). Each subsystem is a simple PLS model, and f(x(i)|θ) is the probability distribution of the output vector of a PLS sub-model. The number of subsystems is p, and the gate function is the softmax function.

Fig. 2 MIX-PLS block flowchart

Equation 4 is the total probability formula. The given vector Z controls the output proportion of each subsystem; when the subsystem is determined to be the pth PLS subsystem, the final output of the model is p(y(i)|z_p(i),x(i),ε), the probability distribution of the PLS output under that condition. ε is the set of all PLS subsystem parameters, i.e., ε={θ_1,w_1,...,θ_p,w_p}. p(z_p(i)=1|x(i),V) is the output probability distribution of the gate function, and the term-by-term product of the two is the solution of p(y,Z|X,ϑ).
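In the standard mixture-of-experts formulation, the two relations described in this paragraph would plausibly read as below. This is a hedged reconstruction from the definitions in the text, not a verified copy of the published equations; it assumes ϑ = {ε, V}, writes P for the number of subsystems to keep it distinct from the index p, and writes N for the number of samples.

```latex
\begin{align}
p\bigl(y^{(i)} \mid x^{(i)}, \vartheta\bigr)
  &= \sum_{p=1}^{P}
     p\bigl(z_p^{(i)} = 1 \mid x^{(i)}, V\bigr)\,
     p\bigl(y^{(i)} \mid z_p^{(i)}, x^{(i)}, \varepsilon\bigr) \\
p\bigl(y, Z \mid X, \vartheta\bigr)
  &= \prod_{i=1}^{N} \prod_{p=1}^{P}
     \Bigl[
       p\bigl(z_p^{(i)} = 1 \mid x^{(i)}, V\bigr)\,
       p\bigl(y^{(i)} \mid z_p^{(i)}, x^{(i)}, \varepsilon\bigr)
     \Bigr]^{z_p^{(i)}}
\end{align}
```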

The distribution p(y(i)|z_p(i),x(i),ε) is Gaussian, N(y(i)|f_p(x(i),θ_p),w_p), where y(i) and x(i) are the mechanical property and the spectrum of sample i, and θ_p and w_p are the parameter matrices of the MIX-PLS algorithm. The analytical solutions for θ_p and w_p are given by Eqs. 6 and 7:

where Γ_p=diag(γ_p(1),γ_p(2),...,γ_p(k)) is a diagonal matrix whose ith diagonal element γ_p(i) is the expectation of the latent variable z_p(i) of MIX-PLS under p(Z|y,X,ϑ).

p(z_p(i)=1|x(i),V_old) is the output probability distribution of the gate function, which controls the opening and closing of each subsystem, balances the outputs of the subsystems and determines the final output. The number of subsystems p is a parameter of the MIX-PLS model that needs to be determined. The probability distribution follows softmax regression, where V_old is the weight matrix of the softmax regression and v_l and v_p are vectors in V_old. The distribution can be expressed by Eq. 8:
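The roles of the gate, the experts and the expectations γ_p(i) can be sketched in NumPy as follows. This is an illustrative sketch under simplifying assumptions, not the MIX-PLS implementation: each expert is represented by a plain linear coefficient vector (a column of `thetas`) standing in for a fitted PLS regression, `variances` plays the role of w_p, and all function names are hypothetical.

```python
import numpy as np

def softmax_gate(X, V):
    """Gate probabilities p(z_p = 1 | x, V); each row sums to 1."""
    scores = X @ V                                  # shape (n_samples, P)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def mixture_predict(X, V, thetas):
    """Gate-weighted sum of the expert outputs."""
    gate = softmax_gate(X, V)
    expert_out = X @ thetas                         # one column per expert
    return (gate * expert_out).sum(axis=1)

def responsibilities(X, y, V, thetas, variances):
    """Posterior expectation gamma_p(i) of z_p(i) given x(i) and y(i)."""
    gate = softmax_gate(X, V)
    resid = y[:, None] - X @ thetas
    # Gaussian likelihood of y under each expert.
    lik = np.exp(-0.5 * resid**2 / variances) / np.sqrt(2 * np.pi * variances)
    joint = gate * lik
    return joint / joint.sum(axis=1, keepdims=True)
```

In an EM-style fit, the responsibilities would serve as the diagonal weights Γ_p when each expert is refitted, and the gate weights would be updated by softmax regression on those same responsibilities.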

Results and discussion

NIR preprocessing based on OPLS

The original spectral data collected by the NIRQuest512 spectrometer comprise 512 bands (Fig. 3), and the preprocessed spectral data are shown in Fig. 4. The OPLS cycle number was set to 50 and the S-G convolution smoothing window to 9, with a fixed approximating polynomial order. Fig. 3 shows the original near-infrared spectra of the three wood species. From about 1650 nm onward, the data fluctuate with considerable noise. Because of error factors such as near-infrared spectrum drift, the absorption peaks of some spectral curves near 1200 nm are not obvious. Figure 4 shows the near-infrared spectra processed by the OPLS-SG algorithm. The noise in the 1650-1700 nm range has almost disappeared and the absorption peak near 1200 nm becomes clear.

Fig. 3 Near-infrared spectra of Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono and Betula pendula

To verify the superiority of OPLS, three preprocessing methods, OPLS-SG, SNV-SG and OSC-SG, were each combined with PLS modelling of the elastic modulus on the full-spectrum data of the three groups A, B and C after spectral normalization. A calibration model of elastic modulus was established on one of the three groups, and the spectral data of the other two groups were then input into the model for analysis. The three data groups were measured at different time points and by different personnel. The better the evaluation indexes of the model, the stronger the robustness of the calibration model.

The model evaluation indexes of the different models on the different spectral data groups are shown in Table 1. A comparison of Rc and RMSEC shows that the models built on OPLS-processed spectra give the best evaluation results and are the most stable. The OPLS method can identify and separate the part of the spectral matrix of the wood samples that is orthogonal to the mechanical properties of the wood, suppress the spectral fluctuation caused by external disturbances as well as the baseline drift and signal noise caused by scattering from solid samples, and so ensure the robustness of the model.

Fig. 4 Preprocessed spectra of the three wood samples

Results

SPA-MIX-PLS spectral calibration model

The SPA algorithm was used to select the characteristic bands, with the maximum number of components of the model set to 20. The number of MIX-PLS subsystems was optimized by a five-fold cross-validation search, and the optimal number of bands obtained was 13. The optimization of the number of subsystems is shown in Table 2; four subsystems were selected on the basis of the calibration set. The calibration correlation coefficient Rc was 0.95 and the root mean square error of calibration (RMSEC) was 2.075. The projection proportion of each SPA band, that is, the proportion of band importance, is shown in Fig. 5, and the screening results for the SPA spectral bands are shown in Fig. 6. In this study, the 13 bands with the largest weights were selected from the 512 bands at 900-1700 nm.
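A five-fold cross-validation search over a model setting, such as the number of subsystems, can be sketched generically as below. The helper names and the `make_fit_predict` factory are hypothetical: the factory is assumed to return, for a candidate setting, a function that fits on the training fold and predicts the held-out fold. The random fold assignment and the use of pooled RMSE as the selection criterion are also assumptions.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Split shuffled sample indices into n_folds nearly equal parts."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cv_rmse(X, y, fit_predict, n_folds=5, seed=0):
    """Cross-validated RMSE pooled over all held-out samples."""
    folds = kfold_indices(len(y), n_folds, seed)
    sq_err = 0.0
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        sq_err += np.sum((pred - y[test_idx]) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def select_by_cv(X, y, candidates, make_fit_predict, n_folds=5):
    """Return the candidate with the lowest cross-validated RMSE."""
    scores = {c: cv_rmse(X, y, make_fit_predict(c), n_folds) for c in candidates}
    return min(scores, key=scores.get), scores
```

Selecting on held-out error rather than calibration error penalizes settings that only fit the training folds, which matters for a model whose flexibility grows with the number of subsystems.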

Fig. 5 Proportion of spectral band interval importance

Fig. 6 SPA selecting the best band range

Table 1 Influence of preprocessing methods on modelling

Table 2 MIX-PLS optimization of number of subsystems

MIX-PLS model of wood mechanical properties

The model was established on the training-set data processed by the SPA algorithm and evaluated on the prediction set. Figures 7 and 8 show the validation and prediction results of the calibration model.

Fig. 7 Validation of the MIX-PLS model

Fig. 8 Predictions of the MIX-PLS model

To verify the effectiveness of the SPA-MIX-PLS model, the PLS, iPLS, BiPLS and PCR modeling methods were used for comparison. The calibration correlation coefficient (Rc), RMSEC, prediction correlation coefficient (Rp) and root mean square error of prediction (RMSEP) were selected as evaluation indexes. The results of the established models were compared and analyzed. The relevant parameters are shown in Table 3.
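The four evaluation indexes reduce to two computations, each applied once to the calibration set and once to the prediction set. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    yt = yt - yt.mean()
    yp = yp - yp.mean()
    return float(yt @ yp / np.sqrt((yt @ yt) * (yp @ yp)))

def rmse(y_true, y_pred):
    """Root mean square error, in the units of the measured property."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Rc and RMSEC come from evaluating these on the calibration samples, and Rp and RMSEP from the held-out prediction samples; a large gap between the two sets is the usual sign of weak generalization.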

The accuracy of the iPLS, BiPLS and MIX-PLS models improved after the SPA algorithm was applied (Table 3). Although the SPA-MIX-PLS model did not perform best on the calibration set, it showed strong generalization ability and prediction accuracy.

Table 3 Comparison of calibration model results

Conclusions

In this study, the elastic modulus of wood was predicted with near-infrared spectroscopy as the detection method. OPLS and SG were used to preprocess the spectra, SPA to optimize the characteristic spectrum, and MIX-PLS to establish the model; three wood species, Xylosma congesta (Lour.) Merr., Acer pictum subsp. mono, and Betula pendula, were selected to verify the effectiveness of the method. The results show that OPLS correction can preprocess the near-infrared spectrum according to the target property, improve the quality of the spectral matrix, and effectively simplify subsequent model data processing. SPA, as a classical algorithm, can quickly extract the characteristic spectral bands and improve the accuracy of the prediction model. The correlation coefficients Rc and Rp of the MIX-PLS calibration model were 0.95 and 0.90, and the root mean square errors RMSEC and RMSEP were 2.075 and 6.001, respectively. The comparison of PLS, iPLS, BiPLS, PCR and MIX-PLS showed that the MIX-PLS calibration model had the best prediction performance and the strongest generalization ability.