

Transactions of the Chinese Society of Agricultural Engineering, 2016, No. 3

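Wang Liai¹, Zhou Xudong², Zhu Xinkai¹, Guo Wenshan¹※ (1. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009, China; 2. Information Engineering College, Yangzhou University, Yangzhou 225127, China)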


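Abstract: To provide technical support for remote sensing monitoring of wheat growth, this study used the random forest (RF) regression algorithm to build a remote sensing inversion model of wheat leaf area index (LAI). First, HJ-CCD images of wheat in Jiangsu from 2010 to 2013 were obtained from China's environment and disaster monitoring satellites. Satellite vegetation indices were extracted from these images for three growth stages: jointing, booting and anthesis. For each stage, the vegetation indices and the corresponding measured LAI were then used to build a wheat LAI inversion model with RF. An artificial neural network (ANN) model served as the reference for comparing prediction accuracy. The RF models outperformed the ANN models at all three stages. At jointing, booting and anthesis, the R² between RF-predicted and measured LAI was 0.79, 0.67 and 0.59, with RMSE of 0.57, 0.90 and 0.78, respectively. For the ANN models, R² was 0.67, 0.31 and 0.30, with RMSE of 0.82, 1.94 and 1.43, respectively. These results provide a technique and method for improving the accuracy of remote sensing prediction of wheat LAI at field scale.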


0 Introduction

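Leaf area index (LAI) reflects both the individual and the population characteristics of vegetation growth and is a key ecological parameter for monitoring crop growth [1]. As remote sensing has been applied in agriculture in recent years, many researchers have studied the remote sensing inversion of crop LAI in depth [2-5]. Inverting LAI from vegetation indices is a particularly important research direction [5-10].

Tavakoli et al. [5] showed that several RGB-based indices correlate well with wheat LAI, so crop LAI can be estimated with a digital camera. Zhao Juan et al. [6] found that the RVI (ratio vegetation index) extracted from ASD spectroradiometer data was suitable for inverting winter wheat LAI in the middle growth period (from jointing to just before heading) in their study area. NDVI (normalized difference vegetation index) was suitable for the late period (from heading to maturity).

Vegetation indices can be extracted from different remote sensing data sources, and previous studies have also inverted LAI from satellite-derived indices:
- He Yajuan et al. [7] used SPOT data to build an NDVI-based quadratic model for inverting sugarcane LAI over the whole growth period.
- Liu et al. [8] extracted four Landsat 5/7 vegetation indices for wheat, maize and soybean and compared how accurately each index inverted the LAI of each crop. They found that EVI (enhanced vegetation index) performed best.
- Guo Lin et al. [9] used data from the HJ-CCD environment and disaster monitoring satellites, developed in China, and related NDVI to LAI with a support vector machine to invert sugarcane LAI.
- Chen Xueyang et al. [10] compared four HJ-CCD vegetation indices against winter wheat LAI and identified RVI as the best index for inverting LAI.

Most existing studies invert crop LAI from a single vegetation index. However, individual indices saturate to varying degrees, and each one carries information from only some of the bands. The artificial neural network (ANN) algorithm can use several vegetation indices at once and fits nonlinear problems well. For this reason it has been widely used in recent years to build remote sensing inversion models of agronomic parameters [11-13]. Although ANN models achieve reasonable prediction accuracy, they have many parameters and are complex to construct.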

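Like ANN, the emerging random forest (RF) algorithm is a multi-factor machine-learning method that can use multiple vegetation indices. RF is regarded as one of the most accurate prediction methods currently available. It has been widely applied to classification problems in remote sensing [14-16], where it has outperformed ANN, and RF models are simpler to build than ANN models. To date, however, only a few studies have reported its use in remote sensing monitoring and prediction [17-18]. In particular, to our knowledge no study has used RF to invert wheat LAI from remote sensing data. This paper therefore uses RF, for the first time, together with multiple vegetation indices to build a multi-factor remote sensing inversion model of wheat LAI. The aim is to provide a new technique for improving the accuracy of quantitative remote sensing inversion of wheat LAI at field scale.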

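The study design follows wheat cultivation practice in the middle and lower reaches of the Yangtze River. Field experiment data and HJ-CCD images from 2010 to 2013 were used to obtain measured wheat LAI at the jointing, booting and anthesis stages, together with 15 remote sensing vegetation indices for the same stages. RF was then used to build a separate LAI inversion model for each of the three stages, with wheat LAI as the dependent variable and the vegetation indices as the independent variables. The LAI inverted by each stage's model was fitted against ground-measured LAI. Accuracy was assessed with the coefficient of determination (R²) and the root mean square error (RMSE) and compared with that of ANN models.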
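The two accuracy measures used above can be sketched in plain Python as follows. The sketch assumes that R² is the R² of a least-squares line fitted between predicted and measured LAI, which equals the squared Pearson correlation, because the text says predictions were "fitted" against measurements. An R² computed as 1 − SS_res/SS_tot directly on the raw predictions can give different values. The function names `rmse` and `r2_linear_fit` are illustrative, not from the paper.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted LAI."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def r2_linear_fit(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """R² of a least-squares line between predicted and measured LAI.

    For a simple linear fit this equals the squared Pearson correlation
    (assumed definition; the paper does not give its formula).
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)
```

For example, `rmse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])` equals √0.025 ≈ 0.158. Note that predictions that are perfectly linear in the measurements give R² = 1 under this definition even when they are biased. This is one reason RMSE is reported alongside R².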

1 Study area and data collection



Table 1 Test regions in this study


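The wheat cultivars grown in the test regions were Yangmai 13, Yangmai 15, Yangmai 16 and Yangfumai 2. Samples were taken at the jointing, booting and anthesis stages. In each county, 15–20 representative sampling sites were set up, and the sampling area at each site was 50 cm × 4 rows (row spacing 15–20 cm). At each growth stage, 15 plants with uniform growth were sampled, sealed and taken back to the laboratory, where LAI was determined by the specific leaf weight method. Each sampling site was located with a Juno ST handheld GPS (Trimble, USA) to record its longitude and latitude. HJ-CCD images quasi-synchronous with the jointing, booting and anthesis stages were downloaded from the website of the China Centre for Resources Satellite Data and Application.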
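The paper does not give the computational steps of the specific leaf weight method, so the arithmetic below is only a sketch of one common form. It assumes the following steps: (1) the specific leaf weight (SLW) is obtained from a leaf subsample of known area and dry weight; (2) the total leaf dry weight of the sampled plants is converted to leaf area via SLW; (3) the mean leaf area per plant is multiplied by a plant density counted in the 50 cm × 4-row frame. The function names and all numbers in the usage line are hypothetical.

```python
def plant_density(plants_counted: int, row_length_m: float,
                  n_rows: int, row_spacing_m: float) -> float:
    """Plants per m² in a frame of n_rows rows of the given length (assumed step)."""
    ground_area_m2 = row_length_m * n_rows * row_spacing_m
    return plants_counted / ground_area_m2


def lai_from_specific_leaf_weight(subsample_area_cm2: float, subsample_dry_g: float,
                                  total_leaf_dry_g: float, plants_sampled: int,
                                  plants_per_m2: float) -> float:
    """LAI (m² leaf per m² ground) via specific leaf weight (illustrative workflow)."""
    slw_g_per_cm2 = subsample_dry_g / subsample_area_cm2      # specific leaf weight
    total_area_cm2 = total_leaf_dry_g / slw_g_per_cm2         # leaf area of sampled plants
    area_per_plant_m2 = total_area_cm2 / plants_sampled / 10_000
    return area_per_plant_m2 * plants_per_m2
```

With hypothetical values, `plant_density(120, 0.5, 4, 0.2)` gives 300 plants/m². Passing that density to `lai_from_specific_leaf_weight(20.0, 0.1, 3.0, 15, 300.0)` gives an SLW of 0.005 g/cm², 40 cm² of leaf per plant and an LAI of about 1.2.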

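For each growth stage, the data from the 4 years were pooled and randomly divided into two parts. The 75% part was used as the training set to build the models, and the 25% part was used as the test set to evaluate them. The training sets contained 174, 174 and 147 samples at jointing, booting and anthesis, respectively. The corresponding test sets contained 58, 58 and 49 samples.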
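The 75%/25% split can be sketched as a seeded shuffle of the pooled samples of one growth stage. The pooled sizes used below (232 for jointing and booting, 196 for anthesis) are simply the sums of the stated training and test counts. The paper does not say whether the split was stratified by year or site, so a plain random split is assumed. The function name and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_75_25(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split pooled samples into 75% training and 25% test (unstratified)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_train = round(0.75 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

Calling `split_75_25(list(range(232)))` yields 174 training and 58 test samples, and a pool of 196 yields 147 and 49. These match the counts reported above.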


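The remote sensing data used in this study came from the small satellite constellation for environment and disaster monitoring and forecasting developed by China. It comprises two satellites, HJ-A and HJ-B, each carrying a CCD (charge-coupled device) camera with a spatial resolution of 30 m and four bands: blue B1 (430–520 nm), green B2 (520–600 nm), red B3 (630–690 nm) and near-infrared B4 (760–900 nm).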

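All images underwent radiometric calibration, atmospheric correction and geometric correction. Radiometric calibration converted each image from digital numbers (DN) to radiance using the radiometric calibration parameters of the HJ CCD cameras. Atmospheric correction was performed with the FLAASH module of ENVI 4.7. For geometric correction, the images were first coarsely corrected against 1:100 000 topographic maps of Jiangsu. They were then precisely corrected with GPS ground control points measured in the field, bringing the geometric error below 1 pixel.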
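Two of the preprocessing steps above reduce to simple arithmetic, sketched below. A linear DN-to-radiance model (radiance = gain × DN + offset) is assumed, and the gain and offset values in the usage line are placeholders. The actual coefficients, and their exact convention, must be taken from the published calibration parameters of each HJ CCD band. The sub-pixel criterion is expressed as the RMSE of ground-control-point residuals divided by the 30 m pixel size. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

import numpy as np


def dn_to_radiance(dn: np.ndarray, gain: float, offset: float) -> np.ndarray:
    """Linear radiometric calibration (assumed form): radiance = gain * DN + offset."""
    return gain * dn.astype(np.float64) + offset


def gcp_rmse_pixels(residuals_m: Sequence[Tuple[float, float]],
                    pixel_size_m: float = 30.0) -> float:
    """RMSE of GCP residuals (dx, dy in metres), expressed in pixels."""
    sq = [dx * dx + dy * dy for dx, dy in residuals_m]
    return math.sqrt(sum(sq) / len(sq)) / pixel_size_m
```

For hypothetical residuals of (12, 9) m and (−6, 8) m, `gcp_rmse_pixels` returns about 0.42 pixels. This would satisfy the "less than 1 pixel" requirement.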

2 Methods


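Crop LAI is strongly correlated with the visible and near-infrared bands of the vegetation spectrum [19]. Vegetation indices built from these characteristic bands remain sensitive to LAI while reducing interference from environmental factors, so they can be used to estimate LAI. From the four bands of the HJ-CCD camera, this study constructed 15 widely accepted remote sensing vegetation indices that have been shown to invert LAI well [3, 20-21] (Table 2).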
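Because the formulas of Table 2 did not survive extraction, the sketch below computes a few of the named indices in their commonly published forms from reflectances in the four HJ-CCD bands (B1 blue, B2 green, B3 red, B4 NIR). Table 2 may define some of them differently, and the remaining indices of the 15 are not reproduced here. The function name is hypothetical.

```python
import numpy as np


def vegetation_indices(blue, green, red, nir):
    """Several common vegetation indices from surface reflectance (0-1).

    Literature forms assumed; they may differ from the paper's Table 2.
    Inputs may be scalars or NumPy arrays (e.g. whole HJ-CCD bands).
    """
    blue, green, red, nir = (np.asarray(b, dtype=np.float64)
                             for b in (blue, green, red, nir))
    sr = nir / red                                           # simple ratio
    return {
        "NDVI": (nir - red) / (nir + red),
        "RVI": sr,
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        "MSR": (sr - 1.0) / np.sqrt(sr + 1.0),
        "NLI": (nir ** 2 - red) / (nir ** 2 + red),
        "MTVI2": 1.5 * (1.2 * (nir - green) - 2.5 * (red - green))
        / np.sqrt((2.0 * nir + 1.0) ** 2 - (6.0 * nir - 5.0 * np.sqrt(red)) - 0.5),
    }
```

For a pixel with reflectances of 0.05 (blue), 0.08 (green), 0.1 (red) and 0.4 (NIR), NDVI is 0.6 and RVI is 4. Stacking such indices column-wise gives the 15-variable input matrix used for each growth-stage model.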

Table 2 Formulas of remote sensing vegetation indices





3 Results and analysis


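RF and ANN were each used to build remote sensing inversion models of wheat LAI from the training sets of the jointing, booting and anthesis stages. In every model, the 15 vegetation indices in Table 2 were the independent variables and wheat LAI was the dependent variable.

To build the RF models, the algorithm was first implemented as a computer program. The number of regression trees (ntree) and the number of variables tried at each node split (mtry) were then set, and the program was run to build the model. The resulting model has no explicit mathematical formula. Based on experience and repeated trials, ntree was set to 2000 and mtry to 3 for all three stages.

Using the out-of-bag (OOB) data, the RF models also reported the importance of the 15 vegetation indices (Fig. 1), which shows how strongly each index influences the model. The larger the RMSE value associated with an index, the more important that index is. As Fig. 1 shows:
- At jointing, the 14 indices other than EVI all had RMSE values of about 0.4, indicating that these 14 indices have similar influence on LAI.
- At booting, the RMSE values of NRI and MTVI2 were clearly higher than those of the other 13 indices, indicating that both strongly influence LAI.
- At anthesis, NRI and NLI had weaker influence on LAI than the other 13 indices.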
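A minimal sketch of the RF modelling step follows, using scikit-learn as a swapped-in implementation. The paper does not name its software; ntree and mtry are the parameter names of R's randomForest package. ntree maps to `n_estimators` and mtry to `max_features`. scikit-learn does not compute OOB permutation importance, and its `feature_importances_` is impurity-based. The importance shown here is therefore a stand-in: the increase in RMSE when each index column is permuted on held-out data. The function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def fit_lai_rf(X_train, y_train, ntree=2000, mtry=3, seed=0):
    """Random-forest LAI model: ntree -> n_estimators, mtry -> max_features."""
    model = RandomForestRegressor(
        n_estimators=ntree,   # number of regression trees
        max_features=mtry,    # candidate variables tried at each split
        oob_score=True,       # out-of-bag R² as an internal check
        random_state=seed,
    )
    model.fit(X_train, y_train)
    return model


def rmse_importance(model, X, y, seed=0):
    """Mean increase in RMSE when each column is permuted (larger = more important)."""
    result = permutation_importance(
        model, X, y,
        scoring="neg_root_mean_squared_error",  # score = -RMSE
        n_repeats=10,
        random_state=seed,
    )
    # baseline(-RMSE) minus permuted(-RMSE) = RMSE increase
    return result.importances_mean
```

With real data, `X` would be the 15 vegetation indices of one growth stage and `y` the measured LAI. A quick check can use synthetic data in which only the first column matters, for example `y = 3 * X[:, 0] + noise`. That column should then receive the largest importance.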

Fig.1 Importance of vegetation indices in RF models for estimating LAI



Fig.2 Measured versus predicted wheat LAI





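Each of the two machine-learning algorithms has its own parameters. ANN requires several of them to be set (network structure, number of nodes, training function, learning function, learning rate, etc.), whereas RF requires only two (ntree and mtry). This makes RF considerably easier to apply.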
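The difference in configuration effort can be illustrated with scikit-learn estimators as stand-ins. The paper does not report its ANN configuration or software, so every MLP setting below is a hypothetical choice. The mapping of "training function" and "learning function" onto `solver`, `learning_rate` and `momentum` is only approximate.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# RF: only the two settings the paper tunes (ntree, mtry).
rf = RandomForestRegressor(n_estimators=2000, max_features=3)

# ANN reference model: all values are hypothetical illustrations.
ann = make_pipeline(
    StandardScaler(),                 # MLPs are sensitive to input scaling
    MLPRegressor(
        hidden_layer_sizes=(10,),     # network structure / number of nodes
        activation="tanh",
        solver="sgd",                 # training algorithm
        learning_rate="adaptive",     # learning-rate schedule
        learning_rate_init=0.01,      # learning rate
        momentum=0.9,
        alpha=1e-4,                   # L2 penalty
        max_iter=2000,
        random_state=0,
    ),
)
```

Each MLP setting interacts with the others and usually needs tuning, whereas the RF above has two settings to choose.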


4 Conclusions






Inverting wheat leaf area index based on HJ-CCD remote sensing data and random forest algorithm

Wang Liai¹, Zhou Xudong², Zhu Xinkai¹, Guo Wenshan¹※
(1. Key Laboratory of Crop Genetics and Physiology of Jiangsu Province, Yangzhou University, Yangzhou 225009, China; 2. Information Engineering College of Yangzhou University, Yangzhou 225127, China)

Abstract: The leaf area index (LAI) of crops is an important parameter for crop monitoring. As remote sensing has been applied in agriculture, the inversion of crop LAI from remote sensing data has been widely studied. Vegetation indices are widely used in these studies because they can reduce the effect of background noise on the spectral reflectance of plant canopies. Besides the vegetation indices, the modeling algorithm also plays an important role in improving the accuracy of remote estimation of crop LAI. The emerging random forest (RF) machine-learning algorithm has recently been regarded as one of the most accurate prediction methods for regression. In this paper, we studied wheat LAI estimation using the RF algorithm and vegetation indices.

First, we used China's environmental satellite charge-coupled device (HJ-CCD) images of wheat (Triticum aestivum) from test sites in Jiangsu province, China, acquired during 2010-2013. From these images, fifteen previously reported vegetation indices and the corresponding LAI were obtained at the jointing, booting, and anthesis stages. Then, an RF-based LAI inversion model was established for each stage from its vegetation indices and the corresponding in situ wheat LAI measured during HJ-CCD data acquisition.

For each stage, the pooled data from 2010-2013 were randomly divided into a training dataset (75%) and an independent validation dataset (25%). The training dataset contained 174 samples at jointing, 174 at booting, and 147 at anthesis. The validation dataset contained 58, 58, and 49 samples, respectively. The training dataset was used to establish the model for predicting wheat LAI at each growth stage, and the validation dataset was used to test the quality of each prediction model.
The RF model for each stage was established with the 15 vegetation indices as independent variables and wheat LAI as the dependent variable. For each stage, a model based on the artificial neural network (ANN) machine-learning algorithm served as a reference; ANN models have been used successfully to invert crop LAI in previous studies. To evaluate and compare the two models at each stage, the coefficient of determination (R²) and root mean square error (RMSE) of estimated versus measured LAI were calculated on the corresponding validation data.

The results indicated that RF outperformed ANN at every stage. For the RF models, R² at the three stages was 0.79, 0.67, and 0.59, respectively, with corresponding RMSE of 0.57, 0.90, and 0.78. For the ANN models, R² was 0.67, 0.31, and 0.30, respectively, with corresponding RMSE of 0.82, 1.94, and 1.43. Furthermore, RF identified the vegetation index that contributed most noticeably to LAI estimation at each stage (i.e., EVI at jointing, MTVI2 at booting, and MSR at anthesis). Thus, the RF algorithm provides an effective way to improve the prediction accuracy of wheat LAI on a large scale.

Keywords: vegetation; neural networks; algorithms; random forest; machine-learning; leaf area index; wheat










