Soil property mapping by combining spatial distance information into the Soil Land Inference Model (SoLIM)
Pedosphere, 2021-10-15
Chengzhi QIN, Yiming AN, Peng LIANG, Axing ZHU and Lin YANG
2 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049 (China)
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application and School of Geography, Nanjing Normal University, Nanjing 210097 (China)
4 Department of Geography, University of Wisconsin-Madison, Madison WI 53706 (USA)
5 Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023 (China)
6 School of Geographic and Oceanographic Sciences, Nanjing University, Nanjing 210093 (China)
ABSTRACT The Soil Land Inference Model (SoLIM) was originally proposed by Zhu et al. (Zhu A X, Band L, Vertessy R, Dutton B. 1997. Derivation of soil properties using a soil land inference model (SoLIM). Soil Sci Soc Am J. 61: 523–533.) and is based on the Third Law of Geography. SoLIM assumes that the soil property value at a location of interest will be more similar to that of a given soil sample when the environmental conditions at the two locations are more similar. On this basis, it estimates the soil property value at the location of interest from the soil property values of known samples, weighted by the similarity between each sample and the location of interest in an attribute domain of environmental conditions. However, the current SoLIM method ignores the spatial distances between the location of interest and the sample locations. In this study, we proposed a new method of soil property mapping, SoLIM-IDW, which incorporates spatial distance information into SoLIM by means of inverse distance weighting (IDW). The proposed method assumes that the soil property value at a location of interest will be more similar to that of a known sample both when the environmental conditions are more similar and when the distance between the location of interest and the sample location is shorter. Evaluation experiments on A-horizon soil organic matter and organic carbon mapping in two study areas with independent evaluation samples showed that SoLIM-IDW can obtain lower prediction errors than the original SoLIM, multiple linear regression, geographically weighted regression, and regression-kriging with the same modeling points. Future work mainly includes determining optimal values of the power parameter and how to set it appropriately under different application contexts.
Key Words: digital soil mapping, location of soil sample, inverse distance weighting, soil organic matter, Third Law of Geography
INTRODUCTION
Digital soil mapping (or predictive soil mapping) is an efficient method of creating predictive soil maps. It develops a numerical model of the relationships between environmental covariates and soil in an area and then applies that model to a geographical database. These relationships are often modeled statistically or geostatistically, for example by regression and kriging (Brus and De Gruijter, 1993; Fotheringham et al., 1996; Goovaerts, 1999; Heuvelink and Pebesma, 1999; Hengl et al., 2007; Lark, 2012). Such methods, however, require a large number of soil samples that sufficiently represent the soil-environment relationship across the study area (Zhu et al., 2018), which translates into a large budget and labor-intensive field work. Additionally, they require the soil-environment relationship to be stable across the entire study area (the stationarity assumption in geostatistics), which often cannot be met owing to the complexity of soil distribution (Zhu et al., 2018).
Zhu et al. (1997) originally proposed the Soil Land Inference Model (SoLIM), which is based on the Third Law of Geography (Zhu et al., 2018), to overcome these disadvantages of statistical and geostatistical models for digital soil mapping. The Third Law of Geography stipulates that the soil property value at a location of interest will be more similar to that of a given soil sample when the environmental conditions at the two locations are more similar. Based on this law, SoLIM estimates the soil property value at the location of interest from the soil property values of known samples. Each sample value is weighted by the similarity between the sample location and the location of interest in an attribute domain of environmental conditions, which are characterized using digital terrain analysis (Wang et al., 2019) and remote sensing techniques (Zhu et al., 2010a). SoLIM has been applied successfully to predictive soil mapping with environmental covariates related to soil property distribution and either a few purposive samples or ad hoc soil samples with very limited representativeness of the soil-environment relationship in a study area (Zhu et al., 2010b, 2015; Qin et al., 2012).
In SoLIM, the soil property value at the location of interest is estimated as:
$$V_i = \frac{\sum_{k=1}^{N} S_{i,k} V_k}{\sum_{k=1}^{N} S_{i,k}} \qquad (1)$$
where $V_i$ is the soil property value predicted for the location of interest $i$, $V_k$ is the soil property value at sample location $k$, $N$ is the number of soil samples, and $S_{i,k}$ is the similarity in environmental conditions between locations $i$ and $k$. $S_{i,k}$ is normally computed as the minimum of the individual similarities for the environmental attributes related to the spatial variation of soil (Zhu and Band, 1994). SoLIM can also provide the estimation uncertainty at each location (Zhu et al., 1997), which is useful for guiding the subsequent use of the resultant soil property map (Li et al., 2016; Zhang et al., 2016).
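The following is a minimal Python sketch of Eq. 1. For illustration only, it assumes a Gaussian-type similarity for each attribute, with a hypothetical bandwidth `sigma` per attribute. The fuzzy membership functions used in the SoLIM literature may take other forms. The minimum (limiting-factor) rule that combines the attribute similarities into $S_{i,k}$ follows the description above. All function names are hypothetical.

```python
import math
from typing import Sequence


def attribute_similarity(x_i: float, x_k: float, sigma: float) -> float:
    """Similarity of one environmental attribute between locations i and k.

    ASSUMPTION: a Gaussian-type function with a hypothetical bandwidth sigma;
    it equals 1.0 when the attribute values match and decays toward 0.
    """
    return math.exp(-((x_i - x_k) ** 2) / (2.0 * sigma ** 2))


def environmental_similarity(env_i: Sequence[float], env_k: Sequence[float],
                             sigmas: Sequence[float]) -> float:
    """S_{i,k}: the minimum of the per-attribute similarities (limiting factor)."""
    return min(attribute_similarity(a, b, s) for a, b, s in zip(env_i, env_k, sigmas))


def solim_predict(env_i: Sequence[float], sample_envs: Sequence[Sequence[float]],
                  sample_values: Sequence[float], sigmas: Sequence[float]) -> float:
    """Eq. 1: similarity-weighted average of the sample values V_k."""
    weights = [environmental_similarity(env_i, env_k, sigmas) for env_k in sample_envs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("location i is dissimilar to every sample")
    return sum(w * v for w, v in zip(weights, sample_values)) / total


# Hypothetical attributes per location: (slope gradient in degrees, wetness index).
samples = [(2.0, 8.0), (5.0, 12.0)]
values = [50.0, 40.0]
print(solim_predict((2.0, 8.0), samples, values, sigmas=(1.0, 2.0)))
```

Because the minimum is taken over attributes, a single strongly dissimilar attribute (here, slope gradient) is enough to make a sample nearly irrelevant to the prediction.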
However, the current SoLIM inference method does not consider the spatial distances between the location of interest and the soil sample locations. Spatial distance is often important in the spatial analysis and mapping of geographical phenomena, as the wide acceptance of Tobler's First Law of Geography shows (Tobler, 1970; De Smith et al., 2007). In this study, we explored potential improvements in soil property prediction by combining the essence of the First Law of Geography with that of the Third Law of Geography through the SoLIM method. The specific question addressed is: can the spatial distances between the location of interest and the soil sample locations be used to improve the accuracy of soil property maps produced by SoLIM?
MATERIALS AND METHODS
Basic idea
To extend SoLIM so that it considers the spatial distances between the location of interest and the soil sample locations during predictive soil property mapping, we first made the following assumption: the soil property value at a location of interest will be more similar to that of a known sample both when the environmental conditions are more similar and when the distance between the location of interest and the sample location is shorter. This assumption is a simple and reasonable extension of the assumption underlying the current SoLIM that takes Tobler's First Law of Geography into consideration.
One classic method of spatial estimation at unvisited locations from a set of locations with known values is inverse distance weighting (IDW). IDW is based entirely on Tobler's First Law of Geography and imposes no additional requirements on either the spatial distribution or the number of sample locations. The way IDW accounts for the spatial distance between the location of interest and each sample location can be combined with the current SoLIM inference. This combination yields a revised SoLIM, SoLIM-IDW, which rests on the assumption proposed above and takes into consideration the spatial distances between the location of interest and the soil sample locations. It thus addresses the omission of spatial distance information in the current SoLIM inference method.
Design of the SoLIM-IDW method
Following the basic idea above, the SoLIM-IDW method proposed in this study revises the current SoLIM inference as:
$$V_i = \frac{\sum_{k=1}^{N} S_{i,k} D_{i,k} V_k}{\sum_{k=1}^{N} S_{i,k} D_{i,k}}, \qquad D_{i,k} = \frac{1}{d_{i,k}^{\,r}} \qquad (2)$$
where $d_{i,k}$ is the distance between the location of interest $i$ and the soil sample location $k$ ($k = 1, \cdots, N$), $D_{i,k}$ is the weight function based on $d_{i,k}$, and $r$ is the power parameter, a positive real number selected according to the principle of IDW. $S_{i,k}$, the similarity in environmental conditions between $i$ and $k$, is the same as in the current SoLIM inference. The original SoLIM inference function (Eq. 1) is therefore the special case of Eq. 2 with $r = 0$, for which every $D_{i,k} = 1$.
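Eq. 2 can be sketched in Python as below, assuming Euclidean distance in projected map coordinates and precomputed similarities $S_{i,k}$. Eq. 2 is undefined at a sample location when $r > 0$ because $d_{i,k} = 0$. The sketch resolves this by returning the sample value, which is the usual IDW convention and is labeled here as an assumption. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def solim_idw_predict(loc_i: Point, sample_locs: Sequence[Point],
                      sample_values: Sequence[float],
                      similarities: Sequence[float], r: float) -> float:
    """Eq. 2: weight sample k by S_{i,k} * D_{i,k}, where D_{i,k} = 1 / d_{i,k}**r.

    `similarities` holds precomputed S_{i,k} values (as in Eq. 1).
    With r == 0 every D_{i,k} is 1, so the result equals Eq. 1.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), v_k, s_ik in zip(sample_locs, sample_values, similarities):
        d_ik = math.hypot(x - loc_i[0], y - loc_i[1])  # Euclidean distance
        if r == 0:
            d_weight = 1.0
        elif d_ik == 0.0:
            # ASSUMPTION: at a sample location, return the sample value
            # (common IDW convention); Eq. 2 itself is undefined here.
            return v_k
        else:
            d_weight = d_ik ** (-r)
        w = s_ik * d_weight
        numerator += w * v_k
        denominator += w
    if denominator == 0.0:
        raise ValueError("all combined weights are zero")
    return numerator / denominator


# Two samples with equal environmental similarity, 100 m and 300 m away.
locs = [(100.0, 0.0), (0.0, 300.0)]
for r in (0, 0.5, 1, 2):
    print(r, solim_idw_predict((0.0, 0.0), locs, [60.0, 30.0], [0.5, 0.5], r))
```

As r grows, the nearer sample dominates. With equal similarities, the prediction moves from the SoLIM value of 45 (r = 0) through 52.5 (r = 1) to 57 (r = 2).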
Study areas and data
The proposed method was evaluated in two study areas (Fig. 1). The first is the Heshan Farm at a watershed scale (ca. 60 km²), and the second is Xuancheng County at a regional scale (ca. 5 900 km²).
Fig. 1 Maps of the two study areas, the Heshan Farm (ca. 60 km²) in Heilongjiang Province (a) and Xuancheng County (ca. 5 900 km²) in Anhui Province (b) of China.
The Heshan case. The Heshan Farm in Heilongjiang Province, northeastern China, has very low relief. This study area (ca. 60 km²) has a total relief of about 100 m and an average slope gradient of about 2°. The parent materials are mainly silt loam loess, along with fluvial deposits in the valley. The main soil types in the Heshan Farm at the subgroup level of the Chinese Soil Taxonomy System (Chinese Soil Taxonomy Research Group, 2001) include Mollic Bori-Udic Cambosols, Typic Hapli-Udic Isohumosols, Typic Bori-Udic Cambosols, Lithic Udi-Orthic Primosols, Pachic Stagni-Udic Isohumosols, and Fibric Histic-Typic Haplic Stagnic Gleyosols (Zhu et al., 2010b). Soybeans and wheat are the main crops in this area, which has been cultivated for more than 40 years.
The proposed method was used to map A-horizon soil organic matter (SOM) content in this area. Four environmental variables (slope gradient, profile curvature, horizontal curvature, and topographic wetness index) were calculated from a digital elevation model (DEM) with a resolution of 10 m, as in the earlier application of SoLIM in the same area (Zhu et al., 2010b).
In this case, a total of 39 points were used as modeling samples. Of these, 29 points came from an integrative hierarchical stepwise sampling method (Yang et al., 2013), and 10 points came from subjective sampling at summit, steeper slope, and valley locations (Zhu et al., 2010b; Yang et al., 2013). Another 44 points from a regular sampling grid (1 100 m × 740 m) were used as independent evaluation samples (Fig. 1a). SOM contents were measured by the Walkley-Black wet oxidation method (Nelson and Sommers, 1982).
The Xuancheng case. Xuancheng County in Anhui Province, China, covers approximately 5 900 km². It has a total relief of about 1 058 m and an average slope gradient of about 4°. The northwestern part of the area has mainly low relief, whereas the other parts are largely mountainous. The parent materials are complex and include Quaternary clay-silt-gravel, sandstone, shale, conglomerate, pyroclastic rocks, limestone, granite, and granodiorite (Yang et al., 2017). There are five main soil orders in this area: semi-hydromorphic soils, primitive soils, anthropogenic soils, eluvial soils, and ferro-allitic soils (National Soil Survey Office, 1992).
The proposed method was used to map A-horizon soil organic carbon (SOC) content in this area. The environmental variables used in this case were slope gradient, profile curvature, horizontal curvature, topographic wetness index, annual average precipitation, annual average temperature, and parent material (Yang et al., 2017). All environmental variables had a resolution of 90 m, and the topographic attributes were calculated from the Shuttle Radar Topography Mission (SRTM) DEM.
In this case, 59 points from a multi-grade representative sampling method (Yang et al., 2017) were used as modeling samples. Another 58 points from a regular 10 km × 10 km sampling grid were used as independent evaluation samples (Fig. 1b). SOC contents were measured by the dichromate oxidation method with external heating (Nelson and Sommers, 1982; Zeng et al., 2016).
Evaluation experiments
The evaluation focused on the comparison between SoLIM and SoLIM-IDW. To facilitate comparison with SoLIM (i.e., SoLIM-IDW with r = 0), SoLIM-IDW was evaluated in two ways across several r values (0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, and 3). The first evaluation used quantitative statistics of the prediction errors at the independent evaluation samples: root mean squared error (RMSE), mean error (ME), and mean absolute error (MAE). The second was a qualitative comparison between the map from SoLIM-IDW with the lowest RMSE and the map from SoLIM. Note that there is currently no theoretical justification for choosing the highest r value. With a very high r value, only points extremely close to the location of interest influence the prediction. In this study, we set the highest r value tested to 3, and the experimental results showed this to be large enough.
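The three error statistics and the selection of the r value with the lowest RMSE can be sketched as below. The sign convention for ME (predicted minus observed) is an assumption, because the text does not state it. `predict_at_r` is a hypothetical callable that stands in for a SoLIM-IDW run returning predictions at the evaluation samples.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# r = 0 reproduces the original SoLIM; the other values are those tested.
R_VALUES = (0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 3.0)


def error_stats(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, ME, and MAE at the independent evaluation samples.

    ASSUMPTION: error = predicted - observed, so a positive ME means overestimation.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
    }


def select_r(observed: Sequence[float],
             predict_at_r: Callable[[float], Sequence[float]],
             r_values: Sequence[float] = R_VALUES) -> Tuple[float, Dict[float, Dict[str, float]]]:
    """Evaluate every r and return the one with the lowest RMSE plus all statistics."""
    table = {r: error_stats(observed, predict_at_r(r)) for r in r_values}
    best = min(table, key=lambda r: table[r]["RMSE"])
    return best, table
```

ME shows systematic bias, while MAE and RMSE measure error magnitude. RMSE penalizes large errors more heavily, which is why it drives the choice of r here.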
In the Heshan Farm, Zhu et al. (2015) showed that SoLIM performed better than multiple linear regression (MLR) under different sample scenarios. In this study, MLR was also fitted with the same modeling samples as SoLIM-IDW and compared with SoLIM-IDW at the independent evaluation samples. Some methods widely used for digital soil mapping consider both the attribute distance of environmental covariates and spatial distance, such as geographically weighted regression (GWR) (Brunsdon et al., 1996; Fotheringham et al., 1996) and regression-kriging (RK) (Odeh et al., 1995; Hengl et al., 2007). The number of modeling samples in the study areas is not sufficient to fit GWR and RK well (Hengl et al., 2007). Even so, GWR and RK were also compared with SoLIM-IDW using the same modeling samples and the same independent evaluation samples. MLR, GWR, and RK were fitted using R packages. A rudimentary baseline, "MeanValuePredict" for short, was also compared with the tested methods. It takes the mean soil property value of the modeling samples (52.99 g kg⁻¹ for Heshan and 11.315 g kg⁻¹ for Xuancheng) as the predicted value at every independent evaluation sample.
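The MeanValuePredict baseline is a constant predictor. The minimal sketch below, which uses hypothetical values rather than the study's data, shows how this baseline is scored with the same RMSE as the other methods. A method whose RMSE exceeds this baseline does no better than the modeling-sample mean at the evaluation samples.

```python
import math
from typing import List, Sequence


def mean_value_predict(modeling_values: Sequence[float], n_evaluation: int) -> List[float]:
    """MeanValuePredict: predict the modeling-sample mean at every evaluation sample."""
    mean = sum(modeling_values) / len(modeling_values)
    return [mean] * n_evaluation


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical numbers, not the study's data.
baseline = mean_value_predict([40.0, 55.0, 64.0], n_evaluation=2)
print(baseline, rmse([50.0, 60.0], baseline))
```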
Note that in the Heshan case, the A-horizon SOM values of the 39 modeling samples showed an obviously right-skewed distribution (minimum = 25.6 g kg⁻¹, maximum = 326.4 g kg⁻¹, mean = 53.0 g kg⁻¹, and standard deviation = 47.2 g kg⁻¹), owing to one modeling sample with a very high SOM value. This skew degrades an MLR fitted directly to the original values. To mitigate this, a numerical transformation is needed before the SOM values are used to fit the MLR. Candidates include the square root (Wang et al., 2013), the logarithm (Bostan et al., 2012; Song et al., 2016), the reciprocal, and the Box-Cox transformation. By trial and error, the reciprocals of the SOM values of the modeling samples in the Heshan case were found not to deviate significantly from a normal distribution according to the Shapiro-Wilk normality test (P = 0.41 > 0.05), which is suitable for small samples. Thus, the reciprocal transformation was applied to the SOM values of the modeling samples before MLR, GWR, and RK were built with stepwise variable selection in the Heshan case. For GWR in this case, the weighting function was based on the Gaussian function, and the bandwidth was optimized to 2 195.4 m based on the Akaike information criterion (AIC) (Burnham and Anderson, 2004).
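The reciprocal transformation and the back-transformation of model predictions can be sketched as follows. For illustration, the sketch uses the moment-based sample skewness as a rough symmetry check. This is a swapped-in diagnostic, not the Shapiro-Wilk test used in the study; in Python, that test is available as `scipy.stats.shapiro`. The SOM values are hypothetical and only mimic one extreme high value, and the function names are illustrative.

```python
import statistics
from typing import List, Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """Moment-based skewness g1; positive for a right-tailed sample.

    NOTE: a rough diagnostic only, not the Shapiro-Wilk normality test.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def reciprocal(values: Sequence[float]) -> List[float]:
    """Transform SOM values to 1/SOM before fitting MLR, GWR, or RK."""
    if any(v <= 0 for v in values):
        raise ValueError("reciprocal transform requires strictly positive values")
    return [1.0 / v for v in values]


def back_transform(predictions: Sequence[float]) -> List[float]:
    """Map predictions made on the 1/SOM scale back to SOM units (g kg^-1)."""
    return [1.0 / p for p in predictions]


som = [25.6, 30.0, 35.0, 40.0, 45.0, 326.4]  # hypothetical, one extreme value
print(round(sample_skewness(som), 2), round(sample_skewness(reciprocal(som)), 2))
```

The reciprocal compresses the extreme value toward the others, so the transformed sample is much less skewed than the original.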
For the MLR in the Xuancheng case, parent material was represented by eight dummy variables, one for each parent material type in this area. Each dummy variable took a value of 1 or 0, depending on whether the location belonged to the corresponding type. In the Xuancheng case, stepwise variable selection could not effectively remove highly correlated environmental variables. Thus, principal component analysis was adopted to mitigate collinearity among the environmental variables when building the MLR, GWR, and RK. Because the first principal component accounted for over 99% of the total variation, the first principal component and the dummy variables were used to build the MLR. For GWR in this case, the weighting function was again based on the Gaussian function, and the bandwidth was optimized to 39 171.1 m based on AIC. In RK modeling for the Xuancheng case, the variogram was fitted with the 59 modeling samples using a spherical model and had a range of 10 557.9 and a partial sill of 2.289. This variogram was still not stable, as was also the case for RK in the Heshan case, even though the number of samples in the Xuancheng case exceeds the minimum number of modeling samples (i.e., 50) recommended by Hengl et al. (2007).
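The dummy coding of parent material can be sketched as a one-hot encoding. The function name and the example categories are illustrative. When the design matrix also contains an intercept, a full set of one-hot columns sums to the intercept column, so one level is commonly dropped. The sketch offers this as an option.

```python
from typing import List, Sequence, Tuple


def dummy_encode(categories: Sequence[str],
                 drop_first: bool = False) -> Tuple[List[str], List[List[int]]]:
    """One 0/1 column per parent material type (sorted), optionally omitting the first level."""
    levels = sorted(set(categories))
    if drop_first:
        levels = levels[1:]
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


pm = ["granite", "limestone", "shale", "granite"]
levels, rows = dummy_encode(pm)
print(levels)  # column order of the dummy variables
print(rows)    # one row per location
```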
In both cases, the modeling samples were collected purposively and were limited in number. For this reason, the maps from MLR, GWR, and RK were not included in the qualitative comparison with the map from SoLIM-IDW.
RESULTS AND DISCUSSION
The Heshan case
In the Heshan case, SoLIM-IDW with r = 0.25–1 produced a lower RMSE than SoLIM, GWR, and MeanValuePredict (Table I). As r increased, the performance of SoLIM-IDW first improved and then deteriorated. The lowest RMSE was obtained at r = 0.75, and the second lowest, very close to it, at r = 0.5. The RMSE values from MLR and RK were very similar to each other and were even larger than the RMSE from MeanValuePredict in this case study.
As shown in Fig. 2, the A-horizon SOM map for Heshan estimated by SoLIM-IDW with r = 0.75 was broadly similar to that estimated by SoLIM. The values predicted by SoLIM-IDW with r = 0.75 ranged from 29.7 to 293.8 g kg⁻¹. This range was much wider than that from SoLIM (48.0–61.6 g kg⁻¹) and much closer to that of the actual soil samples (22.5–326.4 g kg⁻¹). The main difference between the spatial patterns of the two maps is a "bull's eye" pattern in the map from SoLIM-IDW, i.e., obviously higher SOM predictions near the high-value modeling sample located in the channel.
Fig. 2 Maps of the A-horizon soil organic matter (SOM) content in the Heshan Farm located in Heilongjiang Province, China, estimated by the Soil Land Inference Model (SoLIM)-inverse distance weighting (IDW) method with power parameter r = 0.75 (a) and by SoLIM (b).
TABLE I Quantitative evaluation^a) of the Soil Land Inference Model (SoLIM)-inverse distance weighting (IDW) method for soil A-horizon organic matter content mapping for the Heshan Farm (ca. 60 km²) in Heilongjiang Province and soil A-horizon organic carbon content mapping for Xuancheng County (ca. 5 900 km²) in Anhui Province of China, by comparison with the original SoLIM (i.e., SoLIM-IDW with power parameter r = 0), multiple linear regression (MLR), geographically weighted regression (GWR), regression-kriging (RK), and MeanValuePredict methods based on an independent sample set
Such "bull's eye" patterns are common and result from the characteristics of IDW. In this case, the flat channel areas have high soil moisture and rich humus and are thus reasonable places for higher SOM predictions.
The Xuancheng case
As shown in Table I, SoLIM-IDW with r = 0.25–0.5 produced a smaller RMSE than SoLIM in the Xuancheng case. SoLIM-IDW with r = 0.25–1 and SoLIM both performed better than RK, GWR, and MLR, while MeanValuePredict performed the worst. SoLIM-IDW produced its smallest RMSE at r = 0.25, and its second smallest, very close to it, at r = 0.5. Its RMSE increased as r increased further. In both case studies, SoLIM-IDW produced a smaller RMSE than SoLIM for at least some of the r values tested, although the r value giving the smallest error differed between the two cases. Thus, SoLIM-IDW can improve on the original SoLIM by taking into consideration the spatial distances between the locations of interest and the soil sample locations.
Similarly, the A-horizon SOC in Xuancheng estimated by SoLIM-IDW with r = 0.25 was close to that estimated by SoLIM (Fig. 3). Both maps contain crisp blocks of A-horizon SOC, which result from the nominal (categorical) environmental variable used in the inference, i.e., parent material. The values predicted by SoLIM-IDW with r = 0.25 ranged from 5.81 to 21.91 g kg⁻¹. This range was wider than that from SoLIM (7.85–20.68 g kg⁻¹) and closer to that of the actual soil samples (2.54–27.23 g kg⁻¹). As in the Heshan case, the map from SoLIM-IDW contained "bull's eyes", owing to the characteristics of IDW and the local effect of modeling samples.
Fig. 3 Maps of the A-horizon soil organic carbon (SOC) content in Xuancheng County located in Anhui Province, China, estimated by the Soil Land Inference Model (SoLIM)-inverse distance weighting (IDW) method with power parameter r = 0.25 (a) and by SoLIM (b).
Discussion
In both case studies, SoLIM-IDW with r = 0.5 produced an RMSE very close to the lowest RMSE. Thus, r = 0.5 could be used with SoLIM-IDW for now. Further studies are needed to determine optimal r values and how to set r adaptively under different application contexts, such as study area characteristics and the spatial resolution of soil mapping.
Note that soils can change abruptly over very short distances owing to subtle changes in environmental conditions. Environmental data at the resolution used for digital soil mapping cannot capture such changes. Consequently, neither the proposed method nor other existing digital soil mapping methods can predict them.
CONCLUSIONS
The current SoLIM inference uses an attribute domain of environmental covariates but ignores the spatial distances between the location of interest and the soil sample locations. We proposed a new method, SoLIM-IDW, which takes these spatial distances into consideration during SoLIM-based predictive soil property mapping. The evaluation experiments in the two case areas showed that SoLIM-IDW with suitable r values performed better than the original SoLIM, MLR, GWR, and RK methods. The r value that minimized the error of SoLIM-IDW differed between the two cases. For now, a power parameter of r = 0.5 is suggested based on the results of these two case studies, because SoLIM-IDW with this value consistently produced an RMSE very close to the lowest value obtained. Future work will include determining optimal r values adaptively under different application contexts. SoLIM has also been applied to predictive mapping of geographic variables (phenomena) other than soil (Zhu et al., 2018), such as indicating natural infectious disease outbreaks (Du et al., 2020). The potential of the proposed method for mapping such geographic variables (phenomena) could also be explored.
ACKNOWLEDGEMENTS
This study was funded by the National Natural Science Foundation of China (Nos. 41871300, 41422109, and 41431177), the National Basic Research Program of China (No. 2015CB954102), the Priority Academic Program Development of Jiangsu Higher Education Institutions, China (No. 164320H116), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China. We acknowledge the support from the Innovation Project of State Key Laboratory of Resources and Environmental Information System of China (No. O88RA20CYA). Support to Axing Zhu through the Vilas Associate Award, the Hammel Faculty Fellow Award, and the Manasse Chair Professorship from the University of Wisconsin-Madison, USA, is greatly appreciated.