Hawk‐eye‐inspired perception algorithm of stereo vision for obtaining orchard 3D point cloud navigation map

2023-12-01 Zichao Zhang, Jian Chen, Xinyu Xu, Cunjia Liu, Yu Han

CAAI Transactions on Intelligence Technology, 2023, Issue 3

Zichao Zhang | Jian Chen | Xinyu Xu 2,3 | Cunjia Liu | Yu Han

1 College of Engineering, China Agricultural University, Beijing, China

2 Jiangsu Province and Education Ministry Co-sponsored Synergistic Innovation Center of Modern Agricultural Equipment, Jiangsu University, Zhenjiang, China

3 Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR, Shanghai, China

4 Department of Aeronautical and Automotive Engineering, Loughborough University, Loughborough, Leicestershire, UK

5 College of Water Resources and Civil Engineering, China Agricultural University, Beijing, China

6 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China

7 Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, China

Abstract Binocular stereo vision is the lowest-cost sensor for obtaining 3D information. Given its weaknesses in long-distance measurement and stability, improving the accuracy and stability of stereo vision is urgently required for precision agriculture applications. To address long-distance measurement and stable perception without hardware upgrades, this paper introduces higher-resolution perception and adaptive HDR (High Dynamic Range) inspired by hawk eyes. The higher-resolution reconstruction method simulates the function of the ‘deep fovea’ and ‘shallow fovea’ in the physiological structure of the hawk eye to improve accuracy. Inspired by pupil adjustment, an adaptive HDR method is proposed for high dynamic range optimisation and stable perception. Under various light conditions, compared with default stereo vision, the accuracy of the proposed algorithm improved by 28.0% as evaluated by error ratio, and stability improved by 26.56% as evaluated by disparity accuracy. For fixed-distance measurement, the maximum improvement was 78.6% in standard deviation. With the hawk-eye-inspired perception algorithm, the orchard point cloud improved in both quality and quantity. The hawk-eye-inspired perception algorithm thus substantially advances binocular 3D point cloud reconstruction for orchard navigation maps.

KEYWORDS adaptive high dynamic range, binocular stereo vision, hawk-eye-inspired perception, point cloud of orchard, super-resolution generative adversarial network

1 | INTRODUCTION

Precision agriculture needs accurate perception. In automated orchard production, an accurate and stable 3D navigation map is the basis of UAV and unmanned vehicle operation [1]. Agricultural production is very sensitive to cost. Binocular stereo vision has advantages over other sensors because of its low cost, high sampling frequency, ability to acquire large amounts of data, low power consumption and light weight, and it is widely used across all production stages of smart agriculture [2, 3]. However, in most outdoor agricultural working environments, optical sensors and images are vulnerable to poor illumination conditions. The application of binocular vision is mainly limited in two ways. First, the perception accuracy and measurement range for long-distance spatial 3D information, especially target and obstacle information, cannot meet the requirements. Second, stereo vision sensors are not stable in agricultural environments, so they can hardly cope with occlusion, deformation, background clutter and illumination changes.

To overcome these two main weaknesses of stereo vision sensors, bionics provides a new approach. The hawk dominates the top of the food chain because of the extremely long perception distance of its eyes. Under specific fixation conditions, the raptor's eye has the advantage of high-resolution and HDR perception: Zhao et al. [4] and Snyder et al. [5] found that raptors' eyes have dominant advantages in high-resolution and HDR perception. The high-resolution advantage comes from the physiological structure of the ‘deep fovea’ and ‘shallow fovea’ in raptors' eyes, and the HDR perception comes from large pupils together with the powerful dilator pupillae and sphincter pupillae, which adjust dynamically as illumination changes. In specific hunting conditions, for high-resolution imaging of the local visual field, raptors capture images with the ‘deep fovea’ and ‘shallow fovea’, where photoreceptor cell density is extremely high. As shown in Figure 1, raptor eye imaging mainly involves the following processes. First, to meet different resolution requirements, the eye changes with the hunting state and obtains environmental images at different resolutions through the ‘deep fovea’ and ‘shallow fovea’. At this stage, the images received by the ‘deep fovea’ and ‘shallow fovea’ still cannot meet the requirements of prey recognition, because the hawk is in different hunting states (high-altitude search or low-altitude capture). Second, under different environmental lighting conditions, the powerful dilator pupillae and sphincter pupillae allow the hawk's pupil to adjust quickly and adaptively to the environment, achieving an adaptive HDR effect.

Inspired by hawk eyes, and to address the perception accuracy of binocular stereo vision, especially for long-distance spatial 3D information, the binocular visual resolution was improved to simulate the hawk eye's high-resolution imaging without hardware upgrading; this is marked as HESR(). To ensure the quality of the improved resolution of binocular sensors, SRGAN (Super-Resolution Generative Adversarial Network [6]) was adapted by generative adversarial transfer learning on orchard greyscale datasets. As the binocular image resolution changes, the disparity mapping relationship is transferred and the disparity range is expanded, which describes long-distance depth more accurately. For perception stability, the hawk eye's HDR perception was simulated by an adaptive HDR method designed with the illumination-reflectance model, marked as HEHDR(). The illumination-reflectance model decomposes an image into an illumination component and a reflectance component, and an enhancement is designed adaptively for each component. The overall pipeline of the hawk-eye-inspired perception algorithm of stereo vision is shown in Figure 1.

Based on these two directions, this paper first proposes a hawk-eye-inspired perception algorithm that improves binocular vision resolution to achieve high-precision perception of spatial 3D information. Second, an adaptive HDR algorithm is employed to ensure the stability of binocular vision perception. The performance of the method was verified under different lighting conditions, and a high-precision point cloud model of the orchard 3D navigation map was constructed from different frames with an IMU. The main contributions are summarised as follows:

(1) Orchard image data at different resolutions were collected as the training set, and the SN (Spectral Normalisation) method was employed to complete transfer learning of the SRGAN model. Based on the transferred model, super-resolution reconstruction of the binocular images was performed by up-sampling, which improved the perception accuracy of 3D information in binocular visual space after stereo matching.

(2) With the illuminance-reflectance imaging model, an adaptive reflectance component was designed to counter overexposed areas, and an adaptive illumination component was designed to enhance underexposed areas. This improved both the accuracy of the binocular disparity map and perception stability, and reduced the spatial 3D perception error.

2 | RELATED WORK

2.1 | Biomimetic visual system

Most bionic systems apply bionic ideas to guide design in a mechanical-inspired way [7–9] or a functional-inspired way [10–15]. For biomimetic visual systems, Duan et al. [10] designed a bionic perception system for aircraft refuelling target detection, in which the attention mechanism of the raptor's eye was the key to accurate detection. Compound eye systems were designed by Liang et al. [12] and Yu et al. [11]: Liang et al. [12] focussed on the advantages of multiple focal lengths, and Yu et al. [11] focussed on the large field of view and the multi-baseline of stereo vision. Both studies took full advantage of the compound eye through biomimetic visual systems. A large field of view is the basis of many other applications. Zhou et al. [13] provided panoramas through omnidirectional vision using visual cues. Yuan et al. [16] designed an unstructured array camera module based on how the human brain handles complex tasks; the module allows hierarchical array cameras to perform their own functions and cooperate on complex tasks. As in Ref. [11], this paper changes the disparity mapping relationship to enhance binocular vision. Unlike Ref. [11], however, our method enhances the disparity mapping relationship in a functional-inspired way, through image super resolution rather than hardware upgrading.

FIGURE 1 Hawk-eye-inspired perception algorithm of stereo vision (parts of the subfigures are adapted from Snyder et al. [5])

2.2 | Perception accuracy of stereo vision

For accuracy improvement, sub-pixel disparity methods extend the precision of disparity from integer units to the 1/50th-pixel level or finer. The sub-pixel interpolation method dominates disparity accuracy [17, 18]: Zhang et al. [17] provided a sub-pixel method that interpolates using the 8 neighbouring pixels of the target pixel, and Morgan et al. [18] introduced filtering into the sub-pixel method. The accuracy gained by most sub-pixel methods improves the 3D accuracy of each point in the point cloud rather than the point cloud quantity or density, yet quantity and density are important for sensor applications. Improving accuracy and quantity through hardware upgrading can be found in the studies of Zhang et al. [19] and Wang et al. [20]. Zhang et al. [19] emphasised visual perception in intelligent driving and set out the demands on stereo vision, which can serve as principles for hardware design. Wang et al. [20] analysed guidelines for bionic eyes using flexible stereo vision equipment. Adil et al. [21] designed a novel distance measurement algorithm using stereo vision. The accuracy and quality of the disparity map were the most important indices for evaluating the results. For our hawk-eye-inspired method, super resolution could be performed by bicubic interpolation [22], and Dong et al. [23] introduced a deep learning-based method for super resolution. Ledig et al. [6] first applied the GAN (Generative Adversarial Network) to super resolution. Generative adversarial training with content loss has become a trend in deep learning-based super-resolution research [24–26]. Generative adversarial training shows great advantages in detail reconstruction, which meets the requirement of binocular super resolution for simulating the dual fovea of the hawk eye. For our method, the detail reconstruction ability of generative adversarial training [6] guarantees the image quality needed for stereo matching.
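As an illustration of the sub-pixel interpolation discussed at the start of this subsection, the sketch below uses the common three-point parabola fit: the matching costs at the best integer disparity and its two neighbours define a parabola whose minimum gives the refined disparity. This is a generic scheme chosen for illustration, not the 8-neighbour method of Zhang et al. [17], and `parabolic_subpixel` is a hypothetical name.

```python
def parabolic_subpixel(cost_prev: float, cost_best: float, cost_next: float,
                       d_best: int) -> float:
    """Refine an integer disparity d_best by fitting a parabola through the
    matching costs at d_best - 1, d_best and d_best + 1, and return the
    disparity at the parabola's minimum."""
    denom = cost_prev - 2.0 * cost_best + cost_next
    if denom <= 0.0:
        # Flat or non-convex cost curve: no reliable refinement, keep the integer value.
        return float(d_best)
    # Vertex of the parabola; |offset| <= 0.5 when cost_best is the smallest of the three.
    offset = 0.5 * (cost_prev - cost_next) / denom
    return d_best + offset


if __name__ == "__main__":
    # The lower cost at d + 1 pulls the estimate towards 21.
    print(parabolic_subpixel(10.0, 4.0, 6.0, 20))
```

Because the fit uses a fixed formula, its weights do not adapt to the target, which is the generalisation limitation that motivates changing the disparity mapping instead.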

2.3 | Perception stability of stereo vision

In our research, the stability of binocular vision was defined by the quality of the disparity map and the 3D point cloud. Consistent with this view, to enhance the perception stability of camera-based sensors, Nguyen et al. [27] put forward a descattering approach to counter scattering environments; the binocular sensors in Ref. [27] were equipped with an active light source. As with the intense light and fog of outdoor agricultural work, the dual noise from scatterers and light was the major obstacle to perception stability for Nguyen et al. [27]. Depth information was employed for dehazing in Ref. [28]. Enhancement based on the illuminance-reflectance imaging model shows great potential in different applications [29–32]. Wang et al. [29] designed an enhancement for multispectral images based on the illuminance-reflectance imaging model, and Yu et al. [30] designed an image enhancement based on the same model with a particular focus on noise. Eilertsen et al. [31] designed a deep learning loss function based on the illuminance-reflectance imaging model. Lecca et al. [32] designed evaluation rules based on interest point extraction instead of human perception, and methods based on the illuminance-reflectance imaging model showed top-level performance in that evaluation. In this paper, the HDR enhancement method was designed using the illuminance-reflectance imaging model.

3 | HAWK‐EYE‐INSPIRED BINOCULAR VISION SUPER RESOLUTION

As shown in Figure 2, the design of the hawk-eye-inspired binocular vision super resolution is divided into two main parts. In the first part, the generator with SN and residual blocks is trained adversarially on orchard training data at different resolutions; in the second part, the binocular images are reconstructed by the trained SN generator. Finally, a stereo-matching algorithm, combined with calibration parameters for the super-resolved images, outputs high-precision spatial 3D information as a depth map and a point cloud.

FIGURE 2 Hawk-eye-inspired binocular vision super-resolution diagram

According to the principle of binocular stereo vision, the z-axis (depth) error of binocular stereo vision is defined as:

where $Z$ is the ground truth of depth; $f_{X_C}$ is the equivalent focal length along the $X_C$-axis of the camera coordinate system, which in stereo vision is usually constant if the camera does not zoom; $B$ is the baseline of the binocular vision system; $D$ is the disparity, $D \in [D_{\min}, D_{\max}]$ (in pixels), and by default $D$ takes integer values. In standard binocular vision, the ranging error for long-distance obstacles arises mainly because a one-pixel change in disparity is too coarse to describe the corresponding change in depth. Therefore, sub-pixel interpolation for disparity can be described as:

where $a_1, \dots, a_{N+1}$ are constants defined by the sub-pixel interpolation law. Once the interpolation law is defined, $a_1, \dots, a_{N+1}$ are uniquely determined; therefore, sub-pixel interpolation generalises poorly across different targets [17]. In this paper, the hawk-eye-inspired binocular vision is based on super resolution, and the mapping relationship of disparity is defined by:

where HESR() represents the hawk-eye-inspired binocular vision super resolution; with a different $X_C$, the generator $G$ produces the super-resolution binocular images; with upsampling ratio $\eta$, the mapping relationship is provided by stereo matching over $D \in (\eta D_{\min}, \eta D_{\max})$. The new mapping relationship benefits from the expanded disparity range and from a super-resolution reconstruction generator with better generalisation ability; therefore, it describes the long-distance depth of the visual field more accurately. If the hardware of the binocular sensor is not constrained, changing the binocular baseline $B$ also improves perception accuracy by constructing a new disparity mapping relationship [11]. If sub-pixel interpolation is additionally applied with Formula (2), the error will be further reduced. However, because the relative position of the binocular cameras changes, new calibration parameters matching the resolution are needed.
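The effect of the expanded disparity range can be checked numerically. The sketch below assumes the standard rectified pinhole relation $Z = fB/D$ (Formula (1) itself is not reproduced in this text) and computes how much depth a one-pixel disparity step spans at a given distance. Upsampling both images by $\eta$ multiplies the focal length in pixels, and hence every disparity, by $\eta$, so the step shrinks from $Z^2/(fB + Z)$ to $Z^2/(\eta f B + Z)$, approaching a factor of $1/\eta$ when $Z$ is small compared with $\eta f B$. The focal length and baseline values are hypothetical, and the sketch ignores whether the generator actually reconstructs the finer disparities.

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Rectified pinhole stereo: Z = f * B / D (f and D in pixels, B and Z in metres)."""
    return f_px * baseline_m / disparity_px


def depth_step(f_px: float, baseline_m: float, depth_m: float, eta: float = 1.0) -> float:
    """Depth spanned by a one-pixel disparity step at depth_m after upsampling
    both images by eta (eta = 1 means the original resolution)."""
    f_up = eta * f_px                          # focal length in pixels scales with eta
    d = f_up * baseline_m / depth_m            # disparity at this depth (may be non-integer)
    z_next = depth_from_disparity(f_up, baseline_m, d + 1.0)  # depth at disparity d + 1
    return depth_m - z_next                    # equals Z**2 / (eta*f*B + Z)


if __name__ == "__main__":
    F_PX, BASELINE_M = 400.0, 0.075            # hypothetical camera parameters
    for z in (6.0, 12.0, 21.0):
        print(f"Z = {z:4.1f} m: step(eta=1) = {depth_step(F_PX, BASELINE_M, z):.3f} m, "
              f"step(eta=2) = {depth_step(F_PX, BASELINE_M, z, eta=2.0):.3f} m")
```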

3.1 | The SN generator with residual block

Because the training of a generative adversarial network easily falls into an imbalance between a strong discriminator and a weak generator, this paper initialised the generator with a pre-trained residual network model and initialised the discriminator with random parameters. Both the generator and the discriminator were designed with the SN method to further suppress vanishing gradients. The SN is described as follows [33]:

Formula (5) uses the SN expectation as an adaptive ratio $\theta$ to penalise hidden layers whose gradients are over-concentrated in any direction. The SN method does not change the structure of the hidden layers; therefore, the generator could be initialised with the pre-trained model and continue training with SN. The residual block with SN in the generator is shown in Figure 3.

In Figure 3, SN(conv2d) is a 2D convolution layer with spectral normalisation, and ReLU is the activation function. The generator is initialised with the pre-trained SRGAN [6], and the discriminator is constructed from 2 SN residual blocks and initialised randomly from a Gaussian distribution.
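Spectral normalisation divides a weight matrix by its largest singular value $\sigma(W)$, which is normally estimated with a few power-iteration steps rather than a full SVD. The NumPy sketch below shows that standard estimate; it is an illustration, not the paper's training code. The helper `conv_kernel_as_matrix` is a hypothetical name: it flattens a convolution kernel of shape (out_channels, in_channels, kh, kw) into the 2D matrix to which SN is applied in an SN(conv2d) layer.

```python
import numpy as np


def spectral_normalise(weight: np.ndarray, n_iter: int = 20, seed: int = 0):
    """Return (weight / sigma, sigma), where sigma is the largest singular value
    of the 2D matrix `weight`, estimated by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = weight.T @ u                 # right singular vector estimate
        v /= np.linalg.norm(v)
        u = weight @ v                   # left singular vector estimate
        u /= np.linalg.norm(u)
    sigma = float(u @ weight @ v)        # Rayleigh-type estimate of sigma_max
    return weight / sigma, sigma


def conv_kernel_as_matrix(kernel: np.ndarray) -> np.ndarray:
    """Reshape a conv kernel (out_ch, in_ch, kh, kw) to (out_ch, in_ch * kh * kw)."""
    return kernel.reshape(kernel.shape[0], -1)


if __name__ == "__main__":
    w_sn, s = spectral_normalise(np.diag([3.0, 1.0, 0.5]))
    print(s, np.linalg.norm(w_sn, 2))    # sigma of the original and of the normalised matrix
```

PyTorch's `torch.nn.utils.spectral_norm`, for example, keeps the vector u between training steps and by default runs a single iteration per step, which keeps the estimate cheap during training.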

3.2 | Generative adversarial training

The training process shown in Figure 2 could be defined as:

FIGURE 3 Residual block with spectral normalisation (SN)

where $\phi$ represents the VGG19 network; $W$ and $H$ are the width and height of the VGG19 feature map, respectively; the adversarial loss is defined as:

where $D(G(img_{LRI}))$ represents the output of the discriminator $D$ when distinguishing the super-resolution image $G(img_{LRI})$ produced by the generator; the output of $D$ is a probability between 0 and 1.
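A minimal sketch of how the content and adversarial terms are usually combined in SRGAN-style training follows. It assumes that the VGG19 feature maps $\phi(\cdot)$ have already been computed (the arrays stand in for them), that the content loss is the mean squared feature difference, that the generator's adversarial loss is $-\log D(G(img_{LRI}))$ as in SRGAN [6], and that $\gamma$ weights the adversarial term. The paper's own loss formulas are not reproduced in this text, so the exact normalisation may differ.

```python
import numpy as np


def content_loss(feat_hr: np.ndarray, feat_sr: np.ndarray) -> float:
    """VGG content loss: mean squared difference between the feature maps of the
    high-resolution image and the super-resolved image (here averaged over all
    elements, i.e. over W, H and channels)."""
    return float(np.mean((feat_hr - feat_sr) ** 2))


def adversarial_loss(d_of_sr: np.ndarray, eps: float = 1e-12) -> float:
    """Generator adversarial loss: mean of -log D(G(img_LR)), where d_of_sr holds
    the discriminator's probabilities for the super-resolved images."""
    return float(np.mean(-np.log(np.clip(d_of_sr, eps, 1.0))))


def generator_loss(feat_hr: np.ndarray, feat_sr: np.ndarray,
                   d_of_sr: np.ndarray, gamma: float = 1e-3) -> float:
    """Total generator objective: content loss + gamma * adversarial loss."""
    return content_loss(feat_hr, feat_sr) + gamma * adversarial_loss(d_of_sr)
```

A small $\gamma$ keeps the content term dominant, so the adversarial term mainly sharpens texture rather than driving the reconstruction on its own.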

In this paper, the learning rate was $10^{-4}$ and the batch size was 16, with the Adam optimiser. To prevent overfitting, learning rate decay began at $10^{4}$ steps. If the gradient became invalid during training, the weight $\gamma$ was adjusted. In this paper, with $\gamma = 0.001$ and 37,500 training steps, the resulting generator $G$ was adopted as the hawk-eye-inspired binocular vision super resolution. The left and right images are enhanced, and the new mapping relationship of disparity is obtained as:

where SM() is the stereo matching operator; $img_L$ and $img_R$ represent the left and right images, respectively.

4 | HAWK‐EYE‐INSPIRED HDR ENHANCEMENT

The stability of binocular stereo vision is defined by the quality of the disparity map. The disparity map is obtained by SM (stereo matching); the pipeline of the SM operator is defined as:

where $N_L$ is the ROI (region of interest) in the left image; the operator searches the right image for the ROI $N_R(D_i)$ with the minimum matching cost with respect to $N_L$, under left-image noise $\mu$ and right-image noise $\kappa$, with $\mu, \kappa \sim N(0, \delta^2)$. The position of the minimum cost $C$ in the right image is taken as the disparity. The noise power $\delta^2$ is influenced by weather and other environmental conditions.
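The minimum-cost search described above can be illustrated with a brute-force block matcher for a single pixel. This toy sketch assumes rectified greyscale images, a square window and a sum-of-absolute-differences (SAD) cost; it stands in for the cost search only and is not BM, SGBM or PSMNet. The function name `match_pixel` is hypothetical. Noise such as $\mu$ and $\kappa$ perturbs every SAD cost, which is how it can shift the position of the minimum.

```python
import numpy as np


def match_pixel(left: np.ndarray, right: np.ndarray, y: int, x: int,
                max_disp: int, half_win: int = 2) -> int:
    """Return the integer disparity d in [0, max_disp] that minimises the SAD cost
    between the left window centred at (y, x) and the right window centred at
    (y, x - d). Images must be rectified, so matches lie on the same row."""
    h, w = left.shape
    if not (half_win <= y < h - half_win and half_win <= x < w - half_win):
        raise ValueError("window around (y, x) must lie inside the left image")
    ref = left[y - half_win:y + half_win + 1,
               x - half_win:x + half_win + 1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half_win < 0:
            break  # the candidate window would leave the right image
        cand = right[y - half_win:y + half_win + 1, xr - half_win:xr + half_win + 1]
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

For example, if the right image is a textured left image shifted five columns to the left, `match_pixel(left, right, y, x, max_disp=12)` returns 5 for interior pixels.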

In orchards, mismatching is usually caused by illumination and shadow. To improve the SNR (signal-to-noise ratio), the hawk-eye-inspired HDR enhancement reduces the influence of noise, as follows:

The hawk-eye-inspired HDR enhancement HEHDR() is designed through the illumination-reflectance imaging model:

where $S(x,y)$ is the greyscale channel at image coordinate $(x,y)$; $I(x,y)$ is the illumination component at $(x,y)$; $R(x,y)$ is the reflectance component at $(x,y)$. The illumination component is obtained by the Gaussian filter as follows:
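The decomposition can be sketched as follows, assuming the usual multiplicative form $S(x,y) = I(x,y) \cdot R(x,y)$ with the illumination taken as a Gaussian-filtered copy of $S$. The Gaussian $\sigma$, the replicate padding and the small $\epsilon$ guarding the division are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian filter with replicate padding; output has img's shape."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Filter rows, then columns; 'valid' convolution removes the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def decompose(grey: np.ndarray, sigma: float = 15.0, eps: float = 1e-6):
    """Split a greyscale image in [0, 1] into illumination I (low-pass copy)
    and reflectance R = S / I, so that S is approximately I * R."""
    s = grey.astype(np.float64)
    illum = gaussian_blur(s, sigma)
    refl = s / (illum + eps)
    return illum, refl
```

Because $R = S/I$ exceeds 1 wherever a pixel is brighter than its smoothed surroundings, this raw reflectance is not confined to [0, 1]; the text states $R'(x,y) \in [0, 1]$ for the adjusted component, so some normalisation is presumably applied there, which this sketch does not attempt.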

4.1 | Adaptive adjustment in overexposed area

After obtaining the illumination and reflectance components, this section designs an adaptive method that adjusts the reflectance component to counter overexposed areas. Overexposed areas could be adjusted through either the illumination or the reflectance component. Since the illumination component is obtained by Gaussian filtering, adjusting it would affect image sharpness and details. Therefore, this section designs an adaptive conditional reflectance component as follows:

where $I(x,y) \in [0, 1]$ and $R'(x,y) \in [0, 1]$; $mean_I$ represents the mean value of the illumination component; $\delta$ is the control coefficient. In this paper, $\delta = 0.5$.

Figure 4a shows part of the orchard used in this paper. There were sharply contrasting overexposed and underexposed areas in Figure 4a; its upper part was a typical overexposed area with a high $mean_I$. Figure 4b shows the reflectance components of the overexposed areas before adjustment. After adaptive adjustment, the reflectance components of the overexposed areas are shown in blue in Figure 4c, meaning that their values were lower than before. As Figures 4b and 4c show, the adaptive adjustment in overexposed areas had little influence on underexposed areas, which leaves room for the subsequent adjustment of underexposed areas.

4.2 | Adaptive adjustment in underexposed area

The illumination component is obtained by Gaussian filtering, so it is continuous and smooth. If the illumination component were adjusted pixel by pixel according to the reflectance component, it would be difficult to preserve image details after reconstruction. Therefore, to preserve the original illumination distribution as much as possible, the adjustment of underexposed areas must not conflict with the adjustment of overexposed areas.

First, extremely bright areas ($I(x,y) \approx 1$) and extremely dark areas ($I(x,y) \approx 0$) should be preserved as much as possible, because large changes in these extreme areas create noise more easily than in other areas. Therefore, the adjustment kernel $h(I(x,y))$ is designed based on the hyperbolic tangent as follows:

where the adaptive adjustment coefficient $\alpha$ is defined as:

where $\beta \in [0, 2]$ is the gain. In this paper, the adaptive adjustment coefficient $\alpha$ shapes the adjustment kernel $h(I(x,y))$ according to the residual between the mean illumination $mean_I$ and a threshold (0.5 in this paper); the adjusted illumination component is defined as:

where Nor() represents the normalisation operator. The correspondence between the mean illumination $mean_I$ and the adaptively adjusted illumination component $I(x,y)$ is shown in Figure 5.

In Figure 5, the preserved areas are the extremely bright areas ($I(x,y) \approx 1$) and extremely dark areas ($I(x,y) \approx 0$). The adjustment mainly targets the intermediate areas, and its strength is determined adaptively by the value of $mean_I$. The adjustment is shown in Figure 6.

The experimental scene of Figure 6 is the same as that of Figure 4. There were many underexposed areas in the lower part of Figure 4a, where $mean_I$ was clearly lower than the threshold. The pseudo-colour map of the illumination components of Figure 4a is shown in Figure 6a. Shadows dominated these parts, where the illumination component $I(x,y)$ was lower than elsewhere and shown in yellow or blue rather than dark red. After adaptive adjustment, most of the yellow or blue lower parts were enhanced to dark red, meaning that the illumination components were adjusted effectively. The output of the hawk-eye-inspired HDR enhancement is shown as follows:

FIGURE 4 Adjustment in overexposed area. (a) The orchard environment, (b) the reflectance components before adjustment, and (c) the reflectance components after adjustment

FIGURE 5 The correspondence between $mean_I$ and $I(x,y)$

The adaptive enhancement in Figures 4 and 6 shows that the adjustment areas were separated according to the illumination and reflectance components. The two enhancements were decoupled: no area was adjusted twice, which is consistent with the adaptive adjustment of biological vision.

5 | EXPERIMENTS AND DISCUSSIONS

The experiments covered the stability and the accuracy of binocular visual perception, followed by 3D point cloud reconstruction of the orchard navigation map. First, the stability of binocular visual perception was evaluated as follows.

5.1 | The stability experiment of hawk‐eye‐inspired algorithm

The output of binocular stereo matching is a disparity map, each disparity of which corresponds one-to-one to spatial 3D information through the depth map under the given calibration. If the accuracy of the disparity map is insufficient because of image noise, the obtained coordinates or coordinate sets are easily mismatched. Therefore, the accuracy of the disparity map can serve as the index for evaluating the stability of the hawk-eye-inspired enhancement.

The accuracy of the disparity map is defined as:

where $\hat{D}(x,y)$ is the input disparity map for evaluation; $D(x,y)$ is the ground truth; the resolution of both the input and the ground truth is $(M, N)$; TorF(∗) is the True-or-False operator, whose output is 1 when ∗ is True and 0 when ∗ is False; $\eta$ is the accuracy threshold. In this paper, $\eta = 3$.
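The accuracy metric above can be computed as sketched below: the percentage of pixels whose absolute disparity error is below the threshold $\eta$. The extracted text does not show whether the comparison is strict, so `<` is an assumption, as is the optional `valid` mask for ground-truth pixels without a known disparity; with `valid=None` the count is normalised by $M \times N$ as described.

```python
import numpy as np


def disparity_accuracy(d_est: np.ndarray, d_gt: np.ndarray, eta: float = 3.0,
                       valid=None) -> float:
    """Percentage of pixels with |d_est - d_gt| < eta (the TorF test in the text).
    If `valid` (a boolean mask) is given, only those pixels are counted."""
    if d_est.shape != d_gt.shape:
        raise ValueError("estimate and ground truth must have the same (M, N) shape")
    hit = np.abs(d_est.astype(np.float64) - d_gt.astype(np.float64)) < eta
    if valid is None:
        return 100.0 * float(hit.mean())      # normalised by M * N
    return 100.0 * float(hit[valid].mean())
```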

Because it is difficult to obtain the ground truth of outdoor disparity maps and hard to ensure accurate measurement of every disparity or depth in an image, the training set of the public Middlebury binocular vision dataset [35] was selected for the experimental evaluation in this section. The dataset contains 15 sets of binocular images with ground-truth disparity. The evaluated stereo matching algorithms included the local matching algorithm BM [36], the semi-global matching algorithm SGBM [37] and the deep learning-based stereo matching algorithm PSMNet [38]. With GPU acceleration, PSMNet took about 2 s to generate one set of depth maps. In this paper, PSMNet was used only as a typical algorithm for comparing disparity map accuracy and to assist in verifying perceptual stability.

FIGURE 6 Adjustment in underexposed area. (a) The illumination components before adjustment and (b) the illumination components after adjustment

The evaluation results on the 15 Middlebury stereo vision sets are shown in Tables 1 and 2. HEHDR (hawk-eye HDR) indicates that the images were processed by the hawk-eye-inspired adaptive HDR perception design proposed in this paper. Groups 1–15 correspond to the test results on the Middlebury training sets Adirondack (Group 1), ArtL (Group 2), Jadeplant (Group 3), Motorcycle (Group 4), MotorcycleE (Group 5), Piano (Group 6), PianoL (Group 7), Pipes (Group 8), Playroom (Group 9), Playtable (Group 10), PlaytableP (Group 11), Recycle (Group 12), Shelves (Group 13), Teddy (Group 14) and Vintage (Group 15). The data in Tables 1 and 2 are given in percentage (%), and the greatest improvement is given in bold.

Compared with the other experimental groups, Playroom (Group 9) was filled with light and shadow. With HEHDR, the accuracy of the PSMNet disparity map improved by 26.56% in Group 9 of Table 2. In Figure 7, the representative evaluation scenarios MotorcycleE (Group 5) and PianoL (Group 7) are compared with Group 4 and Group 6; both groups change the lighting conditions of the original data. The disparity map accuracy of every algorithm dropped after the sudden change of lighting, but with HEHDR it improved significantly. For the BM and SGBM algorithms, the disparity map accuracy with HEHDR was close to that obtained in the scenes with unchanged lighting, showing that the proposed hawk-eye-inspired enhancement was effective. Notably, the accuracy of the deep learning-based PSMNet dropped significantly after the lighting change; compared with the local matching algorithm BM and the semi-global matching algorithm SGBM, it was less robust to lighting changes. For these representative stereo matching algorithms, the results showed that the proposed hawk-eye-inspired adaptive HDR was stable and effective for stereo vision enhancement.

5.2 | The ranging accuracy experiment of hawk‐eye‐inspired algorithm

The perception accuracy of the hawk-eye-inspired algorithm was evaluated by the accuracy of spatial 3D perception in the $Z$ direction (target depth). Accuracy was evaluated by the error ratio $\tau$, and perception stability by the standard deviation (STD) of multiple groups of ranging results. The experimental images were collected with an Opencv-Depthai binocular vision system at a resolution of (640, 400). Three scenes were collected: (a) orchard dark-light environment, collected at dusk; (b) orchard bright-light environment, collected at noon; (c) mixed environment, collected at noon with bright and dark areas at the same time. For test sampling, the 4-neighbourhood of the target image coordinates in the left image was collected and its average value was taken as the perception result; the depth map was generated by the SGBM algorithm. The original data were labelled Raw; data processed by the hawk-eye-inspired adaptive HDR were labelled HEHDR; data processed by the hawk-eye-inspired binocular vision super resolution, with a reconstructed resolution of (1280, 800), were labelled HESR; and data processed by both were labelled HEHDR + HESR. To compare with the idea behind hawk-eye-inspired perception, SRResNet [6] and histogram equalisation [39] were combined as an alternative super-resolution plus image-enhancement baseline, labelled SR [6] + HE [39]. Twenty groups of perception tests were conducted for each scene and each method, and the average value was taken, as shown in Figure 8. The verification results are shown in Tables 3, 4 and 5.

The error ratio $\tau$ is defined as:

where $L$ = 6 m ... 21 m is the distance between the target and the binocular vision system; $L_M$ is the $M$th mean of ranging values. The ranging accuracy results of the hawk-eye-inspired algorithm are reported in Tables 3, 4 and 5 in terms of the error ratio $\tau$ and the standard deviation (STD). The best result in each test is given in bold in Tables 3, 4 and 5.
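The sampling and the two indices can be sketched as below. Since the formula for $\tau$ is not reproduced here, the sketch assumes the common definition $\tau = |L_M - L| / L \times 100\%$; it also reads the 4-neighbourhood as the four 4-connected neighbours of the target pixel (whether the centre pixel is included is not stated) and uses the population standard deviation. All three readings are assumptions.

```python
import numpy as np


def sample_depth(depth: np.ndarray, y: int, x: int) -> float:
    """Mean depth over the four 4-connected neighbours of (y, x), skipping
    neighbours without a valid (finite, positive) depth."""
    vals = [depth[y + dy, x + dx] for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))]
    vals = [v for v in vals if np.isfinite(v) and v > 0]
    if not vals:
        raise ValueError("no valid depth around the target pixel")
    return float(np.mean(vals))


def error_ratio(measurements, true_distance_m: float) -> float:
    """tau in percent: |mean of repeated ranging results - true distance| / true distance."""
    mean_l = float(np.mean(measurements))
    return 100.0 * abs(mean_l - true_distance_m) / true_distance_m


def ranging_std(measurements) -> float:
    """Standard deviation of repeated ranging results (population form, ddof=0)."""
    return float(np.std(measurements))
```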

TABLE 1 The evaluation results of hawk-eye-inspired enhancement based on Middlebury [35]

TABLE 2 The evaluation results of hawk-eye-inspired enhancement based on Middlebury [35]

FIGURE 7 Parts of the evaluation experiment scenes of Middlebury [35]. (a) Group 4: Motorcycle, (b) Group 5: MotorcycleE, (c) Group 6: Piano, (d) Group 7: PianoL, and (e) Group 9: Playroom

The results in Tables 3, 4 and 5 show that with the hawk-eye-inspired binocular vision super resolution (HESR), whether or not the hawk-eye-inspired adaptive HDR (HEHDR) was adopted, the error ratio was greatly reduced in most of the test results; the maximum reduction was 28.0% (obtained by comparing Raw with HEHDR + HESR at 21 m in scene (c)). This shows that HESR is effective in improving the accuracy of binocular visual perception. Under the mixed illumination of scene (c), the error ratio of the Raw group was the largest, indicating that many shadows and bright areas coexisted in the orchard and that insufficient resolution greatly affected perception accuracy. With HESR enhancement, however, the error ratio was greatly reduced. Therefore, under poor ambient illumination, improving the resolution gave the greatest gain in binocular visual perception accuracy. SRResNet was trained without generative adversarial training, so its super-resolved binocular images were of lower quality than those obtained with generative adversarial training, and histogram equalisation broke the distribution of the illumination component. Thus, the STD of the SR [6] + HE [39] group was higher than that of the HEHDR + HESR group in most experiments.

FIGURE 8 The ranging accuracy experiment of hawk-eye-inspired enhancement

TABLE 3 The ranging accuracy results of scene (a)

TABLE 4 The ranging accuracy results of scene (b)

TABLE 5 The ranging accuracy results of scene (c)

In each scene, after the hawk-eye-inspired adaptive HDR (HEHDR), the STD decreased significantly, showing that HEHDR is effective in enhancing perceptual stability. When the target distance in scene (c) was 18 m, the largest STD improvement was observed when comparing the HESR group with the HEHDR + HESR group: the STD decreased by 78.6%. For the HEHDR group, whichever the scene, the STD did not decrease significantly for distant targets, because the target distance reached the ranging limit and the error ratio and STD lost their evaluative significance. Notably, in scene (c), when the target was 21 m away, the error ratio of the HESR group reached 21.0%, while in the other scenes HESR still maintained a small error ratio at 21 m. This shows that when shadows and bright light coexist extensively in the orchard, optimisation through the hawk-eye-inspired binocular vision super resolution (HESR) alone cannot meet the needs of long-range stable perception. When the target was 21 m away in scene (c), the perceptual stability of HESR + HEHDR was more significantly enhanced, and the error ratio and STD of the HESR + HEHDR group were the best. Although the SR [6] + HE [39] group performed worse than HESR + HEHDR, it still improved significantly over the Raw group, which supports the idea behind hawk-eye-inspired perception.

5.3 | The 3D point cloud reconstruction of orchard

As shown in Figure 9, the 3D point cloud map of the orchard was visualised: the Raw group is shown in dark yellow, the HESR + HEHDR group in green, and the SR [6] + HE [39] group in dark red. The experimental orchard was a hawthorn orchard. The binocular data were obtained by a DJI M100 UAV equipped with a tablet computer, and the images were collected with an OpenCV-DepthAI binocular camera. Only a single frame was collected by the left and right cameras, and the left and right greyscale images were captured at a resolution of 640 × 400. After processing with the hawk-eye-inspired perception algorithm, the images were super-resolved to 1280 × 800, as were those of SR [6] + HE [39]. The SGBM algorithm was used for stereo matching. After the depth information was obtained with the corresponding calibration data, it was projected into the LiDAR coordinate system. Figure 9a shows an orderly planted orchard captured in a dark environment (dusk), marked as group a. Figure 9b shows a disorderly planted orchard captured in a backlit environment (noon), marked as group b.
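The pipeline above (SGBM disparity, depth, projection into the LiDAR frame) rests on standard rectified-stereo geometry: Z = f·B/d, with f the focal length in pixels, B the baseline and d the disparity. The sketch below, in plain Python, back-projects a pixel, applies a rigid camera-to-LiDAR transform, and estimates the depth quantisation step ΔZ ≈ Z²·Δd/(f·B). It illustrates why super-resolving 640 × 400 images to 1280 × 800 can help at long range: doubling the resolution doubles f in pixels, which halves ΔZ for the same sub-pixel disparity step, provided the reconstructed images carry real detail for the matcher to exploit. The intrinsics, baseline and extrinsics are hypothetical, not the calibration used in the experiment; the 1/16-pixel step mirrors the fixed-point disparity output of OpenCV's StereoSGBM.

```python
def disparity_to_point(u, v, d, fx, fy, cx, cy, baseline):
    """Back-project a rectified left-image pixel (u, v) with disparity d (pixels)
    to camera coordinates in metres, using Z = fx * B / d."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = fx * baseline / d
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def rigid_transform(point, rotation, translation):
    """Apply p' = R p + t, e.g. camera frame -> LiDAR frame (R as 3x3 nested lists)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def depth_step(z, fx, baseline, disparity_step):
    """Approximate depth change caused by one disparity step at range z:
    dZ ~= Z^2 * dd / (fx * B)."""
    return z * z * disparity_step / (fx * baseline)


# Hypothetical intrinsics: fx doubles when 640 x 400 is upscaled to 1280 x 800.
fx_low, fx_high, baseline = 400.0, 800.0, 0.075
step = 1.0 / 16.0  # sub-pixel disparity step (1/16 pixel)
print(depth_step(20.0, fx_low, baseline, step), depth_step(20.0, fx_high, baseline, step))
```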

In the two groups of tests above, the salient targets in the field of view (trees numbered 1 to 8 and 1 to 7) are marked in Figure 9. A handheld laser rangefinder (measuring range: 0.05–60 m, accuracy: ±1.5 mm) was used to measure the distance between the centre of the UAV's hovering circle and the trunk of each marked tree. Each tree was measured 10 times and the average was taken, and the accuracy of the 3D point clouds of three groups, HESR + HEHDR, Raw and SR [6] + HE [39], was verified against these values. The results are shown in Tables 6 and 7, where the best result in each test is given in bold. Taking the handheld laser rangefinder result as the benchmark, the accuracy of both the HESR + HEHDR group and the SR [6] + HE [39] group was significantly higher than that of Raw. Beyond 7 m, SR [6] + HE [39] performed best for the nearer targets, whereas for the targets farther from the start point HESR + HEHDR performed best, owing to its fine reconstruction and to an HDR step that does not break the illumination component distribution. These results were consistent with the accuracy verification in Section 5.2.
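The benchmark comparison behind Tables 6 and 7 can be sketched as follows, assuming that each group's error is the absolute difference between the distance measured in its point cloud and the mean of the repeated rangefinder readings; the text does not state the exact error metric. Function names, target IDs and values are hypothetical.

```python
from statistics import mean


def evaluate_targets(laser_readings, estimates):
    """Compare each group's point-cloud distance with the rangefinder benchmark.

    laser_readings: {target_id: [repeated rangefinder readings in metres]}
    estimates: {group_name: {target_id: distance measured in the point cloud}}
    Returns {target_id: (best_group, {group_name: absolute_error})}.
    """
    results = {}
    for target, readings in laser_readings.items():
        benchmark = mean(readings)  # average of the repeated readings
        errors = {group: abs(dist[target] - benchmark) for group, dist in estimates.items()}
        best = min(errors, key=errors.get)  # the entry that would be set in bold
        results[target] = (best, errors)
    return results


# Hypothetical values for one tree; not taken from Tables 6 and 7.
laser = {1: [8.02, 8.00, 7.98]}
clouds = {"Raw": {1: 8.60}, "HESR + HEHDR": {1: 8.05}, "SR [6] + HE [39]": {1: 8.15}}
print(evaluate_targets(laser, clouds)[1][0])
```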

The results in Tables 6 and 7 showed that the hawk-eye-inspired perception algorithm of binocular stereo vision was sound and that it improved the accuracy of the 3D point cloud. The number of points was 2,096,349 (group a) and 1,925,715 (group b) for HESR + HEHDR, 555,768 (group a) and 518,844 (group b) for Raw, and 1,876,542 (group a) and 1,804,732 (group b) for SR [6] + HE [39]. The number of points was therefore increased significantly by the hawk-eye-inspired perception algorithm. The reconstruction quality of the 3D point cloud was further evaluated by point cloud visualisation as follows:
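As a quick check on the point counts reported above, the ratios to Raw work out to about 3.8× (group a) and 3.7× (group b) for HESR + HEHDR, and about 3.4× and 3.5× for SR [6] + HE [39]. The snippet below reproduces that arithmetic from the reported numbers; the dictionary layout and function name are only for illustration.

```python
# Point counts reported in the text for single-frame reconstruction.
counts = {
    "a": {"Raw": 555_768, "HESR + HEHDR": 2_096_349, "SR [6] + HE [39]": 1_876_542},
    "b": {"Raw": 518_844, "HESR + HEHDR": 1_925_715, "SR [6] + HE [39]": 1_804_732},
}


def gain_over_raw(group, method):
    """How many times more points a method produced than Raw in one test group."""
    return counts[group][method] / counts[group]["Raw"]


for group in ("a", "b"):
    for method in ("HESR + HEHDR", "SR [6] + HE [39]"):
        print(group, method, round(gain_over_raw(group, method), 2))
```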

In the orderly planted orchard, the UAV hovered and yawed in place while the left and right cameras collected multi-frame images at a fixed frequency and the IMU recorded the pose information. The 3D points from multiple binocular stereo frames were then combined using the IMU poses, as shown in Figure 10.
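The multi-frame construction described above places each frame's points with its IMU pose before merging. The sketch below is a simplified illustration, not the authors' implementation: it assumes the points of each frame are already expressed in a body frame (x forward, y left, z up), that roll and pitch are negligible while the UAV hovers so only yaw and position are applied, and that merging is plain concatenation. All names and values are hypothetical.

```python
import math


def yaw_rotation(yaw_rad):
    """Rotation about the vertical axis; roll and pitch are assumed negligible."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def to_world(points, yaw_rad, position):
    """Map body-frame points (x forward, y left, z up) into the world frame."""
    r = yaw_rotation(yaw_rad)
    return [
        tuple(sum(r[i][j] * p[j] for j in range(3)) + position[i] for i in range(3))
        for p in points
    ]


def fuse_frames(frames):
    """Concatenate per-frame clouds after placing each one with its IMU pose.

    frames: list of (points, yaw_rad, position) tuples.
    """
    cloud = []
    for points, yaw_rad, position in frames:
        cloud.extend(to_world(points, yaw_rad, position))
    return cloud


# Two hypothetical frames, 90 degrees of yaw apart, each seeing a point 10 m ahead.
frames = [
    ([(10.0, 0.0, 1.0)], 0.0, (0.0, 0.0, 20.0)),
    ([(10.0, 0.0, 1.0)], math.pi / 2, (0.0, 0.0, 20.0)),
]
print(fuse_frames(frames))
```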

In Figure 10, the panoramic 3D point cloud map was analysed from four perspectives, each focused on an area where two frames overlapped. The accuracy of such an area was mainly affected by the accuracy of the pose recorded by the IMU and by the accuracy of the point clouds themselves. In the overlapping area, the point clouds obtained from different frames overlapped completely on the same fruit tree, so the number of points obtained for that tree was further increased, which met the quality requirements of multi-frame construction.

To further explore the quality of multi-frame 3D point cloud reconstruction, three frames from the same perspective were used for multi-frame and single-frame evaluation. The reconstruction results with HESR + HEHDR, SR [6] + HE [39] and Raw are shown in Figure 11.

Figure 11a shows the multi-frame reconstruction from a single perspective with the hawk-eye-inspired perception algorithm (labelled HESR + HEHDR); the nearest tree target was 10 m from the UAV. The inset in Figure 11a is the greyscale image of the perspective used for the reconstruction quality evaluation; HESR + HEHDR, SR [6] + HE [39] and Raw were all evaluated from this perspective, and the results are shown, zoomed in, in Figures 11b to 11g. Figures 11b and 11c show the reconstruction results of HESR + HEHDR: all 5 trees in Figure 11b were reconstructed completely, whereas in Figure 11c one tree (yellow circle) was reconstructed incompletely, so the quality of the point cloud was significantly improved by multi-frame reconstruction. Figures 11d and 11e show the same trend, but Figure 11d (SR [6] + HE [39]) was poorer than Figure 11b (HESR + HEHDR), and Figure 11c (HESR + HEHDR) was also better than Figure 11e (SR [6] + HE [39]). The worst results are in Figures 11f and 11g: only 3 trees rather than 5 were reconstructed in Figure 11f, and Figure 11g showed severe distortion, which cannot meet the requirements of a navigation map. In summary, multi-frame reconstruction significantly widened the horizontal angle of view. The increase in the horizontal angle of view was obtained from the attitude information recorded by the IMU; in this example, it was 22.3°. The visualisation showed that the quality of the multi-frame reconstruction was significantly improved compared with that of a single frame, which is very helpful for further detailed mapping of the orchard operating environment.
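The widening of the horizontal angle of view can be computed from the yaw headings recorded by the IMU. A minimal sketch, assuming each frame covers an angular interval centred on its yaw with the camera's single-frame horizontal field of view, and that the gain is the merged coverage minus one frame's field of view. The 70° field of view and the yaw values are hypothetical and are not meant to reproduce the reported 22.3°.

```python
def covered_angle(yaws_deg, hfov_deg):
    """Total horizontal angle (degrees) covered by frames with the given yaw headings.

    Each frame covers [yaw - hfov/2, yaw + hfov/2]; overlapping intervals are merged.
    Assumes the yaw sweep does not wrap past 360 degrees.
    """
    if not yaws_deg:
        raise ValueError("at least one frame is required")
    intervals = sorted((y - hfov_deg / 2, y + hfov_deg / 2) for y in yaws_deg)
    total = 0.0
    start, end = intervals[0]
    for lo, hi in intervals[1:]:
        if lo <= end:          # overlaps (or touches) the current run
            end = max(end, hi)
        else:                  # gap: close the current run and start a new one
            total += end - start
            start, end = lo, hi
    total += end - start
    return total


def hfov_gain(yaws_deg, hfov_deg):
    """Widening of the horizontal angle of view relative to a single frame."""
    return covered_angle(yaws_deg, hfov_deg) - hfov_deg


# Hypothetical: a 70-degree camera and three frames yawed 0, 10 and 20 degrees.
print(hfov_gain([0.0, 10.0, 20.0], 70.0))
```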

FIGURE 9 3D point cloud navigation map of the orchard with and without the hawk-eye-inspired perception algorithm (single frame). (a) Orderly planted orchard in a dark environment and (b) disorderly planted orchard in a backlit environment

6 | CONCLUSION

Hawks stand at the top of the food chain by relying on the powerful vision provided by the ‘deep fovea’, the ‘shallow fovea’ and the pupil structure. Inspired by hawk eyes, this paper proposed a hawk-eye-inspired perception algorithm of stereo vision, which overcame the shortcomings of binocular stereo vision in long-distance measurement and stable perception. With this algorithm, a UAV equipped with binocular sensors completed the reconstruction of 3D point clouds for orchard navigation, and both the number and the quality of the points were significantly improved. The method achieved high-quality 3D point clouds without any hardware upgrade. In the future, we will continue to explore agricultural research and applications for precision agriculture perception along the lines of biomimetic visual systems [40–42].

TABLE 6 Accuracy evaluation of the 3D point cloud using hawthorn tree targets (group a)

TABLE 7 Accuracy evaluation of the 3D point cloud using hawthorn tree targets (group b)

FIGURE 10 Panoramic 3D point cloud navigation map of the orchard with the hawk-eye-inspired perception algorithm (multi-frame)

FIGURE 11 Comparison of 3D point cloud reconstruction with different numbers of frames. (a) Multi-frame reconstruction from a single perspective with HESR + HEHDR, (b) multi-frame HESR + HEHDR, (c) single-frame HESR + HEHDR, (d) multi-frame SR [6] + HE [39], (e) single-frame SR [6] + HE [39], (f) multi-frame Raw, and (g) single-frame Raw

ACKNOWLEDGEMENTS

This work was funded by the National Natural Science Foundation of China (No. 51979275), Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, MNR (No. KFKT-2022-05), Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2021-06-115), Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2022C10), Jiangsu Province and Education Ministry Co-sponsored Synergistic Innovation Center of Modern Agricultural Equipment (No. XTCX2002), 2115 Talent Development Program of China Agricultural University and Chinese Universities Scientific Fund (No. 2021TC105).

CONFLICT OF INTEREST

The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.

ORCID

Zichao Zhang https://orcid.org/0000-0002-0065-7374

Jian Chen https://orcid.org/0000-0002-9030-618X
