
Mobile robot localization algorithm based on multi-sensor information fusion

2020-04-28

WANG Ming-yi, HE Li-le, LI Yu, SUO Chao

(College of Mechanical & Electrical Engineering, Xi’an University of Architecture & Technology, Xi’an 710055, China)

Abstract: In order to effectively reduce the uncertainty error of mobile robot localization with a single sensor and to improve the accuracy and robustness of robot localization and mapping, a mobile robot localization algorithm based on multi-sensor information fusion (MSIF) was proposed. Simultaneous localization and mapping (SLAM) was realized on the basis of the laser Rao-Blackwellized particle filter (RBPF)-SLAM algorithm, and graph-based optimization theory was used to constrain and optimize the pose estimates of Monte Carlo localization. Aiming at the heavy computation and low tracking accuracy of visual information processing, the feature point extraction and quadrilateral closed-loop matching algorithm based on oriented FAST and rotated BRIEF (ORB) was improved by means of the three-dimensional (3D) point features reconstructed from the binocular vision environment. A factor graph model was used to fuse the laser RBPF-SLAM localization and the binocular visual localization under the maximum a posteriori probability criterion. Simulation and experimental results indicate that the localization accuracy of the proposed method is higher than that of the traditional RBPF-SLAM algorithm and of general improved algorithms, and the effectiveness and usefulness of the proposed method are verified.

Key words: mobile robot; simultaneous localization and mapping (SLAM); graph-based optimization; sensor fusion

0 Introduction

The self-localization of a mobile robot from sensor measurements is the basis and premise of autonomous navigation and the intelligent completion of specific tasks. Robot self-localization relies on a known environment map model, while building that map model in turn requires the robot to determine its pose while acquiring information about the unknown environment, which leads to the simultaneous localization and mapping (SLAM) problem. In this process, the estimates of the robot pose and of the environment map model are interdependent and iterative. SLAM is essentially a state estimation problem over the data acquired by the sensors, and it is solved mainly by filtering or graph-based optimization. In Ref.[1], the Rao-Blackwellized particle filter (RBPF) algorithm was proposed to solve SLAM effectively. This algorithm approximates the joint probability density of the mobile robot pose and the environment map model well and is still widely used in laser SLAM systems, but it suffers from high algorithmic complexity and large memory consumption. The algorithms proposed in Refs.[2-5] improve execution efficiency through the proposal distribution or the particle resampling mechanism. With the gradual maturation of visual SLAM technology, the parallel tracking and mapping (PTAM) system proposed in Ref.[6] adopted a key-frame-based nonlinear optimization method, parallelized tracking and mapping, and obtained better estimates than traditional filters, but it has limitations such as restriction to small scenes and easy tracking loss. Refs.[7-8] inherited the main idea of the PTAM algorithm and proposed ORB-SLAM and its improved version ORB-SLAM2. These algorithms use ORB feature points of the image[9] for matching, which greatly improves computational efficiency, but relying on the visual sensor alone causes a high dependence on the visual features of the scene. In Ref.[10], a graph-based optimization strategy was used to create the environment map while the Harris operator was used to design a quadrilateral closed-loop matching algorithm; the localization accuracy and algorithm efficiency were thereby improved, but the environmental adaptability of this algorithm is poor.

Since using laser or vision alone is limited, multi-sensor information fusion (MSIF) can compensate for the disadvantages of the individual sensors and improve the localization accuracy of the robot. In Ref.[11], SLAM based on laser and vision fusion was studied: a probabilistic heuristic model was used to extract beams and project them onto a grid map to realize feature fusion. Although this method can effectively reduce the uncertainty of the data, it suffers from heavy computation and poor real-time performance. In Ref.[4], laser and visual observations were used to improve the importance distribution of particle sampling on the basis of RBPF-SLAM, and the robot localization results are better than those of traditional RBPF-SLAM; nevertheless, the information fusion was carried out by an extended Kalman filter, so good results can be obtained only when the system is nearly linear and continuous. As a novel nonlinear optimization method, the factor graph (FG) has the advantages of smooth growth and global optimization[12] and is gradually being applied to sensor fusion. In Ref.[13], pre-integration was carried out on the inertial measurement unit (IMU), and the efficient fusion of the IMU and monocular vision was realized within the FG framework. This algorithm is well suited to unmanned aerial vehicle (UAV) scenes with large acceleration variations but performs poorly on ground mobile robots. Aiming at the above problems, in this paper, laser/vision fusion based on the FG is studied on the basis of laser RBPF-SLAM localization and the FG fusion model, and the ORB feature point processing is improved to suit the construction of binocular vision factor nodes.

1 Monte Carlo localization method based on laser scanning

1.1 RBPF-SLAM algorithm

The core idea of Monte Carlo localization is to represent the belief about the robot pose in the environment with a set of random samples (i.e. "particles") with corresponding weights; particle resampling based on importance weights is required during localization, so this method is also known as the "particle filter". RBPF factorizes the joint posterior probability distribution of SLAM into the estimation of the robot pose state and the estimation of the landmark map conditioned on that pose state[1],

p(x_{1:t}, m | z_{1:t}, u_{1:t-1}) = p(x_{1:t} | z_{1:t}, u_{1:t-1}) · p(m | x_{1:t}, z_{1:t}),

(1)

where x_{1:t} is the real-time pose state of the robot; m represents the global landmark map; z_{1:t} is the system observation and represents the measurement data of the laser ranging sensor; u_{1:t-1} is the system control input and represents the mileage data of the coded odometer. The schematic diagram of SLAM in the RBPF-SLAM framework is shown in Fig.1.

Fig.1 Schematic diagram for SLAM
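The factorization in Eq. (1) suggests that each particle can carry its own trajectory hypothesis and its own map conditioned on that trajectory. The following is a minimal sketch of such a particle structure and its prediction step; the class layout, map representation and noise values are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Rao-Blackwellized factorization in Eq. (1):
# each particle carries a pose trajectory x_{1:t} and a map m conditioned
# on that trajectory. Names and parameters are illustrative assumptions.
import numpy as np

class Particle:
    def __init__(self, pose, weight, n_cells=(300, 300)):
        self.pose = np.asarray(pose, dtype=float)   # (x, y, theta)
        self.trajectory = [self.pose.copy()]        # x_{1:t}
        self.weight = weight                        # importance weight
        self.log_odds_map = np.zeros(n_cells)       # occupancy map m | x_{1:t}, z_{1:t}

def sample_motion(pose, u, odom_noise=(0.05, 0.05, 0.01)):
    """Propagate a particle with the odometry input u = (dx, dy, dtheta)."""
    noise = np.random.randn(3) * odom_noise
    return pose + np.asarray(u, dtype=float) + noise

def predict(particles, u):
    """Prediction step: sample each particle from the motion model."""
    for p in particles:
        p.pose = sample_motion(p.pose, u)
        p.trajectory.append(p.pose.copy())
```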

The steps of traditional RBPF-SLAM are as follows:

3) Calculate the weight of each particle to compensate for the difference between the importance distribution function and the target distribution, according to the importance sampling principle.

(2)

The likelihood field (LF) is used in the laser SLAM system to describe the probability that the laser ranging sensor detects an environmental obstacle. The laser observation model of the robot pose can be obtained by calculating the distance between each current laser scanning point and its nearest neighboring point[14], as

(3)

where N′ represents the number of laser scanning points at time t, and d_j is the distance between the j-th laser scanning point and its nearest neighboring point.
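A minimal sketch of such a likelihood-field observation model follows: a pose hypothesis is weighted by the distances d_j from the projected scan points to the nearest occupied map cells, assuming a Gaussian per-beam model. The precomputed distance field, the resolution handling and the value of sigma are assumptions for illustration only.

```python
# Sketch of a likelihood-field observation model: the weight of a pose
# hypothesis is driven by the distance d_j from each projected scan point
# to the nearest occupied cell, assuming a Gaussian measurement model.
import numpy as np

def likelihood_field_weight(scan_xy, pose, distance_field, resolution, sigma=0.1):
    """scan_xy: (N, 2) laser end points in the sensor frame.
    distance_field: precomputed distance (m) to the nearest obstacle per map cell."""
    x, y, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    pts = scan_xy @ R.T + np.array([x, y])            # project scan into the map frame
    idx = np.floor(pts / resolution).astype(int)       # map cell of each end point
    h, w = distance_field.shape
    valid = (idx[:, 0] >= 0) & (idx[:, 0] < w) & (idx[:, 1] >= 0) & (idx[:, 1] < h)
    d = distance_field[idx[valid, 1], idx[valid, 0]]   # d_j for in-map points
    log_lik = -0.5 * (d / sigma) ** 2                  # Gaussian likelihood per beam
    return np.exp(np.sum(log_lik))
```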

4) Select particles according to the weights calculated in the previous step, eliminate particles with low weights, and resample when the number of effective particles is smaller than the preset threshold N_th, namely

(4)
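The sketch below illustrates this resampling step. The common effective-sample-size estimate N_eff = 1 / Σ w_i² and the low-variance (systematic) resampling scheme are assumed here, since Eq. (4) itself is not reproduced in the text.

```python
# Sketch of the resampling step: resample only when the effective particle
# number falls below a threshold N_th. The estimate N_eff = 1 / sum(w_i^2)
# is an assumption standing in for the paper's Eq. (4).
import numpy as np

def normalize(weights):
    w = np.asarray(weights, dtype=float)
    return w / np.sum(w)

def effective_particle_number(weights):
    w = normalize(weights)
    return 1.0 / np.sum(w ** 2)

def low_variance_resample(particles, weights):
    """Systematic (low-variance) resampling over the normalized weights."""
    w = normalize(weights)
    n = len(particles)
    positions = (np.arange(n) + np.random.uniform()) / n
    cumulative = np.cumsum(w)
    indices = np.minimum(np.searchsorted(cumulative, positions), n - 1)
    return [particles[i] for i in indices], np.full(n, 1.0 / n)

def maybe_resample(particles, weights, n_th):
    if effective_particle_number(weights) < n_th:
        return low_variance_resample(particles, weights)
    return particles, normalize(weights)
```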

1.2 Improved RBPF-SLAM algorithm

In the Monte Carlo localization process of the RBPF-SLAM framework, the Markov property is assumed, that is, the robot state at time t is related only to the state at t-1 and is independent of the states before t-1. Affected by factors such as the mechanical error of the robot itself and the measurement accuracy of the laser sensor, long-term operation of the system accumulates a large error. A nonlinear optimization method is therefore introduced to take the states and observations before t-1 into account. Graph-based optimization theory[15] is used to constrain and optimize the real-time pose of the robot.

(5)

where x_i represents the robot pose at the i-th vertex; z_ij represents the edge between vertex i and vertex j, i.e., the constraint relation between the two vertices, as shown in Fig.2. After the real-time robot pose is obtained by RBPF-SLAM, the rotation-translation matrix L_ij between laser frames is obtained by the iterative closest point (ICP) algorithm[16]. The accuracy of the information matrix Ω_ij is positively correlated with that of the matrix L_ij. e(x_i, x_j, z_ij) is the error function measuring the matching degree between the poses x_i, x_j and the constraint edge z_ij.

(6)

(7)

(8)

The optimal estimate x* of the robot pose can be obtained by minimizing F(x) with the Gauss-Newton method, the Levenberg-Marquardt method or other gradient-based descent methods.

Fig.2 Graph-based optimization of mobile robot pose
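The sketch below illustrates the pose-graph cost described in Eqs. (5)-(8) for 2D poses: each ICP-derived constraint z_ij is compared with the relative pose predicted from x_i and x_j and weighted by its information matrix Ω_ij. This is a minimal illustration of the error and cost functions, not the authors' solver.

```python
# Sketch of the pose-graph cost: e(x_i, x_j, z_ij) is the mismatch between
# the predicted relative pose and the ICP measurement, and F(x) sums the
# information-weighted squared errors over all constraint edges.
import numpy as np

def relative_pose(xi, xj):
    """Pose of node j expressed in the frame of node i, for x = (x, y, theta)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    c, s = np.cos(xi[2]), np.sin(xi[2])
    return np.array([c * dx + s * dy, -s * dx + c * dy, xj[2] - xi[2]])

def edge_error(xi, xj, zij):
    """e(x_i, x_j, z_ij): predicted minus measured constraint, angle wrapped."""
    e = relative_pose(xi, xj) - zij
    e[2] = np.arctan2(np.sin(e[2]), np.cos(e[2]))
    return e

def total_cost(poses, edges):
    """F(x) = sum_ij e_ij^T * Omega_ij * e_ij over all constraint edges."""
    cost = 0.0
    for i, j, zij, omega in edges:
        e = edge_error(poses[i], poses[j], zij)
        cost += e @ omega @ e
    return cost
```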

2 Binocular vision ORB feature point processing

A single-line laser ranging sensor can quickly and accurately obtain the depth information of its scanning plane and has strong anti-interference capability. However, the amount of environmental information contained in the laser point cloud is limited. In this paper, the fusion of the single-line laser sensor and the binocular vision sensor is used to improve the accuracy and robustness of scene representation and robot localization.

In order to use the binocular vision camera to recover the 3D information of the environment and realize camera pose estimation, the correction, filtering and other pre-processing of the left and right images should first be completed according to the calibration results. Feature point extraction and matching is the process of finding corresponding points of the same scene in the preprocessed images.

2.1 Feature point extraction

In the oriented FAST and rotated binary robust independent elementary features (BRIEF) (ORB) algorithm, feature points are extracted by features from accelerated segment test (FAST) corner detection. The gray levels of the pixels around a candidate point are examined: if a sufficient number of pixels in the neighborhood differ from the candidate point in gray level by more than a threshold, the candidate point is accepted as a feature point.

In order to describe the attributes of a feature point, the ORB algorithm gives the FAST corner an orientation by the gray-scale centroid method and uses a rotation matrix together with machine learning to improve the rotation invariance of the BRIEF descriptor. A subsequent improvement also gives ORB good scale consistency by building an image pyramid. The feature point extraction speed of this algorithm is much higher than that of scale-invariant feature transform (SIFT), speeded up robust features (SURF) and other algorithms, which better meets the real-time requirements of robot localization[17].
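As an illustration of the extraction step described above, the following sketch uses OpenCV's ORB detector, exposing the FAST threshold and the image-pyramid parameters. The parameter values are illustrative and are not those used in the paper.

```python
# Sketch of ORB feature extraction with OpenCV; parameter values are
# illustrative assumptions, not the paper's settings.
import cv2

def extract_orb(gray_image, n_features=1000):
    orb = cv2.ORB_create(
        nfeatures=n_features,   # maximum number of keypoints to keep
        scaleFactor=1.2,        # pyramid scale step (scale invariance)
        nlevels=8,              # number of pyramid levels
        fastThreshold=20,       # FAST corner intensity-difference threshold
    )
    keypoints, descriptors = orb.detectAndCompute(gray_image, None)
    return keypoints, descriptors
```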

2.2 Feature point matching

The key to determining whether two ORB feature points in a pair of frames form a matching point pair is to check whether the Hamming distance between their descriptors reaches the similarity threshold. In order to solve the rotation matrix and translation vector of the binocular camera, the pose change of a feature point in the next frame is generally tracked and matched after the 3D information of the feature point has been obtained by stereo matching. In traditional matching algorithms, the stereo matching and tracking matching processes are relatively independent[18]. In order to further improve the accuracy of tracking matching, the quadrilateral closed-loop matching structure shown in Fig.3 is built for adjacent binocular frames.

Fig.3 Quadrilateral closed loop matching structure

In Fig.3, the horizontal lines represent the stereo matching between the left and right images captured at the same time, and the slanted lines represent the tracking matching between images on the same side at adjacent moments.

A quadrilateral closed-loop constraint can be established according to the structural characteristics of the binocular camera during movement. During tracking matching, the changes in the pixel coordinates of the same feature point in the left and right images are nearly identical; in Fig.3, the slopes of p1p3 and p2p4 are almost the same. The following constraint rules are established in combination with the epipolar constraint and the disparity range constraint.

(9)

where K is the slope test threshold and d_max is the maximum disparity constraint. In the quadrilateral closed-loop matching of adjacent frames, the feature points that satisfy the above constraints are retained. To further optimize the matching results, random sample consensus (RANSAC)[19] is introduced for iterative screening. In actual matching, considering the calibration error of the binocular camera, the epipolar constraint is allowed a fluctuation range of three pixels around the theoretical pixel position. Test results show that when K=0.35, the improved ORB feature point extraction and matching algorithm ensures both the matching success rate and the matching accuracy.
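A minimal sketch of this screening rule follows: for a candidate correspondence (p1, p2) in the previous stereo pair and (p3, p4) in the current pair, the tracked points must move with nearly equal slopes and satisfy the epipolar and disparity limits. The values K = 0.35, the 3-pixel epipolar tolerance and the idea of a maximum disparity follow the text; d_max and the function layout are assumptions.

```python
# Sketch of the quadrilateral closed-loop screening rule combining the
# epipolar constraint, the disparity range constraint and the slope test.
import numpy as np

def passes_loop_constraints(p1, p2, p3, p4, K=0.35, d_max=64.0, epi_tol=3.0):
    """p1/p3: left-image points at t-1/t; p2/p4: right-image points at t-1/t (pixels)."""
    # Epipolar constraint for rectified stereo: matched rows agree within a tolerance.
    if abs(p1[1] - p2[1]) > epi_tol or abs(p3[1] - p4[1]) > epi_tol:
        return False
    # Disparity range constraint: positive and below the maximum parallax d_max.
    d_prev, d_curr = p1[0] - p2[0], p3[0] - p4[0]
    if not (0.0 < d_prev < d_max and 0.0 < d_curr < d_max):
        return False
    # Tracking slope consistency: p1->p3 and p2->p4 should have similar slopes.
    def slope(a, b):
        return (b[1] - a[1]) / (b[0] - a[0] + 1e-9)
    return abs(slope(p1, p3) - slope(p2, p4)) <= K
```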

3 Laser/Vision fusion algorithm based on FG

During robot movement, the binocular images at adjacent moments are highly similar. If every collected frame were processed, the information redundancy would grow rapidly and so would the computational load. A suitable key frame selection strategy effectively balances information redundancy against the consumption of computing resources. As a result, the binocular vision localization is updated at a low rate while the laser RBPF-SLAM is updated at a relatively high rate. To handle these asynchronous update rates, a method based on FG is adopted in this paper to realize MSIF.

3.1 FG fusion model

FG is a probabilistic graphical model expressing the joint probability distribution of random variables and is composed of state nodes, factor nodes and edges. When the state x_i is associated with the factor f_i, a constraint edge exists between the state and the factor. The measurement model in robot localization can be expressed as

z_i = h_i(x_i) + v_i,

(10)

where z_i is the sensor measurement of the state x_i through the measurement function h_i(x_i), and v_i is the measurement noise. The optimization objective of a single factor is obtained by

f_i(x_i) = d(h_i(x_i) - z_i).

(11)

For estimation problems in which the error obeys a Gaussian distribution, the Mahalanobis distance is usually chosen as the cost function: d(e) = e^T Ω e, where Ω is the inverse of the measurement noise covariance matrix. The whole FG model is the product of the individual factors as

(12)

where each factor f_i(x_i) represents an error function to be optimized; the optimal estimate x* of the robot pose after information fusion can be obtained by minimizing F_FG(x).
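The sketch below illustrates this factor-graph objective: each factor stores a measurement z_i, a measurement function h_i and an information matrix Ω, and contributes a Mahalanobis-distance cost; the total cost is the sum over all factors (the negative log of the factor product). It is a minimal illustration of the model, not a full solver.

```python
# Sketch of the factor-graph objective: each factor contributes the cost
# d(e) = e^T * Omega * e with e = h_i(x_i) - z_i, per Eqs. (10)-(12).
import numpy as np

class Factor:
    def __init__(self, h, z, omega, var_indices):
        self.h = h                              # measurement function h_i
        self.z = np.asarray(z, float)           # measurement z_i
        self.omega = np.asarray(omega, float)   # information matrix (Sigma^-1)
        self.var_indices = var_indices          # state nodes this factor connects

    def cost(self, states):
        x = [states[k] for k in self.var_indices]
        e = self.h(*x) - self.z                 # residual h_i(x_i) - z_i
        return e @ self.omega @ e               # Mahalanobis distance

def graph_cost(states, factors):
    """Negative log of the factor product: sum of the individual factor costs."""
    return sum(f.cost(states) for f in factors)
```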

The laser/vision fusion model proposed herein is shown in Fig.4. The state output of laser RBPF-SLAM runs throughout the entire localization phase. Whenever a binocular vision observation is received, a factor node f_vision is created to expand the FG and correct the localization result of laser RBPF-SLAM. The laser RBPF-SLAM node and the binocular vision node of the FG are now described, respectively.

Fig.4 FG fusion model

3.1.1 Laser RBPF-SLAM node

According to the idea of improving the RBPF-SLAM algorithm by the nonlinear optimization method in Section 1.2, the error function e(x_i, x_j, z_ij) can be expressed as

e_ij = h(x_i, x_j) - z_ij.

(13)

In order to facilitate the optimization and solution, the relation between two adjacent state nodes is expressed as follows according to Lie group theory

(14)

where L_ij is the rotation-translation matrix obtained by ICP between two laser point cloud frames; T_i and T_j are the poses of the current state nodes x_i and x_j. The laser RBPF-SLAM node is constructed as

(15)
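The sketch below illustrates the idea behind this laser factor: the ICP transform L_ij between two laser frames is compared with the relative pose T_i⁻¹T_j predicted by the connected state nodes, here using 2D homogeneous transforms. The layout and the log-map used for the residual are assumptions for illustration.

```python
# Sketch of the laser factor: residual between the ICP measurement L_ij and
# the relative pose predicted from the connected state nodes, in SE(2).
import numpy as np

def se2_matrix(pose):
    """Homogeneous transform T for a 2D pose (x, y, theta)."""
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def se2_log(T):
    """Minimal (x, y, theta) residual recovered from a homogeneous transform."""
    return np.array([T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])])

def laser_factor_error(pose_i, pose_j, L_ij):
    """Residual between the ICP measurement and the predicted relative pose."""
    Ti, Tj = se2_matrix(pose_i), se2_matrix(pose_j)
    T_pred = np.linalg.inv(Ti) @ Tj          # predicted relative transform
    return se2_log(np.linalg.inv(L_ij) @ T_pred)
```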

3.1.2 Binocular vision node

It is assumed that the matching point set of adjacent key frames obtained by feature point extraction and matching is

(16)

The error term of the i-th corresponding point pair is

(17)

Then a least squares problem is constructed, and singular value decomposition (SVD) is used to minimize the sum of squared errors

(18)

The rotation-translation matrix V_ij between adjacent key frames can be obtained from the inverse transformation of the optimized R and t, and the binocular vision node is then constructed as

(19)
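As an illustration of the SVD step, the following sketch aligns the matched 3D point sets of two adjacent key frames with the classical closed-form (Kabsch-type) solution, returning the rotation R and translation t that minimize the sum of squared errors. Variable names are illustrative.

```python
# Sketch of the SVD-based alignment used to minimize the sum of squared
# errors between matched 3D point sets of adjacent key frames.
import numpy as np

def align_point_sets(P, Q):
    """P, Q: (N, 3) arrays of matched 3D points; returns R, t with Q_i ≈ R @ P_i + t."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    P0, Q0 = P - p_mean, Q - q_mean              # remove centroids
    H = P0.T @ Q0                                # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = q_mean - R @ p_mean
    return R, t
```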

3.2 Incremental optimization

The essence of FG optimization is to iteratively solve a least squares problem: given an initial value x_0, find the increment Δx that minimizes F_FG(x). In the Gauss-Newton method, f(x) is expanded by a first-order Taylor expansion, so the increment Δx is obtained from

(20)

The incremental equation is obtained by derivation

J(x)^T J(x) Δx = -J(x)^T f(x),

(21)

where J(x) is the Jacobian matrix of f(x) at point x. The matrix H = J(x)^T J(x) is sparse, so the computation can be accelerated by Schur elimination or Cholesky decomposition[20].
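A minimal sketch of one Gauss-Newton step follows: the normal equations of Eq. (21) are solved through a Cholesky factorization of H. The dense formulation here is an illustration only; in practice H is sparse and is handled by sparse Cholesky or Schur elimination as noted above.

```python
# Sketch of one Gauss-Newton step: solve J^T J dx = -J^T f via Cholesky.
import numpy as np

def gauss_newton_step(residual_fn, jacobian_fn, x):
    f = residual_fn(x)                      # stacked residual vector f(x)
    J = jacobian_fn(x)                      # Jacobian of f at x
    H = J.T @ J                             # (approximate) Hessian
    b = -J.T @ f
    L = np.linalg.cholesky(H)               # H = L L^T
    y = np.linalg.solve(L, b)               # forward substitution
    dx = np.linalg.solve(L.T, y)            # back substitution
    return x + dx
```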

As the number of factor nodes in the FG model gradually increases, recalculating the update of all nodes whenever a new node is generated leads to heavy computation and harms the real-time performance of the system. By comparing the Jacobian matrices at different moments, a newly added factor node can be approximated as affecting only its adjacent nodes. In this way, only the affected nodes need to be considered when a node update is calculated, instead of re-optimizing the entire graph. On the basis of this idea, Kaess et al. processed the FG more elaborately, proposed incremental FG optimization, and improved the optimization process[21].

4 Experiment and analysis

In order to verify the effectiveness of the proposed MSIF algorithm, a simulation test and an experimental verification were carried out. In the simulation test, the measurement processes of the laser and vision sensors were simulated, and the robot localization results were compared with the reference trajectory to quantitatively evaluate the localization error of the improved method. In addition, an experiment on a real mobile robot platform verified the influence of the MSIF algorithm on robot localization and back-end mapping accuracy.

4.1 Simulation test

MATLAB was used to establish the motion model of the two-wheel differential-drive mobile robot, and the robot trajectory is shown in Fig.5. The robot started from the origin of the coordinate system, and its movement was restricted to a 30 m×30 m square region centered at the origin. The odometry error was 0.05 m.

Fig.5 Robot trajectory set in simulation test
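The sketch below shows a two-wheel differential-drive motion model of the kind used to generate such a simulated trajectory; the wheel base, velocities and noise level are illustrative stand-ins for the settings described in the text.

```python
# Sketch of a differential-drive motion step with noisy wheel odometry;
# wheel_base, dt and odom_sigma are illustrative assumptions.
import numpy as np

def differential_drive_step(pose, v_left, v_right, wheel_base=0.4, dt=0.1, odom_sigma=0.05):
    """Advance pose (x, y, theta) one step given left/right wheel speeds."""
    x, y, th = pose
    v = 0.5 * (v_left + v_right)             # forward velocity of the body
    w = (v_right - v_left) / wheel_base       # yaw rate
    x += v * np.cos(th) * dt + np.random.randn() * odom_sigma
    y += v * np.sin(th) * dt + np.random.randn() * odom_sigma
    th += w * dt
    return np.array([x, y, np.arctan2(np.sin(th), np.cos(th))])
```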

Observation markers were provided for the sensors by randomly placing several stationary landmarks at known positions within the movement region. The measurement errors of the laser and vision sensors were set to 0.1 m and 0.15 m, and the sampling periods to 0.1 s and 0.5 s, respectively. The sensor range was set to 10 m.

The particles representing robot poses were initialized near the origin of the coordinate system, and 1 000 time steps were simulated for both the traditional RBPF-SLAM algorithm and the proposed MSIF algorithm. The resulting localization error curves of the robot pose x_{1:1 000} = (x, y, θ) are shown in Fig.6. Compared with the ideal robot pose in the simulation environment, the two groups of results showed little difference in heading error (θ error). However, in position error (x error and y error), the localization results obtained by the proposed algorithm were significantly better, with errors closer to 0.

Fig.6 Localization error curve of robot

4.2 Experimental verification

The crawler-type mobile robot platform used in the experiment is shown in Fig.7.

The robot body is driven by two drive wheel motors equipped with rotary encoders. The single-line laser ranging sensor is a HOKUYO URG-04LX, whose distance measuring range is 20-4 000 mm with a measuring error of ±1%, and whose angle measuring range is 240° with a measuring error of 0.36°. The binocular vision camera is a Point Grey Bumblebee2, and the resolution of a single-frame image is 640 px×480 px.

Fig.7 Crawler-type mobile robot platform

The experimental environment is shown in Fig.8. In the experiment, the robot trajectory was first set through the motion control software; the drive wheel motors moved the robot after receiving the corresponding instructions. Due to the control error of the motors and the cumulative error of dead reckoning, it was difficult to obtain the true value of the robot trajectory. From the mutually iterative relation between robot localization and mapping, the experimental group with more accurate localization results under the same conditions should also produce a better map. Therefore, the relative merits of different localization algorithms can be reflected by comparing the environment description accuracy of the map models.

Fig.8 Experimental environment

In order to make the map model more intuitive, in this paper, the 3D point cloud map of experimental environment was constructed by referring to the mapping method in Ref.[10] and relying on the robot localization results of different algorithms. The control group adopted the laser/vision fusion algorithm proposed in Ref.[4]. The robot trajectory obtained by different localization algorithms is shown in Fig.9. The 3D point cloud map built is shown in Fig.10.

Fig.9 Robot trajectory in experimental scene

Fig.10 Visualization results of 3D point cloud map

In order to analyze the environmental description accuracy of the map model, the 3D point cloud was projected onto the coordinate axes of the environment coordinate system (x, y, z). The results in Table 1 were obtained by measuring typical spatial features of the 3D map model and comparing them with the actual measured values.

Table 1 Accuracy analysis of 3D point cloud map

There was little difference in shape and dimensional scale between the 3D map model built herein and the actual experimental environment. The average error between the typical spatial features of the map model and the measured values was only 1.27%, which is more accurate than the algorithm proposed in Ref.[4]. In addition, the algorithm herein has a smaller standard deviation (SD) of the average error and more stable accuracy. Both algorithms separate the localization front end from the mapping back end and use the same point cloud generation method; therefore, the robot localization accuracy of the algorithm herein can be deemed higher than that of the algorithm proposed in Ref.[4].

5 Conclusion

This paper combined laser RBPF-SLAM with binocular vision, adopted FG nodes to describe the state recursion and update process of the robot system, and proposed a laser/vision fusion localization algorithm based on the FG. The simulation results show that the proposed algorithm has higher position accuracy. The localization and mapping experiments in a real scene verify that this algorithm achieves higher localization accuracy than general fusion localization algorithms, which improves the mapping accuracy and has practical value.