
Non-parametric camera calibration method using single-axis rotational target

2022-04-18

FU Luhua, REN Zeguang, WANG Peng, SUN Changku, ZHANG Baoshang

(1. State Key Laboratory of Precision Measurement Technology and Instruments, Tianjin University, Tianjin 300072, China;2. Science and Technology on Electro-Optic Control Laboratory, Luoyang Institute of Electro-Optic Equipment,Aviation Industry Corporation of China, Ltd., Luoyang 471009, China)

Abstract: The ability to model the imaging process is crucial to vision measurement. The non-parametric imaging model describes the imaging process as a pixel cluster, in which each pixel is related to a spatial ray originating from an object point. However, a non-parametric model requires a sophisticated calculation process or high-cost devices to obtain a massive quantity of parameters. These disadvantages limit the application of such camera models. Therefore, we propose a novel camera model calibration method based on a single-axis rotational target. The rotational vision target offers 3D control points without requiring detailed pose information of the rotational target. A radial basis function (RBF) network is introduced to map 2D image coordinates to 3D spatial coordinates. We subsequently derive the optimization formulation of the imaging model parameters and compute the parameters from the given control points. The model is extended to the stereo camera configuration that is widely used in vision measurement. Experiments have been conducted to evaluate the performance of the proposed camera calibration method. The results show that the proposed method has superiority in accuracy and effectiveness in comparison with traditional methods.

Key words: camera calibration; rotational target; non-parametric model

0 Introduction

Vision measurement is used to recover geometric information from the images captured by cameras. The recovery relies on the imaging process, which is described in the form of a mathematical model. The capability of establishing this model is determined by the camera calibration procedure. The goal of camera calibration is to obtain optimal parameters for reliable mapping. Therefore, various camera models and calibration methods have been proposed.

The most widely used imaging model is the pinhole camera model. This model simplifies the imaging process as a linear system that projects spatial points to image coordinates via a pinhole. One reason accounting for its popularity is that only a few parameters are involved, including the principal point, focal length, and transformation matrices. The small number of parameters results in easily used formulations, among which a well-respected one is the two-stage calibration technique proposed by Tsai[1]. Following Tsai, Zhang improved Tsai's method by introducing more manual data and the homography matrix[2]. Further research has been done by Fitzgibbon, who proposed a division model with a rational function for general camera lens distortion[3]. Owing to the development of deep learning, López-Antequera recently designed a new loss function based on point projections to jointly estimate camera parameters[4].

Although pinhole-model calibration has achieved great success thanks to its availability and understandability, some problems persist[5]. One of them is the pinhole assumption. In reality, an ideal pinhole camera does not exist and the imaging process cannot be regarded as a projection transformation in geometric optics. To compensate for the error of the transformation, distortion factors are defined as a polynomial form of the displacement to the principal point. However, the distortion has complex sources, which are likely to be radial, tangential, or independent from pixel to pixel[6]. Besides, the pinhole-based calibration method is not sophisticated enough to describe the imaging process at an angle of nearly 180 degrees[7]. For general cameras, several non-parametric models or ray-tracing models have been proposed. Peter[8] presented a general unconstrained non-parametric calibration concept regarding an image as a set of pixels that are associated with rays traveling through 3D space. A parameter-free method determining the radial distortion was proposed by Hartley et al.[9] and added to the traditional Zhang's calibration method. The constraint that all the rays intersect at a common optical center was added to the parameter-free camera model in Ref.[10]. Sun et al.[11] resorted to a translation stage with an indicator to recover 3D rays from intersections at different planes.

Generally speaking, the unconstrained non-parametric model shows benefits in accommodating the independence of different pixels. The price is a high calibration complexity in searching for a global minimum. Most non-parametric models require massive computation or sophisticated operations to obtain nonlinear parameters such as rotation matrices, translation matrices, and directions of rays. For this reason, a parameter-free camera calibration method must be both accurate and easy to operate. In this study, we explore a calibration technique which combines the non-parametric camera model with an auxiliary device.

In the following sections, we first describe the non-parametric model and introduce its general form in Lie algebra. Subsequently, we derive the formulation of single-axis rotation using the rotation stage. Finally, the rotation averaging of the target poses observed by the two individual cameras is exploited and the method is extended to a stereo camera rig. Experiments and conclusions are presented in the last part of the paper.

1 Non-parametric model and calibration

In this section, a precise camera calibration method is proposed for a distorted binocular stereo camera. This method is based on a non-parametric ray-tracing imaging model with a turntable. In the general non-parametric model, every pixel of the camera sensor is related to a ray hitting the imaging plane, as shown in Fig.1. From this aspect, the pixels of the camera sensor are essentially the indices of corresponding rays in 3D space. Along this line of consideration, we specify an incident ray path by the parametric equation λ(d(u,v), oi(u,v)), meaning that the ray passes through the point oi(u,v) in the direction of the vector d(u,v). The point oi(u,v) is called the endpoint in the following sections. Additionally, the direction vector d(u,v) is fitted from corresponding points in multiple target planes while the endpoint oi(u,v) is selected randomly on the ray.

Fig.1 Non-parametric ray tracing image model

Each ray associated with a certain pixel on the image plane has six degrees of freedom, of which oi(u,v) accounts for three and d(u,v) for the other three. To estimate the parameters of every ray, at least two spatial points are needed for recovering positions and orientations. To this end, the localization of 3D object space points requires a calibration target offering determined marks with precise spatial coordinates.

A spatial point can be provided by the 2D control point coordinate Pi on the calibration board at the pose Θi. The camera captures several photos at different poses, as shown in Fig.2. One of the poses is selected as the datum target plane to align all the points to a common coordinate system. Pi is transformed to the unified coordinate Pei according to the pose Θi, which is from the ith calibration target plane to the datum target plane. We give the mathematical expression as

Pei = Θi Pi,  Θi = [R | T],    (1)

where Pei is called the spatial control point in the following sections, R denotes the rotation matrix, and T is the translation matrix. Concerning each pixel in the camera shot, the ray trace can be computed with spatial control points.
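
As a concrete illustration of Eq.(1), a minimal NumPy sketch of the transformation of planar control points into the datum frame is given below; the function name and array layout are our own assumptions rather than part of the original formulation.

```python
import numpy as np

def to_unified_frame(points_2d, R, T):
    """Map planar control points Pi (z = 0 on the board) to spatial control
    points Pei in the datum frame via the pose Theta_i = [R | T], cf. Eq.(1).

    points_2d : (N, 2) array of board coordinates.
    R         : (3, 3) rotation matrix of the ith target plane.
    T         : (3,)   translation vector of the ith target plane.
    """
    pts = np.hstack([points_2d, np.zeros((len(points_2d), 1))])  # lift to 3D, z = 0
    return pts @ R.T + T  # apply rotation, then translation, per point
```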

Fig.2 Ray recovered by spatial points

1.1 Establishment of mapping from image points to spatial points

To obtain the planar control point Pi, the mapping from image points to planar control points is in demand. The calibration target is applied to offer identifiable markers with known 3D coordinates. The mapping between 3D coordinates on the spatial calibration target surfaces and 2D point coordinates in the image plane can be easily established by implementing a radial basis function (RBF) network.

An RBF network is an artificial neural network that uses radial basis functions as activation functions[12]. It has the capability of performing exact interpolation of a set of data points in a multidimensional space with a simpler structure than the classical regularization network[13]. Research has shown that the RBF kernel outperforms linear and polynomial kernels in accuracy[14]. Due to its generalization ability[14-16], it is suitable for high-precision applications. As shown in Fig.3, the RBF network typically contains three layers: an input layer, a hidden layer, and an output one.

Fig.3 RBF network layout

The input layer is a distributor of the input data while the output layer combines the results of the hidden layer linearly. Specifically, 2D pixel coordinates in the image plane are regarded as inputs while 3D spatial coordinates on calibration targets are regarded as outputs. The hidden layer takes the Gaussian function as the basis kernel, namely

φi(x) = exp(−βi‖x − μi‖²),    (2)

where φi denotes the basis function of the ith hidden neuron and x denotes the input vector. In the hidden layer, μi controls the center of the Gaussian function, and the spread parameter βi describes the compactness of the basis function. The output s(x) of the RBF network is a linear combination of the hidden-layer responses, and the weights between the hidden layer and the output layer are modified by feeding data during the training process. In theory[16], if the number of hidden-layer neurons is sufficient, any arbitrary function can be approximated to any degree of precision. Taking x1 = X and x2 = Y, we can obtain interpolation points in the image planes based on the universal approximation of the derived network. Fig.4 shows the interpolated control point Pi of the calibration targets produced by the RBF network.
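
A minimal sketch of such a 2D-to-3D Gaussian RBF mapping is given below. For brevity it uses every training point as a center and a single fixed spread, whereas the network described later selects centers by k-means clustering; it is therefore an illustrative simplification, not the authors' implementation.

```python
import numpy as np

class GaussianRBF:
    """Exact-interpolation Gaussian RBF mapping 2D pixel coordinates to
    target-plane coordinates (simplified stand-in for the network in Fig.3)."""

    def __init__(self, beta=1e-5):
        self.beta = beta  # spread parameter of the Gaussian kernel, cf. Eq.(2)

    def fit(self, pixels, targets):
        self.centers = np.asarray(pixels, dtype=float)            # (N, 2)
        phi = self._kernel(self.centers)                          # (N, N) design matrix
        self.weights = np.linalg.lstsq(phi, np.asarray(targets, dtype=float), rcond=None)[0]
        return self

    def predict(self, pixels):
        return self._kernel(np.asarray(pixels, dtype=float)) @ self.weights

    def _kernel(self, x):
        d2 = ((x[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.beta * d2)                            # Gaussian basis responses
```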

Fig.4 Process of points intersection by RBF network

1.2 Poses estimation with rotation stage

The planar point coordinate Pi and the target pose Θi yield the spatial control point Pei. In this section, we aim to obtain Θi for the 3D control point coordinates. To simplify the complexity of solving the transformation matrices, the calibration target is mounted on a rotation stage. In this circumstance, the movement of the target is converted into rotation around the axis of the rotation stage.

Initialization of the poses of the calibration target and the poses of the rays is needed first. For this purpose, a coarse estimation algorithm is used to offer original target poses in each shot. The pose estimation determines the geometrical relationship of the target, which is given by[17]

zc [u, v, 1]^T = K [R | T] [xw, yw, zw, 1]^T,    (3)

where zc is the scale factor; R and T are called the extrinsic parameters depicting the rotation and translation of the calibration target, respectively; and K is the intrinsic matrix composed of the camera skew, focal length, and principal point. Let zw be zero for the planar homogeneous coordinates in the world coordinate system, then the factor matrices degenerate into the homography H, namely

H = K [r1, r2, t],    (4)

where r1, r2, r3 and t are submatrices of the transformation matrix M; and zw is zero when the origin of the calibration coordinate system lies in the calibration board plane. Thus, the expression degenerates into

zc [u, v, 1]^T = H [xw, yw, 1]^T.    (5)

Using the knowledge that r1 and r2 are orthonormal, we can extract initial poses from the extrinsic parameters. Then the coordinates of the control points can be computed from the planar coordinates.
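
The coarse pose initialization can be reproduced, for instance, with OpenCV's planar PnP solver, assuming a rough intrinsic matrix K is available; this is a convenient substitute for the homography decomposition of Eqs.(3)-(5), not the authors' exact procedure.

```python
import cv2
import numpy as np

def coarse_target_pose(board_pts, image_pts, K, dist=None):
    """Coarse pose of the planar calibration target used for initialization.
    board_pts : (N, 3) board corners with z = 0; image_pts : (N, 2) pixels.
    Returns (R, t) such that a board point X maps to R @ X + t in the camera frame."""
    ok, rvec, tvec = cv2.solvePnP(
        board_pts.astype(np.float64), image_pts.astype(np.float64),
        K, dist, flags=cv2.SOLVEPNP_IPPE)      # PnP variant dedicated to planar targets
    if not ok:
        raise RuntimeError("coarse pose estimation failed")
    R, _ = cv2.Rodrigues(rvec)                 # rotation vector -> rotation matrix
    return R, tvec.reshape(3)
```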

In practice, perturbation occurs in both the axial direction and the tangential direction, resulting in uncertainty of the rotation axis. To reproduce the axis location and the coordinates of the feature points on the target, the trajectory is derived by fitting a circle from the initial poses of the target, as shown in Fig.5.

Fig.5 Schematic diagram of trajectory projection

To determine the trajectory circle in space, the normal vector of the circle plane is obtained by applying singular value decomposition to the spatial point cluster matrix of the targets, which is derived by subtracting the fitted circle center from the primary point coordinates of the different poses, namely

A = U Σ V^T,    (6)

where A denotes the point cluster matrix. The right singular vector associated with the smallest singular value gives the normal vector n, and the equation of the plane can be obtained in the form of

Ax+By+Cz+D=0.

(7)

The general form[18]of a spatial circle can be expressed as

(x − x0)² + (y − y0)² + (z − z0)² = r².

(8)

The center of the fitted circle is found at (x0,y0,z0). To apply the least-squares method, we expand the equation and rearrange the terms. The expanded equation of the circle can be expressed as

x² + y² + z² − 2x0x − 2y0y − 2z0z = r² − x0² − y0² − z0².    (9)

Each point on the spatial circle trace must satisfy the equation above. Thus, we can rewrite the equations in the form of a matrix of an over-determined system.

An over-determined system is obtained for the least-squares fitting of a spatial circle as[18-19]

R = XK,    (10)

where R and X are both n-row matrices, in which x1, y1 and z1 represent the first data point, while xn, yn, and zn represent the last data point in the data set, namely

R = [x1² + y1² + z1², …, xn² + yn² + zn²]^T, X = [[2x1, 2y1, 2z1, 1]; …; [2xn, 2yn, 2zn, 1]], K = [x0, y0, z0, r² − x0² − y0² − z0²]^T.    (11)

Then any suitable optimization method can be used to obtain the radius and center coordinates of the circle. Since the normal vector and circle center of the single-axis rotation are known, new positions of the target planes are obtained by projecting the original trajectory points onto the fitted circle in the form of Eq.(10).
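
A compact sketch of this spatial circle fitting (plane normal from Eq.(6) plus the over-determined system of Eqs.(9)-(11)) might look as follows; the helper name, the projection of the center onto the fitted plane, and the final radius estimate are our own simplifications.

```python
import numpy as np

def fit_spatial_circle(P):
    """Least-squares circle through the 3D points P of shape (n, 3).
    Returns (center, radius, normal)."""
    centroid = P.mean(axis=0)
    # Plane normal: right singular vector of the centred points with the
    # smallest singular value, cf. Eq.(6).
    _, _, Vt = np.linalg.svd(P - centroid)
    normal = Vt[-1]
    # Over-determined system R = XK from the expansion in Eq.(9), cf. Eqs.(10)-(11).
    X = np.hstack([2.0 * P, np.ones((len(P), 1))])
    Rvec = (P ** 2).sum(axis=1)
    k, *_ = np.linalg.lstsq(X, Rvec, rcond=None)
    center = k[:3]
    # Constrain the center onto the fitted plane before estimating the radius.
    center = center - np.dot(center - centroid, normal) * normal
    radius = np.linalg.norm(P - center, axis=1).mean()
    return center, radius, normal
```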

New positions of the target planes take the circle center as the origin of the coordinate system. Thus, we can calculate the new transformation matrices of each target plane by

(12)

Since Θi and Pi are known, Pei can be derived from Eq.(1).

1.3 Optimization of non-parametric camera model parameters

To recover the locus of rays hitting the camera sensor, the geometry can be formulated as the minimization problem, namely

argmin ∑ D²(Pe(Θi, u, v), λ(u, v)),    (13)

where D² denotes the squared distance, in the chosen metric space, between the 3D control points Pe in the target plane with pose Θi and the ray λ corresponding to the 2D pixel coordinate (u, v). In the step of target pose estimation, only one target plane is updated while the others' positions are frozen, which avoids the difficulty of estimating a large number of parameters simultaneously. The distance from the control point to the ray can be represented in the form of the distance between the 3D point and the foot point Pn. Consequently, the objective function is rewritten as

argmin ∑ D²(Pn(Θi, u, v), Pe(Θi, u, v)).    (14)

The method aims to obtain the poses of the target planes and rays that minimize the loss of the objective function. Assuming that the poses of targets and rays are independent, we similarly apply an alternating algorithm which divides the optimization into two steps.

The first step is the optimization of the target poses Θi while keeping the poses of the rays λ(u,v) fixed. The second step is to estimate the poses of rays while keeping the poses of the target planes fixed. By processing the two steps iteratively, minimization of the objective function can be done explicitly.
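
The structure of this alternating scheme can be summarized by the skeleton below; the two callbacks stand in for the pose and ray solvers derived in the remainder of this section and are assumptions for illustration, not the authors' code.

```python
import numpy as np

def alternating_calibration(poses, rays, refine_pose, refine_ray,
                            max_iter=50, tol=1e-8):
    """Two-step alternating minimization of the objective in Eqs.(13)-(14).
    poses : list of 4x4 target-plane pose matrices (initial guesses).
    rays  : list of (direction, endpoint) tuples, one per calibrated pixel.
    refine_pose(i, poses, rays) -> updated 4x4 pose of target i (rays frozen).
    refine_ray(j, poses, rays)  -> updated (direction, endpoint) (poses frozen)."""
    for _ in range(max_iter):
        prev = [p.copy() for p in poses]
        # Step 1: update every target pose while all rays stay fixed.
        poses = [refine_pose(i, poses, rays) for i in range(len(poses))]
        # Step 2: update every ray while all target poses stay fixed.
        rays = [refine_ray(j, poses, rays) for j in range(len(rays))]
        if max(np.abs(p - q).max() for p, q in zip(poses, prev)) < tol:
            break
    return poses, rays
```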

Before the alternating optimization, the direction d(u,v) and endpoint oi(u,v) of the ray should be obtained. We can fit the direction and endpoint at pixel pi(u,v) by solving a least-squares approximation once the positions of the 3D points corresponding to one 2D point are derived from initialization by feeding the data to the RBF network. This least-squares approximation problem can be presented in the following form, in which the closeness is measured by the square sum of the orthogonal distances between the ray and the points, namely

argmin ∑s ‖(Pes − o) − [(Pes − o) · d] d‖², s.t. ‖d‖ = 1,    (15)

where s denotes the sth 3D point on the practical ray. Singular value decomposition (SVD) can be applied to this linear regression. According to Ref.[20], the fitted ray passes exactly through the center of all the point coordinates, which can be taken as the endpoint. Let J denote os − Pes. The SVD of J is formulated as

J = U Σ V^T,    (16)

where matrix U represents the rotation of the points. By extracting the first column of the rotation matrix U (with the centered points arranged as the columns of J), the direction of the ray can be described.
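
A minimal sketch of this ray-fitting step is shown below; it takes the endpoint as the centroid of the points and the direction as the dominant right singular vector of the centered point matrix, which coincides with the first column of U when the centered points form the columns of J.

```python
import numpy as np

def fit_ray(points):
    """Fit a 3D ray to the control points associated with one pixel, cf. Eq.(16).
    Returns (direction, endpoint)."""
    P = np.asarray(points, dtype=float)
    endpoint = P.mean(axis=0)                  # the fitted line passes through the centroid
    _, _, Vt = np.linalg.svd(P - endpoint)
    direction = Vt[0]                          # dominant direction of the point cluster
    return direction / np.linalg.norm(direction), endpoint
```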

To deal with the objective function, we select the L2 norm as the metric, and the original function is converted to

argmin ∑ ‖Pni(Θi, u, v) − Pe(Θi, u, v)‖².    (17)

For simplicity, we omit the pixel coordinate (u, v) and derive the 3D coordinate Pe from Θi and Po, and then we get

Pe(Θi) = Θi Poj,    (18)

where the subscript j means the jth ray. The error term is defined in the following form, which comes from the difference between the estimated location Po and the control point Pe, namely

Qj = Pnj − Θi Poj.    (19)

Our objective function can be expressed as

f(Θi) = ∑j Qj^T Qj = ∑j ‖Pnj − Θi Poj‖².    (20)

To minimize f(·), a derivative is needed for the iteration of the optimization process. An exponential map in the Lie group SE(3) is applied to ensure that the transformation remains continuous when taking the derivative[21]. The transformation of rotation and translation over a short period is denoted as

Θ = e^(ε∧) Θo ≈ (I + ε∧) Θo,    (21)

where ε∧ denotes the skew-symmetric matrix related to ε, and ∧ is the skew-symmetric operator associated with the Lie algebra for rotations and poses. For a vector φ, φ∧ means

φ∧ = [[0, −φ3, φ2]; [φ3, 0, −φ1]; [−φ2, φ1, 0]],    (22)

where φ ∈ R3 and φ∧ ∈ R3×3. For ε = [ρ, φ] ∈ R6 associated with SE(3), the operator is extended to

ε∧ = [[φ∧, ρ]; [0^T, 0]].    (23)

Substituting Eqs.(16) and (19) into the objective function and taking the derivative of the objective function, we can get

(24)

where matrix Fo represents the adjoint matrix of the transformation matrix To that denotes the initial value of the transformation matrix, namely

(25)

To get an optimal solution, the first derivative is set to zero, and then we get

(26)

where

(27)

and M is a constant matrix, representing the weights of the quadric function, when the points are stationary in one plane. In addition, the determinants of M and L are equal. We have

(28)

where 1j is the jth column of the identity matrix; Pn is the center of the intersection point cluster; Pnj is the intersection point of the ith ray and the jth plane and can be determined by the direction vector d and the endpoint o of the ray and the control point Po, which are expressed as

(29)

To get a unique solution for ε, the determinant det L must be non-zero, which indicates that L is positive-definite. Once the objective function is obtained, the transformation matrix is simply updated with

Θk = e^(ε∧) Θk−1.    (30)

The stopping criterion of the estimation procedure is Θk = Θk−1. Alternatively, |Θk − Θk−1| < δ can be applied at the final iteration.
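
A sketch of the update step of Eqs.(21) and (30) is given below; it assumes the increment ε stacks the translational part ρ before the rotational part φ, which is a convention we chose for illustration rather than one stated in the paper.

```python
import numpy as np

def hat3(phi):
    """Skew-symmetric matrix of a 3-vector, cf. Eq.(22)."""
    return np.array([[0.0, -phi[2], phi[1]],
                     [phi[2], 0.0, -phi[0]],
                     [-phi[1], phi[0], 0.0]])

def se3_update(Theta, eps):
    """One small-step update Theta <- exp(eps^) Theta, cf. Eqs.(21) and (30).
    Theta is a 4x4 pose matrix and eps = [rho, phi] is a 6-vector; the
    first-order approximation exp(eps^) ≈ I + eps^ is used for small increments."""
    rho, phi = eps[:3], eps[3:]
    xi_hat = np.zeros((4, 4))
    xi_hat[:3, :3] = hat3(phi)
    xi_hat[:3, 3] = rho
    return (np.eye(4) + xi_hat) @ Theta
```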

2 Application in stereo camera

2.1 Poses refinement for stereo camera

To determine the stereo model, the stereo cameras first capture the calibration target at different poses simultaneously. Since each ray can be easily obtained from multiple target planes, the single camera models of the stereo camera group are generated from the ray traces to the pixels of the image sensor. After obtaining the single camera ray-tracing model, the proposed method is easily extended to the stereo camera configuration. The camera pose estimation process of the stereo rig remains nearly the same as in single camera calibration. However, further work needs to be done to refine the pose estimation of ray traces, as the control target plane poses observed by each camera differ slightly. The deviation of poses can be divided into translation error and rotation error. To deal with translation errors of target poses, translation averaging is applied to fuse the poses of target planes observed by each single camera in the stereo rig.

Focusing on the rotation error, we propose to apply rotation averaging to refine the rotation matrices. The initial rotation matrices are obtained using the method mentioned in the previous section. To deal with the deviation between the two observers, the single-axis rotation problem can be described in the form of a geodesic L2 mean or the Karcher mean. In the presence of noise, the appropriate minimization problem is expressed as seeking the minimum, namely

R̄ = argmin_{R∈SO(3)} ∑_{i=1}^{N} d^p(Ri, R),    (31)

where d is a metric and p equals 1 or 2 in most circumstances. A necessary condition of the Karcher mean is given by

∑_{i=1}^{N} log(R̄^T Ri) = 0,    (32)

where N is equal to 2. We use the algorithm which provides a convergent solution by computing the average in the tangent space and mapping it back onto the SO(3) manifold[23]. The results are used to refine the transformation matrices in section 1.2. The refinement of the target poses is performed to reduce errors in rotation and translation by averaging the poses.
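
A possible tangent-space implementation of this rotation averaging, in the spirit of Eqs.(31)-(32), is sketched below using SciPy's rotation utilities; it is a generic Karcher-mean iteration rather than the exact algorithm of Ref.[23].

```python
import numpy as np
from scipy.spatial.transform import Rotation

def karcher_mean(rotations, iters=20, tol=1e-10):
    """Geodesic (Karcher) mean of a list of 3x3 rotation matrices by repeatedly
    averaging in the tangent space and mapping back onto SO(3)."""
    R_mean = rotations[0].copy()
    for _ in range(iters):
        # Log-map every observation into the tangent space at the current mean.
        vs = [Rotation.from_matrix(R_mean.T @ R).as_rotvec() for R in rotations]
        delta = np.mean(vs, axis=0)
        if np.linalg.norm(delta) < tol:        # stationary point, cf. Eq.(32)
            break
        R_mean = R_mean @ Rotation.from_rotvec(delta).as_matrix()
    return R_mean
```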

2.2 Computation for object point coordinates

In the vision measurement task, if the registration of object points has been done, the two rays corresponding to one object point can be recovered from the non-parametric model. Theoretically, the spatial coordinates of the object point should be the intersection of the two rays, which means the two rays lie in the same plane. However, the rays are more likely to be skew. So we substitute the midpoint of the common perpendicular of the two rays for the real location of the intersection. The midpoint can be considered as the point which minimizes the sum of the distances to both rays. The formulation is shown as

(33)

where

(34)

The minimum of P for the function can be derived when the derivative vanishes. Consequently, the optimal point is easily computed by

(35)

where + denotes the Moore-Penrose pseudoinverse. Since the midpoint is finally known, the object point is obtained.
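
The midpoint of the common perpendicular can be computed in closed form as sketched below; the helper name and the least-squares fallback for nearly parallel rays are our own choices, and the formulation differs in appearance from Eq.(35) although it solves the same geometric problem.

```python
import numpy as np

def ray_midpoint(o1, d1, o2, d2):
    """Midpoint of the common perpendicular between rays x = o1 + s*d1 and
    x = o2 + t*d2, used as the estimate of the object point."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Orthogonality of the connecting segment against both directions:
    # s(d1.d1) - t(d1.d2) = (o2 - o1).d1,  s(d1.d2) - t(d2.d2) = (o2 - o1).d2
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    b = np.array([(o2 - o1) @ d1, (o2 - o1) @ d2])
    s, t = np.linalg.lstsq(A, b, rcond=None)[0]   # tolerant of nearly parallel rays
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))
```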

3 Experimental results and analysis

To evaluate the performance, the proposed approach was tested with a setup including two HikiVision grayscale cameras with a focal length of 16 mm and a resolution of 1 280 × 1 024 pixels. The stereo camera rig was fixed on a rigid steel base with a baseline length of 210 mm. The calibration target, illuminated by an infrared backlight source, was a glass planar chessboard target with 15 rows, 13 columns, and 195 corners in total. The integrated target, set at a distance of 350 mm, was fixed on the precision rotation stage Zolix RSM82-1A, which adopts coarse and fine adjustment control, with a 360° scale on the circumference and a minimum vernier resolution of 2′. This setup was used for both single-axis rotation calibration and measurement evaluation.

Fig.6 Experimental system for binocular stereo camera calibration

Calibration methods, including the baseline method and our proposed method, were tested on the same dataset. The baseline method, based on the pinhole imaging model, is a combination of Zhang's technique and Brown's distortion model. We compared the performance of the different calibration methods and assessed the 3D measurement capability over several experiments.

3.1 Evaluation of single camera calibration

The stereo cameras were calibrated over a dataset of 10 pictures containing images from both the left and right views. These calibration target pictures covered the field of view as much as possible. For each picture, the calibration target was set at a random pose at a distance of approximately 620 mm, with the angle relative to the normal vector of the image plane ranging from -25° to 20°.

To report the performance of single camera calibration, two indicators are measured, one of which is the measurement error of the RBF network. In each image, the RBF network is trained by extracting feature points and fitting the mapping function of their spatial positions. The measurement error, calculated as the mean square error of the geometric deviation between ground truth and computed points, represents the capability of the RBF network. In the experiment, four auxiliary markers were set up on the corners of the calibration target. The distance between any two markers on the same side is 160 mm, which offers a reference to evaluate the generalization ability of the RBF network.

The selection of the RBF centers and spread parameter is critical to the implementation of the RBF network. For the RBF network used in our method, the centers are determined by k-means clustering on all the training examples. The spread controls the width of the area in the input space to which each neuron in the hidden layer responds. A proper spread may generalize the network well. However, it is difficult to find an approach for searching for the optimal spread to obtain a well-performing network[14]. We generate 700 different networks with the spread varying from 1 000 to 7 000 and evaluate the performance of the networks.

Fig.7 Performance of the RBF network with different spreads: (a) MSE; (b) marker measurement error

The plot in Fig.7(a) shows how the mean square error varies with the spread of the RBF network. In the training process, the errors are measured by comparing the real coordinates and the outputs. In the evaluation process, the errors come from physical coordinates, the unit of which is the millimeter. As the selected spread of the network increases, the reprojection error decreases and converges to around 0.073 at a spread of approximately 6 000. This result is most likely due to the enhanced capability of building mappings offered by the increasing spread. The plot in Fig.7(b) shows the downward trend of the marker measurement error. Similarly, the error does not fall significantly when the spread value is over 5 000. Note that both the reprojection error and the marker measurement error converge at a certain spread. No advantage can be seen in adopting a larger spread, since the control points cannot provide further information for more precise estimation. According to the suggestion that a higher value of the spread parameter should ensure more generalized results[14], we select 6 000 as the spread parameter in the following experiments.

Another indicator of the performance of calibration is the pose estimation error. The poses of the target shots in the calibration process are obtained from the alternating optimization. To assess the advantages of the pose refinement process, we compare the pose estimation results with the ground truth measured by the scale indicator of the turntable. Fig.8 plots the errors of the single-axis rotation angles of the poses estimated with our method and with the baseline method alone. The horizontal axis denotes the angle of the rotation stage, and the vertical axis denotes the error of the different pose estimation methods.

Fig.8 Comparison of pose estimation methods of calibration target

We can see that the baseline method combined with the alternating optimization, i.e., the proposed method, exhibits lower error in all the picture shots. The most telling feature of the figure is that the baseline method and the proposed method share a similar trend. The main factor behind this trend may be the quality of the captured pictures. The poor quality of the shots at certain angles deteriorates the measured results during the calibration process. This may serve to explain, at least in part, the similar shapes of the two curves.

3.2 Evaluation of stereo camera localization

To assess the quality of the calibration, the stereo cameras captured pictures of feature points on the test setup. The test setup consists of a well-designed glass vision target offering a total of 63 directional circle markers, with a small-circle radius of 8 mm, a large-circle radius of 10 mm, and an interval of 12 mm, as shown in Fig.9. The target is manufactured by photoetching, with details as fine as 1 micrometer, on a 160 mm × 160 mm glass substrate.

Fig.9 Vision target

Fig.10 exhibits the point cloud measured by the stereo camera setup in the experiment. Each axis of the figure denotes one axis of the world coordinate system. As shown in Fig.11, our proposed method outperforms the baseline method.

Fig.10 Measured point cloud

Fig.11 Measurement error over different distances

The unconstrained ray-tracing method had an average error of 0.037 mm, which is lower than that of the pinhole model due to its ability to produce precise interpolations. The variance of measurement by the ray-tracing method is also smaller than that of the baseline model, with a maximum of 0.017 mm. The massive quantity of free parameters in the unconstrained model likely provides accurate and subtle information. Notably, the graph reflects a correlation between working distance and measurement error. The measurement error drops dramatically as the distance increases, reaches a trough at around 330 mm, and then rises as the working distance continues to increase. Coupled with the graphic information, one possible conclusion is that the error is smallest when the working distance is close to the distance between the calibration target and the stereo camera in the calibration process. In other words, when the working distance is near the calibration distance, measurement precision may be higher.

4 Conclusions

This paper presents a novel calibration method that describes the stereo camera as a non-parametric model which maps 3D rays to 2D pixel coordinates. By adopting the rotation stage to provide various calibration poses, this technique combines the simplicity of single-axis transformation with the accuracy of the parameter-free model. RBF networks are deployed to build up the relationship between the target image points and spatial points. Lie algebra is introduced to recover the geometry of rays. Furthermore, the unconstrained model is extended to a stereo camera, in which the deviation of poses observed by the cameras is dealt with by rotation averaging. Specially designed experiments are carried out to validate the accuracy of our proposed calibration method compared with the baseline approach in both single camera and stereo camera setups. A practical vision measurement task is demonstrated on an LED vision target to show the effectiveness of the non-parametric single-axis technique. We believe our work will contribute to the development of camera calibration methods. In the future, more research will be conducted to stabilize the performance of the non-parametric calibration method.