An improved ViBe algorithm based on adaptive detection of moving targets

2020-04-28

Journal of Measurement Science and Instrumentation, 2020, Issue 2

WANG Wei, WANG Xiao-peng, LIANG Jin-cheng

(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Abstract: The detection results of the traditional visual background extraction (ViBe) algorithm contain a Ghost region, and foreground extraction is prone to false detection or missed detection when the environment changes. An improved ViBe algorithm based on adaptive detection of moving targets is therefore proposed. Firstly, during background model initialization, the real background is obtained by introducing an adjustment parameter into mean background modeling, and this background is used to initialize the ViBe background model. Secondly, during foreground detection, an adaptive radius threshold that follows the scene change is introduced to detect the foreground adaptively. Finally, a mathematical morphological close operation is used to fill the holes in the detection results. The experimental results show that the improved method effectively suppresses the Ghost region and detects the foreground target more completely under environmental changes. Compared with the traditional ViBe algorithm, the detection accuracy is improved by more than 10%, and the false detection rate and the missed detection rate are reduced by 20% and 7%, respectively. In addition, the improved method satisfies the real-time requirements.

Key words: visual background extraction (ViBe); Ghost region; background model; adaptive radius threshold

0 Introduction

Moving target detection[1] separates the foreground target in a video sequence from a complex background and is the premise of target tracking, target recognition and target behavior analysis[2]. At present, the commonly used detection methods for moving targets include the frame difference method[3], the optical flow method[4] and the background subtraction method[5-7]. The frame difference method segments the moving target by differencing two frames; it is fast but prone to producing large holes, so it is suitable only for simple scenes. The optical flow method detects moving targets from changes in the optical flow field; its detection precision is high, but its computational load is large, its anti-noise performance is poor, and its processing speed depends on the hardware. The main idea of the background subtraction method is to initialize a background model, obtain the foreground by comparing each video frame with the background model, and then update the background model. The algorithm is relatively simple, with a small amount of calculation and high real-time performance. However, in most complex scenes it is difficult for the background subtraction method to obtain a reliable background, and its detection efficiency is low in dynamic scenes. To this end, the Gaussian mixture model (GMM)[8] and the visual background extraction (ViBe)[9-11] algorithm have been proposed. GMM adapts well to complex scenes, but it needs to calculate model parameters to adjust the background model. The ViBe algorithm completes the initialization of the background model in the first frame of the video sequence, with small computational complexity, high detection accuracy, good real-time performance and low memory usage[12]. However, when there is a moving object in the first frame of the video sequence, the Ghost phenomenon occurs.

The Ghost region can only be eliminated after a long period of background updating, and all the parameters of the algorithm are empirical values, which may lead to false detection or missed detection in highly dynamic scenes. To solve the Ghost region problem of the ViBe algorithm, Ref.[13] initializes the ViBe background model with the first n frames of the video to eliminate the Ghost region quickly. Ref.[14] accelerates the elimination of the Ghost region by expanding the selection range of the sample neighborhood and reducing the update factor. Ref.[15] proposes a method based on histogram similarity matching between foreground pixels and neighboring background pixels, which quickly detects the Ghost region and suppresses it by updating the background model. To address the poor adaptation of the ViBe algorithm to highly dynamic scenes, Ref.[16] introduces the Otsu algorithm to replace the original fixed threshold with a dynamic threshold that follows the pixel change, which improves the adaptability to highly dynamic scenes.

To address the Ghost region in the detection results of the ViBe algorithm, as well as the false detection and missed detection caused by environmental changes, an improved ViBe algorithm based on adaptive detection of moving targets is proposed. The real background is obtained by improving mean background modeling, and the Ghost region is suppressed by initializing the ViBe background model with this background. An adaptive factor that reflects the degree of scene change is introduced to assign the radius threshold adaptively, so that the detection follows the environmental change. Finally, the results are processed by a morphological close operation to make the target more complete.

1 ViBe algorithm

The ViBe algorithm is a pixel-level foreground detection algorithm[17]. Its basic idea is to initialize a background model for each pixel, that is, to store a set of background samples, and then to decide whether the current pixel is a foreground pixel or a background pixel by comparing it with those samples. If the current pixel is a background pixel, it is used to randomly update its own background samples and, at the same time, to randomly update the background samples of a neighboring pixel.

1.1 Initialization of background model

The ViBe algorithm initializes the background model from the first frame of the video. For each pixel x, the background model holds a sample set M(x) of size N, expressed by

$$M(x)=\{v_1,v_2,\ldots,v_i,\ldots,v_N\},\quad v_i\in\Pi_x,\ i\in[1,N], \tag{1}$$

where v_i is a value sampled at random from the 8-neighborhood of the pixel x. The process of establishing the sample set M(x) is shown in Fig.1.

Fig.1 ViBe background model
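The sampling step of Eq.(1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `init_vibe_model`, the list-of-lists grayscale frame and the clamping of neighbors at the image border are all assumptions made here, since the text does not specify them.

```python
import random

# The eight neighbor offsets (dy, dx) around a pixel.
NEIGHBOR_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]


def init_vibe_model(frame, n_samples, rng=None):
    """Build a sample set M(x) of size n_samples for every pixel.

    frame: grayscale image as a list of rows (lists of intensities).
    Returns model[y][x], a list of values drawn at random from the
    8-neighborhood of (y, x). Neighbors that fall outside the image are
    clamped to the nearest valid pixel (an assumption of this sketch).
    """
    rng = rng or random.Random()
    height, width = len(frame), len(frame[0])
    model = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = []
            for _ in range(n_samples):
                dy, dx = rng.choice(NEIGHBOR_OFFSETS)
                ny = min(max(y + dy, 0), height - 1)
                nx = min(max(x + dx, 0), width - 1)
                samples.append(frame[ny][nx])
            row.append(samples)
        model.append(row)
    return model
```

Because every sample is copied from the first frame, any moving object present in that frame is stored as background, which is the origin of the Ghost region discussed in Section 1.4.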

1.2 Foreground detection

Starting from the second frame of the video, foreground detection compares each pixel under test with the sample set in the background model to decide whether the pixel belongs to the foreground.

As shown in Fig.2, in the (C1, C2) components of a two-dimensional (2D) Euclidean color space, a sphere region S_R(v(x)) is defined with the value v(x) of the current pixel x as its center and R as its radius. The number of samples of M(x) that fall inside S_R(v(x)) is counted and denoted M#, which represents the similarity between v(x) and M(x). If M# is less than the minimum number of matches th, v(x) is classified as foreground; otherwise, it is classified as background.

Fig.2 Schematic diagram of 2D Euclidean color space classification

The similarity M# between M(x) and S_R(v(x)) can be expressed as

$$M\#=\#\{S_R(v(x))\cap M(x)\}, \tag{2}$$

where #{·} denotes the number of elements in the set.

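The binarization result F(x) of foreground detection can be expressed as

$$F(x)=\begin{cases}1, & M\#<\mathit{th},\\ 0, & \text{otherwise},\end{cases} \tag{3}$$

where F(x)=1 marks a foreground pixel and F(x)=0 marks a background pixel.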
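The matching rule described above (count the samples inside the sphere of radius R, then compare the count with th) can be illustrated as follows. This is a hedged sketch: the function names and the numeric values of the radius and the minimum number of matches are illustrative only and are not taken from this paper.

```python
import math


def count_matches(value, samples, radius):
    """Return M#: how many background samples lie within `radius` of `value`.

    `value` and every sample are tuples of color components, e.g. (C1, C2).
    """
    return sum(1 for v in samples if math.dist(value, v) < radius)


def is_foreground(value, samples, radius, min_matches):
    """A pixel is foreground when fewer than `min_matches` samples match."""
    return count_matches(value, samples, radius) < min_matches


# Illustrative sample set M(x) for one pixel.
samples = [(10, 10), (12, 11), (50, 50), (90, 90)]
print(is_foreground((11, 10), samples, radius=5, min_matches=2))  # False
print(is_foreground((70, 70), samples, radius=5, min_matches=2))  # True
```

The first pixel matches two samples and is kept as background; the second matches none and is reported as foreground.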

1.3 Update of background model

The background model is updated by a random update mechanism. A pixel classified as background updates its own background model with probability 1/φ (φ is the update factor) and, through spatial propagation, also updates the background model of a neighboring pixel with probability 1/φ.
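A minimal sketch of this random update is given below. It assumes a model stored as `model[y][x]` (a list of samples per pixel), a grayscale frame, and border clamping when a neighbor is chosen; these data structures and the function name `update_background` are assumptions of the sketch, not details given in the text.

```python
import random

# The eight neighbor offsets (dy, dx) around a pixel.
NEIGHBOR_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]


def update_background(model, frame, y, x, phi, rng=None):
    """Randomly update the model for a pixel already classified as background.

    model[y][x] is the list of background samples of pixel (y, x);
    phi is the update factor, so each update fires with probability 1/phi.
    """
    rng = rng or random.Random()
    height, width = len(frame), len(frame[0])
    value = frame[y][x]

    # With probability 1/phi, overwrite one random sample of this pixel.
    if rng.randrange(phi) == 0:
        own = model[y][x]
        own[rng.randrange(len(own))] = value

    # With probability 1/phi, propagate the value into the sample set of a
    # randomly chosen neighbor (clamped at the border: an assumption here).
    if rng.randrange(phi) == 0:
        dy, dx = rng.choice(NEIGHBOR_OFFSETS)
        ny = min(max(y + dy, 0), height - 1)
        nx = min(max(x + dx, 0), width - 1)
        neighbor = model[ny][nx]
        neighbor[rng.randrange(len(neighbor))] = value
```

A larger φ makes updates rarer, so a wrongly initialized sample (such as one taken from a moving object in the first frame) survives longer in the model.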

1.4 Problems of ViBe algorithm

Because the ViBe algorithm initializes the background model with the initial frame of the video and uses a fixed threshold in foreground detection, it is simple and has small computational complexity. However, these same choices cause the following disadvantages:

1) Ghost region. A Ghost region is a detected foreground region that does not correspond to any real moving target. In the ViBe algorithm, it forms when the first frame contains a moving target: the target is used as background when the background model is initialized, so the area it later vacates is detected as foreground.

2) False detection or missed detection. False detection or missed detection occurs because the thresholds are all empirical values. A small threshold is robust in static scenes. However, in complex dynamic scenes, a small threshold can cause false detection and a large threshold can cause missed detection.

2 Improvement of ViBe algorithm

Because the first frame of the captured video may contain moving targets, a Ghost region will appear in the subsequent detection process if the background model is established from the first frame. Moreover, since the threshold of the ViBe algorithm is an empirical value, it can only cope with static or slowly changing scenes and does not adapt to highly dynamic scenes. Therefore, the initialization of the background model and the threshold selection are improved to address these disadvantages.

2.1 Initialization of background model

In order to suppress the Ghost region, a real background model needs to be built. The mean background modeling method is usually adopted, which establishes a background model by averaging each pixel over a video sequence of M frames. The method is simple, but selecting a large number of frames reduces the computational efficiency, while selecting a small number of frames leaves a lot of noise in the background. In order to reduce the computational cost and still obtain a real and reliable background, an adjustment parameter δ is set to improve mean background modeling, and the improved method is used to obtain the background for initializing the ViBe background model. The specific implementation process is as follows:

(4)

Secondly, the average difference η of the pixel values f_i(x) of the pixel x over the M frames is calculated by

(5)

The noise points with large differences are filtered out by setting the adjustment parameter δ, namely

(6)

The denoised pixel values f_j(x) are used to initialize the background model B, namely

(7)

where f_i(x) represents the pixel value of the pixel x in the i-th frame.

Finally, a sample set M(x) with size N is created for each pixel x in the initialized background model.
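Because the bodies of Eqs.(4)-(7) are not reproduced here, the sketch below is only one plausible reading of the procedure described in the text, not the authors' formulas. It assumes that the first step is the per-pixel mean over the M frames, that η is the mean absolute deviation from that mean, and that a frame value is kept when its deviation does not exceed δ·η. The function name `robust_mean_background` is hypothetical.

```python
def robust_mean_background(frames, delta):
    """Estimate a background image from M grayscale frames.

    Assumed procedure (a hypothetical reading of the text):
      1. mean of the pixel over all frames,
      2. eta = mean absolute deviation from that mean,
      3. keep only values with |f_i(x) - mean| <= delta * eta,
      4. background = mean of the kept values.
    """
    m = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    background = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            mean = sum(values) / m
            eta = sum(abs(v - mean) for v in values) / m
            kept = [v for v in values if abs(v - mean) <= delta * eta]
            # Fall back to the plain mean if the filter rejects everything.
            background[y][x] = sum(kept) / len(kept) if kept else mean
    return background


# One pixel observed over 20 frames: background 100, a vehicle (200) in one frame.
frames = [[[100]] for _ in range(19)] + [[[200]]]
print(robust_mean_background(frames, delta=2))  # [[100.0]]
```

In this example the plain mean would be 105, whereas the filtered mean recovers the undisturbed value 100, which is the kind of noise suppression the adjustment parameter is meant to provide.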

2.2 Adaptive radius threshold foreground segmentation

In the ViBe algorithm, every pixel uses the same radius threshold R, which cannot adapt to moving target detection in dynamic scenes. Generally, a large R is required for a highly dynamic scene region (such as one disturbed by leaves swaying in the wind or a fluctuating water surface) to prevent false detection, and a smaller R is required for a static scene region to prevent missed detection. Therefore, in order to adapt to complex detection scenes, R is adjusted adaptively according to the complexity of the scene in this paper.

Firstly, the adaptive factor D is defined to describe the complexity of the scene as

(8)

where f(x) is the pixel value of the pixel x in the current frame; v_k is an element of the background sample set of the pixel x; and ω is the scale factor of D. The smaller D is, the less the scene changes, and vice versa.

Secondly, R is adaptively adjusted according to the change of D as

(9)

where ζ is the variation parameter of R. Eq.(9) shows that R gradually becomes smaller in a static scene and gradually becomes larger in a dynamic scene. However, in order to maintain the robustness of detection, upper and lower bounds should be set for R; R∈[12,28] was obtained from a large number of experiments.

Finally, once R has been set, the foreground is segmented in the same way as in the traditional ViBe algorithm, as

$$F(x)=\begin{cases}1, & \#\{v_k\in M(x):\operatorname{dis}(f(x),v_k)<R\}<\mathit{th},\\ 0, & \text{otherwise},\end{cases} \tag{10}$$

where F(x) is the binarization result of the foreground segmentation; dis(·) is the Euclidean distance; and th is the minimum number of matches.
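The adaptive-radius idea can be sketched as below. Since the bodies of Eqs.(8) and (9) are not reproduced here, the two update rules are assumptions: D is taken as ω times the mean absolute difference between the current pixel and its background samples, and R is moved by ±ζ toward D. Only the bounds [12, 28] and the matching rule of the traditional ViBe algorithm come from the text; all function names are hypothetical.

```python
def adaptive_factor(value, samples, omega):
    """Assumed form of D: omega times the mean |f(x) - v_k| over the samples."""
    return omega * sum(abs(value - v) for v in samples) / len(samples)


def adapt_radius(radius, d, zeta, r_min=12, r_max=28):
    """Assumed update: grow R by zeta when the scene is busier than R
    suggests (R < D), otherwise shrink it; then clamp R to [r_min, r_max]."""
    radius = radius + zeta if radius < d else radius - zeta
    return min(max(radius, r_min), r_max)


def segment_pixel(value, samples, radius, min_matches):
    """Return 1 (foreground) when fewer than min_matches samples lie within
    `radius` of the grayscale value, else 0 (background)."""
    matches = sum(1 for v in samples if abs(value - v) < radius)
    return 1 if matches < min_matches else 0


samples = [100] * 20                      # a perfectly static background pixel
d = adaptive_factor(100, samples, omega=5)
print(d, adapt_radius(20, d, zeta=1))     # 0.0 19
```

Under these assumptions R shrinks step by step in a static region and grows in a busy one, which matches the qualitative behavior described for Eq.(9), while the clamp keeps R within the experimentally chosen range.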

2.3 Post-processing

Because the foreground target detected by the improved ViBe algorithm may contain holes and other gaps, a mathematical morphological close operation[18] is applied to the binary image F(x) of the moving targets as

$$F(x)\bullet c=(F(x)\oplus c)\ominus c, \tag{11}$$

where c denotes the structural element; • denotes the close operation; ⊕ denotes the dilation operation; and ⊖ denotes the erosion operation.
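The close operation (dilation followed by erosion with the same structural element) can be illustrated in plain Python. The cross-shaped element below is one discrete reading of a circular element of radius 1 and is an assumption, as is the border convention of ignoring offsets that fall outside the image.

```python
# Cross-shaped structural element: the pixel and its four direct neighbors.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def _morph(image, offsets, combine):
    """Apply `combine` (max or min) over the in-bounds neighbors of each pixel."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [image[y + dy][x + dx] for dy, dx in offsets
                      if 0 <= y + dy < height and 0 <= x + dx < width]
            row.append(combine(values))
        out.append(row)
    return out


def dilate(image, offsets=CROSS):
    return _morph(image, offsets, max)


def erode(image, offsets=CROSS):
    return _morph(image, offsets, min)


def close(image, offsets=CROSS):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(image, offsets), offsets)


# A 3x3 ring with a one-pixel hole, placed away from the image border.
ring = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        ring[y][x] = 1
ring[3][3] = 0
print(close(ring)[3][3])  # 1: the hole is filled
```

Closing fills holes and gaps that are smaller than the structural element while leaving the outer contour of the target essentially unchanged.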

2.4 Algorithm implementation process

The algorithm flow is shown in Fig.3. Firstly, the background is constructed by the mean background modeling method with the adjustment parameter, and the ViBe background model is initialized from the constructed background. The radius threshold is then set adaptively according to the scene complexity of the current frame, and the Euclidean distance between each pixel of the current frame and the elements of its background sample set is calculated to decide whether the pixel is a foreground pixel. A foreground pixel is marked in the binary image; for a background pixel, the background model is updated. Finally, a morphological close operation is performed on the resulting binary image, and a complete foreground image is output.

Fig.3 Flow chart of detection process

3 Experimental results and analysis

An Intel(R) Core(TM) i5-4200M CPU @ 2.50 GHz with 4.00 GB of memory was used as the hardware platform, and the simulation experiments were carried out in MATLAB R2012b. The algorithm parameters are set as follows: the number of video frames for background modeling is M=20, the adjustment parameter is δ=2, the size of the sample set M(x) is N=20, the scale factor of the adaptive factor D is ω=5, the variation parameter of the radius threshold R is ζ=1, and the structural element c is circular with a radius of 1. The test videos are taken from the changedetection.net[19] video library, together with a self-shot test video. The test video resolution is 320×240. In the same environment, the proposed algorithm is compared with GMM, the frame difference method and the ViBe algorithm, whose parameter settings follow the cited literature.

3.1 Ghost region suppression experiment

Fig.4(a) shows the background that the ViBe algorithm establishes from the first frame of the video. Fig.4(b) and Fig.4(c) show the backgrounds established from the first 20 frames of the video by the mean background modeling method and by the proposed algorithm, respectively. It can be seen that Fig.4(a) takes the first frame, which contains vehicles, as the background; the background in Fig.4(b) contains no vehicle but a little noise; and Fig.4(c) gives a more reliable background.

Fig.4 Initial background established by three algorithms

The background model of the ViBe algorithm was initialized with each of the backgrounds in Fig.4, and the Ghost region suppression experiment was performed on the video Highway_raw. The test results of the 10th, 22nd and 53rd frames are compared in Fig.5. It can be seen that the result of the ViBe algorithm in the 10th frame contains a Ghost region (marked by a white box), and that the Ghost region fades slowly with background updating in the 22nd and 53rd frames but does not disappear completely. Fig.5(c) and Fig.5(d) show that both the mean background modeling method and the improved mean background modeling method suppress the Ghost region directly. However, the 22nd and 53rd frames of Fig.5(c) contain noise points (shown in the white circle), while the result of Fig.5(d) is slightly better. Therefore, the proposed algorithm is more effective.

Fig.5 Video Highway_raw test results

3.2 Adaptive radius threshold experiment

In order to verify the adaptability of the proposed algorithm in complex scenes, the video Overpass was selected for testing, and the 5th, 487th and 2779th frames were compared with the results of the ViBe algorithm. The results are shown in Fig.6. As seen from Fig.6, the ViBe algorithm falsely detects the water ripples and swaying leaves in the background as foreground, and it partly misses the car and the pedestrians in the 2779th frame. With the proposed algorithm, the radius threshold R is adaptively updated as the environment changes, so the car and the pedestrian are effectively detected in the 487th and 2779th frames.

Fig.6 Video Overpass test results

3.3 Algorithm performance experiment

In order to evaluate the comprehensive performance of the proposed algorithm, the method was compared with GMM, frame difference method and ViBe algorithm. The comparison result is shown in Fig.7.

Fig.7 Test results of four algorithms in different complex scenes

Fig.7 shows that GMM and the ViBe algorithm detect the moving target area fairly completely, but GMM has a larger false detection area than the ViBe algorithm, and both algorithms slightly miss parts of the target. The frame difference method has the worst detection effect: not only is its missed detection area large, but false detection also appears on the wind-shaken leaves in the test video House. Across the test videos, the proposed algorithm shows better moving target detection performance, with no obvious false detection or missed detection.

The performance of the proposed algorithm, GMM, the frame difference method and the ViBe algorithm is quantified by the accuracy rate R_ACC, the false detection rate R_FPR (false positive rate), the missed detection rate R_FNR (false negative rate) and the average processing time per frame. The first three indicators are defined as

$$R_{ACC}=\frac{P_{TP}+P_{TN}}{P_{TP}+P_{TN}+P_{FP}+P_{FN}}, \tag{12}$$

$$R_{FPR}=\frac{P_{FP}}{P_{FP}+P_{TN}}, \tag{13}$$

$$R_{FNR}=\frac{P_{FN}}{P_{TP}+P_{FN}}, \tag{14}$$

where P_TP is the number of foreground pixels that are correctly detected; P_FP is the number of background pixels that are falsely detected as foreground; P_TN is the number of background pixels that are correctly detected; and P_FN is the number of foreground pixels that are falsely detected as background.

The detection performance indicators of the four algorithms are shown in Tables 1-3. Table 1 shows that, under different scene conditions, GMM and the ViBe algorithm achieve a high accuracy rate, while the accuracy rate of the frame difference method drops because of the large holes that occur during detection. The proposed algorithm has a higher detection accuracy rate than the other three algorithms. Table 2 shows that the false detection rates of GMM and the ViBe algorithm are high, that of the frame difference method is relatively low, and the proposed algorithm has the lowest false detection rate of the four. As seen from Table 3, the missed detection rates of GMM and the ViBe algorithm are slightly lower than that of the frame difference method, whose missed detection rate is high because of the large holes that occur during detection. After the morphological close operation on the detection results, the proposed algorithm has a lower missed detection rate than the other three algorithms.
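For reference, the three rates can be computed from a predicted mask and a ground-truth mask as sketched below. The sketch assumes the standard confusion-matrix definitions of accuracy, false positive rate and false negative rate; the function names and the tiny example masks are illustrative and do not come from the experiments.

```python
def confusion_counts(pred, truth):
    """Count (TP, FP, TN, FN) over two binary masks of equal size (1 = foreground)."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and not t:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn


def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, false positive rate, false negative rate)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)   # background pixels wrongly reported as foreground
    fnr = fn / (tp + fn)   # foreground pixels wrongly reported as background
    return accuracy, fpr, fnr


pred = [[1, 0], [1, 0]]
truth = [[1, 0], [0, 1]]
print(detection_metrics(*confusion_counts(pred, truth)))  # (0.5, 0.5, 0.5)
```

Note that the false positive rate is normalized by the number of true background pixels and the false negative rate by the number of true foreground pixels, so the two rates are not complementary.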

Table 1 Comparison of accuracy rate R_ACC

Table 2 Comparison of false detection rate R_FPR

Table 3 Comparison of missed detection rate R_FNR

The comparison of average processing time per frame of the four algorithms is shown in Fig.8.

Fig.8 Comparison of average processing time per frame of four algorithms

In Fig.8, the average processing time per frame of the proposed algorithm on the four test videos is slightly less than that of GMM, but more than that of the frame difference method and the ViBe algorithm. This is because the proposed algorithm is more complex than the frame difference method and the ViBe algorithm; nevertheless, it still meets the real-time requirements.

4 Conclusion

The ViBe algorithm establishes a background model from the first frame of the video and segments the foreground with a fixed radius threshold. When the first frame contains a moving target, a Ghost region appears in the foreground, and in highly dynamic scenes the fixed radius threshold cannot segment the foreground effectively. To address these problems, an improved ViBe method based on adaptive detection of moving targets was proposed. During background model initialization, the real background was obtained by introducing an adjustment parameter into the mean background modeling method, and the ViBe background model was initialized with this background. During foreground detection, the radius threshold was assigned adaptively according to the complexity of the scene. A mathematical morphological close operation was then applied to fill the holes in the detection results. Experimental verification and quantitative analysis show that the proposed algorithm effectively suppresses the Ghost region and detects the foreground target more completely under environmental changes.
