
A new skeleton based flying bird detection method for low-altitude air traffic management

2018-12-15

Chinese Journal of Aeronautics, 2018, Issue 11

Tianhang WU, Xiaoyan LUO, Qunyu XU

a School of Electronic and Information Engineering, Beihang University, Beijing 100083, China

b Image Processing Center, School of Astronautics, Beihang University, Beijing 100083, China

c Department of Civil and Environmental Engineering, University of Illinois, Urbana, IL 61801, USA

KEYWORDS Bird-aircraft collisions; Flight security; Flying bird detection; Low-Altitude Air Traffic Management (LAATM); Simplified skeleton descriptor

Abstract In low-altitude air traffic management, non-cooperative targets are the greatest threat to the safety of low-flying aircraft. Among the various aviation hazards, flying birds pose the highest risk, with direct economic losses amounting to nearly 10 billion US dollars each year. Therefore, Flying Bird Detection (FBD) has attracted considerable attention in low-altitude air traffic management. In this paper, we propose a skeleton-based FBD method that describes bird motion information with a set of key poses. To overcome the variability of birds, the skeleton feature is selected as a relatively fixed and common characteristic of the pose appearance of a flying bird. Based on the geometric topology among key parts of the bird body, a set of key poses can be described by extracted skeleton features, which are used to represent the bird motion information. To handle pose variations robustly, multiple pose-specific classifiers are individually trained to learn the representative poses of the flying bird. At the detection stage, the flying bird skeleton features are combined with the extracted key-pose sets to perform the flying bird classification task on each image. Afterwards, the key-frame pose-change set and the consistency of the classification results across consecutive images are employed to validate the final detection results. Experiments on flying bird datasets demonstrate the effectiveness and efficiency of the proposed method.

1.Introduction

1.1.Related work

With the opening of low-altitude airspace in recent years, Low-Altitude Air Traffic Management (LAATM) is paying more attention to the safety of low-altitude flight. Uncertain obstacles and complex environments make it harder to guarantee safety in LAATM. To assure flight route safety, LAATM must not only obtain accurate navigation signals but also detect various obstacles, such as birds, wires, and buildings. Statistics from the Federal Aviation Administration (FAA) show that nearly 10000 bird strikes occur around the world each year, with direct economic losses amounting to as much as 10 billion dollars.[1] Therefore, Flying Bird Detection (FBD) is an important yet challenging problem in reducing the risk of bird strikes for LAATM.

Over the past 40 years, airport-based avian radar systems have been an interdisciplinary subject attracting the interest of researchers and consultants around the world. Several products have been developed and deployed at many civil and military airports, for instance the Merlin[2] and AHAS[3] systems in America, the Accipiter system in Canada,[4] and the BIRDTAM system in Europe.[5] As a complex system that draws on technology from multiple fields, a flying bird detection system is greatly affected by the type of sensor used. Existing studies, domestic and foreign, mainly use radar, sound waves, optical methods, infrared, etc. Compared with other sensors, we think visual inspection possesses unique and irreplaceable advantages, such as low cost and convenient maintenance. Its most important advantage is that it provides a wealth of information, such as bird outline, texture, and color, which helps to differentiate birds from other targets.

In general, existing FBD methods fall into the category of typical moving object detection problems. Compared with traditional moving object detection such as car detection, FBD is much more challenging because of the deformation of birds. This deformation shows in two aspects. On the one hand, the appearance of birds is diverse: it is reported that there are about 9000 bird species with different sizes, shapes, colors, textures, etc.[6] On the other hand, since a flying bird is non-rigid and highly dynamic during flight (e.g. up-stroke, down-stroke, bounding), it often undergoes large-scale pose changes, which make it difficult to describe the motion characteristics of birds. The complexity of flying bird appearance therefore dramatically increases the detection difficulty. Additionally, safe low-altitude flight places strict real-time and accuracy requirements on the detection approach.

To address the problem of FBD, a variety of vision approaches have been proposed in recent years. Broadly speaking, they can be classified into two categories: those based on image features and those based on motion characteristics.

In the first category, methods directly use well-known image features to perform detection. Primarily, the bird size,[7] body axis length,[8] or shape information[9] is used in appearance-based FBD. Unfortunately, such information varies greatly among bird species, which makes reliable detection harder. In addition, relying on appearance alone is not enough to distinguish non-bird targets that closely resemble a flying bird. As a result, many false positives and false negatives can occur when these methods are applied to FBD.

By contrast, the other category relies on the flying birds' own motion characteristics to perform FBD. These methods usually impose relatively strong constraints on the bird motion pattern.[10,11] This requirement leaves them unable to detect flying birds with abrupt or non-periodic motions, which limits their use. Moreover, these methods often need a large number of image frames to analyze accurate trajectories or the periodicity of the movement, which inevitably slows down detection.

To overcome the disadvantages of the above appearance-based or motion-based methods, related work has proposed combining appearance features with spatial structure.[12] For the flying bird detection problem, incorporating both the shape and the shape dynamics of the flying bird is a good approach from the viewpoint of vision processing.

Visual observation and analysis of small flying animals (e.g. bats, bees, etc.) using computer-vision methods has been an increasingly active research direction.[11–24] To date, various vision-based techniques have been applied to detecting flying birds. Since a flying bird is a typical moving object, motion detection is one of the most commonly used methods, for instance temporal differencing,[18,19] median-based background subtraction,[20] and optical flow analysis.[7,21] However, unlike other moving objects, birds have many specific features that lead to higher false positive rates under practical conditions. Hence, methods that further use bird-specific features (shape, color, or texture) have been developed to address this problem, such as Haar-like features,[1,25] Histogram of Oriented Gradients (HOG), and Local Binary Patterns (LBP).[26] These methods work well for detecting birds in a walking or perching stance. Xu and Shi[27] described a simple method based on a simplified bird skeleton descriptor. This method uses four key points to represent the four key parts of the bird body: the head, the tail, and the two wings. A flying bird skeleton comprises these key points and the bird body center point. Being based on geometrical features, this description has the advantages of low dimension and low computational cost. However, such low-level features, mostly computed on local statistics, are not discriminative enough to characterize flying birds with varied appearances and changing flying poses.

In recent years, some methods have combined appearance with motion. For example, Zhang et al.[28] introduced a hierarchical feature model that exploits shape and shape dynamics to better represent a flying bird, and applied it to the FBD problem. At the lower level of this model, a shape-context-based feature descriptor captures the spatial relations in the bird body shape. The shape descriptor is then extended into the spatio-temporal domain, and a shape-dynamics-based representation is built at the higher level of the hierarchical framework. Nevertheless, extracting the appearance feature at the lower level amounts to only a coarse detection. The method needs to train a large number of classifiers to ensure accuracy, which results in a heavy workload, and it cannot guarantee that the classifiers cover the full diversity of birds. In addition, using the distribution of all contour points to represent a flying bird may require a large number of samples and a large amount of feature computation, which impairs detection accuracy and speed. At the higher level, a simple Markov process is used to describe the shape changes of birds. This requires multi-frame image sequences to extract the features of birds' continuous changes. In practice, the bird's information may be incomplete in some situations, which can cause failure when a long Markov chain is built. Moreover, the continuous shape change of the flying bird can only be obtained from a number of image frames, which prolongs the motion feature extraction time. Finally, relying on template matching at the detection stage limits robustness to pose variations.

To tackle the problem of detecting flying birds with wide appearance variations due to different species and poses, we combine appearance features with motion features and present a classification-based approach to FBD in this paper.

1.2.Overview and contributions

This approach uses flying bird skeleton features to represent the flying pose and describes the motion with these key poses. To perform the classification task on each image, we design a set of multiple pose-specific classifiers. Afterwards, to achieve greater robustness, the key-frame pose-change information and the consistency of the classification results across images are used to validate the final detection result.

The contributions of this paper are summarized as follows:

(1) A novel skeleton-based flying bird representation framework is proposed, built on the geometric topology among key bird body parts. This underlying topological distribution property of the target adapts well to bird appearance variations. Moreover, the skeleton-based representation greatly reduces the computation overhead thanks to its low-dimensional geometric structure.

(2) A key-frame pose-change set is established on top of the skeleton-based feature representation. Since bird movements can be summarized as a sequence of bird poses,[29] the proposed pose-change set is well suited to characterizing the typical motions. Instead of relying on temporal analysis of the entire motion sequence, our key-pose-based strategy significantly simplifies the feature extraction procedure and greatly increases the target detection speed.

(3) A set of pose-specific classifiers is trained to further strengthen robustness to extreme appearance changes of bird targets. Each classifier in the set aims to detect one particular bird pose. With the adaptive combination of multiple classifiers, we achieve satisfactory classification performance.

2.Approach

2.1.Motivation

This paper develops a new and efficient classification-based approach that combines dynamic movement information with static skeleton features to handle the FBD problem. The motivation of this paper is threefold:

Firstly, as the shapes and species of birds are diverse, it is difficult to detect bird-like targets by extracting appearance features from the spatial distribution[28,30] of all contour points. Adopting local information features[31] is a good way to describe bird appearance. As stated in Ref. 32, it is well recognized that the skeletal structure of most flying birds is highly similar. Therefore, we select only a few representative contour points, each of which represents a key body part, and connect them to the body center point to construct a flying bird skeleton. Since it characterizes the underlying topological distribution properties of the shape of a flying bird, it can adapt to appearance changes across different species as well as to a certain degree of pose change. At the same time, the low-dimensional geometric structure reduces the computation expense.

Secondly, we seek to extract the motion information on top of the skeleton representation. It is fairly difficult to gather complete flight information for a flying bird target under varying observation distances, view angles, and flight motions. To solve this problem, we divide the timeline into temporal segments so that detection can be simplified and sped up. Existing research shows that bird motions conform to certain patterns,[29] such as flapping, bounding, flap-gliding, etc., and each of these movements is itself a combination of a series of ordered key poses. To this end, we represent these representative movements with an action set, which is constructed in a bottom-up fashion from bird key-pose sets. At test time, the observed objects are matched against the action set. Once there is a match, the object is identified as a bird.

Thirdly, to deal with the extreme case where the observed object does not match the action set because of outliers or spurious observations, we train a classifier for each representative pose class to recognize that particular pose pattern of a flying bird. During detection, we extract the bird's skeleton feature and adaptively combine the trained classifiers to decide whether the target is a flying bird. The separately trained classifiers can handle large intra-class pose variations and improve the classification accuracy. Furthermore, a confidence interval is set to finalize the result.

2.2.Flying-bird skeleton feature representation

The overview of the proposed FBD method is shown in Fig. 1. It operates in two phases: training and detection. The training phase individually trains the pose-specific classifiers and learns the adaptive combination of the multiple classifiers. The detection phase generates the moving candidates, classifies each candidate as a flying bird or not on each image, and finally validates the detection result using the key-pose sets and the classification results from two images. In particular, our FBD method is detailed through the following key issues: the skeleton-based flying bird representation, flying bird key-pose set extraction, the design of multiple pose-specific classifiers, and their adaptive combination.

The first key issue in classifying flying birds is to determine a suitable representation. In this section, we present a skeleton-based flying bird representation. Similar work has been introduced in the field of human action recognition,[28,33,34] but the skeletal structures of flying birds and the way they change tend to be more diverse than those of humans because of birds' own motion patterns. Considering these factors, we define the flying bird skeleton as follows.

2.2.1.Definition of a flying bird skeleton

Firstly, we obtain the flying bird regions of interest by image segmentation. Since a bird is a typical moving target with obvious movement characteristics, it is efficient and reasonable to extract the regions of interest by motion detection. We mainly adopt the Visual Background Extractor (VIBE) method,[35] a background subtraction algorithm for video sequences. We obtain the code from the authors' home pages and take the parameter values from Ref. 36.
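As a rough illustration of motion-based candidate extraction, the sketch below uses plain two-frame differencing. This is a deliberately simpler stand-in, not VIBE, which maintains a set of past samples for every pixel. The function names, the threshold of 25 grey levels, and the nested-list frame format are all assumptions made for the example.

```python
def moving_mask(prev, cur, threshold=25):
    """Mark pixels whose grey level changed by more than `threshold` between two frames."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def bounding_box(mask):
    """Smallest (x_min, y_min, x_max, y_max) box around all moving pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


# Two tiny 4x4 grey-level frames (hypothetical data): two pixels brighten in row 2.
prev_frame = [[0] * 4 for _ in range(4)]
cur_frame = [row[:] for row in prev_frame]
cur_frame[2][1] = 200
cur_frame[2][2] = 200
candidate = bounding_box(moving_mask(prev_frame, cur_frame))
```

A real pipeline would also group moving pixels into separate connected regions, so that each region becomes its own candidate.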

Fig.1 Overview of proposed FBD method.

The next step is to roughly capture the basic skeletal structure of a flying bird. We divide the bird body into four key parts: the head, the tail, and the two wings. To construct the flying bird skeleton, we connect these four key parts to the bird body center point O, as shown in Fig. 2.

2.2.2.Feature vector extraction

How to localize the four key parts of a flying bird automatically and accurately is a significant question. There are many standard techniques for skeletonization, such as thinning and distance transformation.[37,38] However, they are computationally expensive. In our work, we therefore introduce a simple and efficient technique for roughly capturing those key parts. Since the orientation of the body axis of a flying bird is always close to the tangent line of its flying trajectory,[8] we find the bird body axis from its motion direction across two consecutive frames. By searching for the maximum distance between any two contour points, the head and tail positions can be captured first. Then, by analyzing the histogram distribution of the positions of these contour points relative to the body center point, the two wings are located, as they are usually local maxima of the distance.[27] After capturing these four key contour points, each of which represents a key part, we connect them to the bird body center point to construct a skeleton. The details of extracting a flying bird skeleton can be found in our previous work.[27] We roughly localize five points, the center (O), head (ph), tail (pt), and two wings (pw1 and pw2), to represent a flying bird.
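The head and tail search described above can be sketched as a brute-force farthest-pair search over the contour. Choosing the head as the endpoint that lies ahead along the motion direction is our assumption for this example. The helper names are hypothetical, and wing localization is left out.

```python
import math
from itertools import combinations


def farthest_pair(contour):
    """Return the two contour points with the largest Euclidean distance."""
    return max(combinations(contour, 2),
               key=lambda pair: math.dist(pair[0], pair[1]))


def head_and_tail(contour, motion):
    """Order the farthest pair so that the head lies ahead along the motion vector."""
    p, q = farthest_pair(contour)
    # Project the p->q vector onto the motion vector; a positive value means q leads.
    along = (q[0] - p[0]) * motion[0] + (q[1] - p[1]) * motion[1]
    return (q, p) if along >= 0 else (p, q)   # (head, tail)


# Hypothetical contour of a bird moving to the right.
contour = [(0, 0), (4, 1), (10, 0), (5, 3), (5, -3)]
head, tail = head_and_tail(contour, motion=(1, 0))
```

The pairwise search is quadratic in the number of contour points, which is acceptable for the small contours of distant birds. Restricting the search to the convex hull would be the usual speed-up.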

Considering that the polar representation is invariant to scaling and in-plane rotation, this paper uses the polar distribution of those four representative contour points with respect to the body axis as the feature vector of the flying bird skeleton.

We thus define the flying bird skeleton feature as

$$F = \left[\, p_h,\ p_t,\ p_{w1},\ p_{w2} \,\right]$$

where $p_h = (\gamma_h, \theta_h)$, $p_t = (\gamma_t, \theta_t)$, $p_{w1} = (\gamma_{w1}, \theta_{w1})$ and $p_{w2} = (\gamma_{w2}, \theta_{w2})$ are the rough positions of the head, tail, and two wings, respectively, relative to the bird body center point in a γ-θ polar coordinate system. To achieve scale and translation invariance, the radius is normalized, and the orientation is measured relative to the positive body axis from the tail to the head.
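A minimal sketch of how such a polar feature could be computed from five 2D points is given below. Dividing every radius by the largest radius is our assumption, since the text only states that the radius is normalized. The function name and the ordering of the eight values are likewise illustrative.

```python
import math


def skeleton_feature(center, head, tail, wing1, wing2):
    """8-D polar feature: (radius, angle) of head, tail and both wings about the center."""
    # Direction of the positive body axis, from the tail to the head.
    axis = math.atan2(head[1] - tail[1], head[0] - tail[0])
    points = [head, tail, wing1, wing2]
    radii = [math.dist(center, p) for p in points]
    scale = max(radii) or 1.0      # assumed normalisation: divide by the largest radius
    feature = []
    for p, r in zip(points, radii):
        theta = math.atan2(p[1] - center[1], p[0] - center[0]) - axis
        theta = (theta + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
        feature += [r / scale, theta]
    return feature


# Hypothetical bird: body along the x axis, wings above and below the center.
f = skeleton_feature((0, 0), (2, 0), (-2, 0), (0, 1), (0, -1))
```

Because the radii are normalized and the angles are measured from the body axis about the center point, the same bird scaled, rotated in the image plane, or shifted yields the same eight values up to floating-point error.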

Fig.2 Building process of flying bird skeleton.

To simplify the feature computation, we primarily focus on a 2D representation. Even so, some problems remain to be solved. Namely, a change in the bird's flying direction affects the extraction of the key parts in the 2D representation. For example, head-tail occlusion may appear when the bird is seen from the front or rear, and wing-wing occlusion when it is seen from the side, so part of the information is missing. Therefore, we design several rules to handle these specific cases adaptively during feature extraction. If the bird is flying in front/rear view, the object centroid is used as the position of the head, the tail, and the body center point. If the bird is flying in side view with only one wing visible, we place both wings at the same position. We mainly use the flight direction of the bird to determine the body axis; this part of the research was introduced in our previous work.[27]

The proposed flying bird skeleton representation is advantageous in two respects. Firstly, it characterizes the underlying topological distribution properties of the shape of a flying bird, and thus tolerates a certain degree of shape change. Secondly, the low-dimensional geometric structure reduces the computation overhead.

2.3.Flying bird key-pose sets extraction

As mentioned before, although bird motion is highly diverse, birds have several key poses. Arranging these poses in a certain order yields the representative movements, such as flapping, bounding, flap-gliding, etc. Finally, a bird's motion is obtained by linking these different movements.

2.3.1.Analysis of bird flying poses

The bird flying pose changes dramatically during flight, yet a certain regularity remains among the key body parts. For example, in the general case a bird tends to keep its head-to-tail body structure unchanged, while the wings perform periodic flapping, gliding, or bounding motion relative to the bird body axis. As a result, the variations in bird flying pose can be considered as changes of the wing configuration relative to the bird body axis. To make this description clearer, we explain it with the flying bird skeleton.

The two wing parts can be represented by two points, and the head-to-tail line represents the bird body axis, so changes of bird flying pose can be described as changes in the spatial position of the two points relative to this straight line.

Examining the collected flying pose data, we find that, although bird flying poses are highly diverse, the representative pose types can be described as follows: Pose A, both wings on the upper side of the body axis; Pose B, both wings on the lower side of the body axis; Pose C, wings fully outstretched to both sides; and Pose D, wings wrapped tightly around the body, as shown in Fig. 3.
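To make the four pose types concrete, the following rule-of-thumb labeller maps the wing entries of an 8-D polar skeleton feature to Pose A, B, C, or D. It is only an illustration of the definitions above, not the trained classifiers of Section 2.4. Several things are assumptions of this sketch: the 0.3 radius threshold, the convention that a positive wing angle means "upper side", and the feature layout (head, tail, wing 1, wing 2 as radius and angle pairs).

```python
import math


def pose_type(feature, wrapped_radius=0.3):
    """Map an 8-D skeleton feature to Pose A/B/C/D with simple hand-set rules."""
    r1, t1, r2, t2 = feature[4:8]           # wing radii and wing angles
    if max(r1, r2) < wrapped_radius:        # both wings close to the body
        return "D"
    up1, up2 = math.sin(t1) > 0, math.sin(t2) > 0
    if up1 and up2:
        return "A"                          # both wings on the upper side of the axis
    if not up1 and not up2:
        return "B"                          # both wings on the lower side of the axis
    return "C"                              # one wing on each side: outstretched


# Hypothetical feature: head and tail on the axis, both wings raised.
label = pose_type([1.0, 0.0, 1.0, math.pi, 0.8, 1.0, 0.8, 2.0])
```

Hand-set thresholds like these are brittle across species and viewpoints, which is one reason to learn a separate classifier per pose instead.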

Fig.3 Representative pose types of flying bird.

Table 1 Key-pose probability distribution(109 videos,12624 images).

After these four key poses have been defined, we collect more than 100 videos and analyze the birds' flying motion characteristics frame by frame, as shown in Table 1. From the statistics of this large sample of more than 12,000 images, we obtain the probability of each of the four key poses. This result helps in studying the regularities of birds' flying motion.

2.3.2.Building flying bird key-pose sets

The change of bird flying pose during flight is closely related to certain types of bird movement. Based on knowledge of avian kinematics,[29] Fig. 4(a) and (b) show some typical bird flights and their corresponding pose changes. In general, birds fly with a flapping motion, which mainly includes the representative Poses A and B. Small finch-like birds fly fast and flap their wings at high frequency. Moreover, owing to their light bodies, they often wrap their wings tightly around the body, i.e. fly in Pose D. Large broad-winged birds, which flap at a lower frequency, normally spread their wings and glide, i.e. fly in Pose C.

The statistics based on the large collected flying pose data[29] indicate that the change of pose has a certain regularity in these typical flights. Using the representative pose types, the typical movements can be described as shown in Fig. 4(c). This shows that the underlying motion pattern can be depicted using the key-pose sets, which can therefore serve as a basic unit for describing bird motion features. Based on these analyses, this paper proposes a new method that aims to capture flying bird motion characteristics quickly. The proposed method builds key-pose sets across two consecutive frames and uses collections of these key-pose sets to describe the motion pattern of a flying bird with satisfactory accuracy.

Unlike existing methods that are based on temporal analysis over multiple frames, this paper uses only two consecutive image frames to capture the shape change of a flying bird. This decreases the feature size and speeds up detection. Moreover, using two consecutive frames is also well suited to detecting the moving objects of interest, which can positively guide candidate extraction at the detection stage.

In Fig. 4(c), we design three different pose-change sequences to represent the three flight modes, i.e. flapping flight, bounding flight, and flap-gliding flight. Flight Motion 1 (FM1) is the pose-change pair description of the flapping flight mode. We also find that FM1 is a basic part of birds' flight modes, and FM1 can combine with Pose C or Pose D to form FM2 or FM3. Therefore, we combine these three flight modes to classify the deformation of birds effectively.

2.4.Multiple pose-specific classifiers for flying bird

Fig.4 Bird flight and its key-pose change sets description.

As detailed in Section 2.3, the variations in bird flying pose are mainly caused by changes in the spatial position of the wings relative to the body axis. Inspired by this, we construct a set of representative pose classes according to their spatial relations and design multiple pose-specific classifiers for them. Combining these separately trained pose-specific classifiers can then greatly improve the classification accuracy. In this section, we describe how to build and adaptively combine the pose-specific classifiers for flying birds.

2.4.1.Training each pose-specific classifier

The first stage in training each pose-specific classifier for flying birds is to build its training samples. In our work, to obtain the training data, we manually collect a large number of flying bird images from two public datasets: CUB-200-2011[39] and ImageNet.[40] The training samples of flying birds are then grouped into several pose clusters according to the spatial positions of the two wings relative to the bird body axis, as detailed in Section 2.3.

For each pose cluster, a specific classifier is built independently using the samples in that cluster as positive training samples, while a much larger set of negative training samples (not flying birds) is collected from the ImageNet dataset.

It is worth noting that different viewing angles can change the observed flight pose. Although the rotational invariance of the skeleton-based representation can overcome this influence to a certain extent, a different orientation still changes the flight pose. Therefore, we divide the pose-specific classifiers into four basic categories (Poses A, B, C, and D) according to the spatial relationship between the two wings and the body axis, and then further break these basic classes down into subclasses based on viewing angle (side, front, and rear view).

We use linear Support Vector Machines (SVM) implemented in LIBSVM, a library for support vector machines,[41] to train these classifiers separately. The features are the proposed flying bird skeleton descriptors.

2.4.2.Adaptive combination of multi-classifier

Given a set of pose-specific classifiers (C1, C2, ..., Ck) and an input test sample x, let yi(x) denote the output of the ith pose-specific classifier, where i = 1, 2, ..., k.

Considering that each classifier is quite specific to a certain pose type, we let them work in parallel. According to Table 1 in Section 2.3.1, the poses account for different proportions of the whole flight. Thus the probabilities of the different poses are regarded as the weight values and can be defined as

And we get the final output result.
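One plausible reading of this weighting scheme is a prior-weighted sum of the classifier outputs, sketched below. The weighted-sum form, the normalisation of the weights, the zero decision threshold, and all numeric values are assumptions for illustration. In particular, the pose probabilities are placeholders rather than the values of Table 1.

```python
def combine(scores, pose_prior):
    """Weighted sum of pose-specific classifier outputs, weighted by pose probability."""
    total = sum(pose_prior.values())
    weights = {pose: p / total for pose, p in pose_prior.items()}  # normalise to sum 1
    return sum(weights[pose] * scores[pose] for pose in scores)


pose_prior = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # placeholder values, not Table 1
scores = {"A": 0.9, "B": -0.2, "C": 0.1, "D": -0.5}     # hypothetical SVM decision values
combined = combine(scores, pose_prior)
is_bird = combined > 0        # assumed decision rule: positive combined score
```

Weighting by pose frequency lets the classifiers for common poses dominate the decision, while a confident response from a rare-pose classifier can still tip the result.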

2.4.3.Adaptive combination of pose-change pairs

In a real scene, there may be some bird-shaped negative samples. If we only use the spatial relevance representation, they will produce a relatively high false alarm rate. Hence, we need further processing to determine whether the target is a bird, using a spatio-temporal correlation representation. From Fig. 4, we already know that birds' flight motion can be decomposed into different pose sequences such as FM1, FM2, and FM3. We call these sequences pose-change pairs.

Based on the pose-change pair probability distribution shown in Table 2, we take full account of the spatial correlation features of flying birds' motion characteristics and of the uncertainty in correlated detection. In this paper, we pay more attention to the pose-changing process. For negative samples, the pose remains about the same for a long time. Combining this observation with our statistical data, we find that the probability of a state transition between identical poses can be neglected. In other words, the probabilities of A→A, B→B, C→C, and D→D are not considered, as shown in Table 2. Given these factors, we obtain the probability that the target is a bird by introducing Bayesian networks to infer the probability of pose-change pairs.

Table 2 Pose-change pair probability distribution(109 videos,12624 images).

In the detailed implementation, we take the pose classification result Pose = [Q1, Q2, ..., Qn], n = 1, 2, ..., N, as the input of the deformation classifiers. Our classification based on Bayesian networks is in essence a probability calculation problem. We only need to calculate the probability P(Q1, Q2, ..., Qn) and compare it with a threshold to judge whether the target is a bird. According to Bayesian network theory, the conditional probability can be written as

where Si denotes the probability of Qi. In our model, the current pose depends only on its previous state, which is similar to a Markov process. Thus, the conditional probability can be simplified to

We define the final judgment probability PJ as

After calculating the final judgment probability, we compare it with the detection threshold Th (0 < Th < 1); if PJ ≥ Th, we regard the detected target as a bird. We obtain the value of Th by training.
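Under a first-order Markov assumption, the joint probability of a pose sequence factorizes as P(Q1, ..., Qn) = P(Q1) · Π P(Qi | Qi−1). The sketch below evaluates this product and thresholds it. It is not the paper's exact definition of PJ, which may be normalized differently. The prior, the transition table, and the value 0.1 standing in for Th are all placeholders, not the values of Table 1 or Table 2.

```python
def sequence_probability(poses, prior, transition):
    """P(Q1..Qn) = P(Q1) * prod_i P(Qi | Qi-1) under a first-order Markov assumption."""
    prob = prior[poses[0]]
    for prev, cur in zip(poses, poses[1:]):
        # Pairs absent from the table, such as A->A, are treated as probability 0.
        prob *= transition.get((prev, cur), 0.0)
    return prob


prior = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}        # placeholder values
transition = {("A", "B"): 0.6, ("B", "A"): 0.5,         # placeholder values
              ("A", "C"): 0.2, ("C", "A"): 0.7}

p_flap = sequence_probability(["A", "B", "A"], prior, transition)
p_static = sequence_probability(["A", "A", "A"], prior, transition)
is_bird = p_flap >= 0.1       # 0.1 stands in for the trained threshold Th
```

A target whose pose never changes scores zero here, which matches the observation that bird-shaped negatives tend to keep the same pose over time. Since a raw product shrinks as the sequence grows, a practical score would need some form of length normalisation.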

2.5.On-line detection

To address the problem of FBD, the proposed method exploits the relation between flying bird appearance and motion features. The detailed on-line detection process is shown in Fig. 5. We briefly introduce it in the following three steps:

Step 1. Generate flying bird candidates. To avoid an exhaustive sliding-window search across the entire image, we use the motion information of the flying bird to extract the moving candidates of interest. The VIBE method is adopted. One reason for selecting two frames is that two is the minimum number of frames required for motion detection. Another is that two frames are also useful for key-pose set extraction.[35]

Fig.5 Flowchart of on-line detection process.

Step 2. Single-frame classification using multiple pose-specific classifiers. To classify a generated moving candidate as either a flying bird or not, we examine its skeleton features using the multiple pose-specific classifiers.

Step 3. Multi-key-frame approval process. To achieve greater robustness, the multi-frame key-pose set information and the consistency of the classification results across images are used to validate the final detection result by majority voting.
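The majority-voting check in Step 3 can be sketched in a few lines. The strict "more than half" rule and the function name are assumptions, since the text does not say how ties are handled.

```python
def majority_vote(frame_labels):
    """Accept the target as a bird when more than half of the frames were classified as bird."""
    votes = sum(1 for label in frame_labels if label)
    return votes * 2 > len(frame_labels)


# Hypothetical per-frame decisions for one tracked candidate.
accepted = majority_vote([True, True, False, True, False])
```

With this rule a tie counts as a rejection, which favours a lower false alarm rate over recall.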

3.Experiments

In this section, we give a brief overview of the flying bird dataset that we built and of the implementation details. The detection and comparison results of the method are then presented in detail.

3.1.Dataset and implementation details

Our FBD experiments are based on two public datasets: CUB-200-2011 and ImageNet. The CUB-200-2011 dataset[39] is an extended version of the Caltech-UCSD Birds 200 dataset and contains 11,788 images of 200 bird species. It contains uncropped images of birds in the wild, including birds that are flying, swimming, perching, etc. From this dataset, we manually collect the flying birds as positive samples, 742 images in total, some of which are shown in Fig. 6(a). ImageNet[40] is a well-known image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds or thousands of images. From this dataset we have so far collected 819 flying bird images.

In the experiments, because the sample numbers of the different pose clusters in our dataset are unbalanced, we select only the top 6 pose clusters for training: Side Pose A, Side Pose B, Side Pose C, Side Pose D, Front/Rear Pose A, and Front/Rear Pose B. For each of these pose clusters, a pose-specific classifier is built independently using the samples in that cluster as positive training samples. There are roughly 300 positive training examples in each of the first four pose clusters and roughly 100 in each of the other two. Around 1000 non-flying-bird images are collected from ImageNet as negative samples.

We use linear SVMs implemented in LIBSVM to train these classifiers separately. The flying bird skeleton features are used for training. Since the CUB-200-2011 dataset provides part annotations, we directly use the key part information available in the dataset (i.e. the positions of the bird head, tail, and two wings) as input to the feature computation. For the training samples from the ImageNet dataset, we label them manually to allow similarly accurate feature computation. The features for each part are then concatenated into a final 8-D feature vector describing a flying bird skeleton.

3.2.Performance evaluation criteria

In practice, to quantify object detection performance, one is mostly interested in how many objects a method detects and how often its detections are false. To capture this trade-off, we adopt the following evaluation criteria: Recall, False Positives Per Image (FPPI) [27,42] and Detection Speed (DS, s/image). Recall is defined as

$$\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$

where TP denotes the number of true positives and FN the number of false negatives in the image. FPPI represents the number of negatives that are mistaken for a bird, counted per image.
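The two accuracy criteria can be computed as in the short sketch below. It assumes the usual definitions: Recall as TP/(TP+FN), and FPPI as the total number of false positives divided by the number of test images. The function names are illustrative.

```python
# Sketch of the evaluation criteria under their standard definitions.

def recall(true_positives, false_negatives):
    """Fraction of ground-truth birds that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def fppi(false_positives_per_image):
    """Average number of false positives per image.

    `false_positives_per_image` holds one count for each test image.
    """
    counts = list(false_positives_per_image)
    return sum(counts) / len(counts) if counts else 0.0
```

For example, 8 detected birds out of 10 give a Recall of 0.8, and false-positive counts of 0, 1, 2 and 1 over four images give an FPPI of 1.0.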

3.3.Detection and comparison results

To demonstrate the effectiveness and efficiency of the proposed FBD method, we compared it with three recent FBD methods, which are based on Background Subtraction and Point-Tracking (BSPT) by Shakeri and Zhang [10], Wing-Beat Periodicity (WBP) by Li and Song [11], and the Hierarchical Incorporation of Shape and Shape Dynamics (HISSD) [28], respectively. Meanwhile, to further validate the applicability and superiority of our classification design for the FBD problem, we also compared the proposed method with two existing classification-based bird detection methods: Haar-like features with AdaBoost classifiers (HLAB) by Anil [25], and HOG and CS-LBP features with an SVM classifier (HCLS) by Mihreteab et al. [26]. For brevity, these five methods are referred to below as the BSPT, WBP, HISSD, HLAB and HCLS methods.

Fig.6 Flying bird dataset used in this paper.

For the BSPT, WBP and HISSD methods, we directly adopt the default algorithms from the original works [10,11,28] for testing. To compare our method fairly with the classification-based HLAB and HCLS methods, we use the same flying bird training dataset as detailed in Section 3.1, except that the dataset preprocessing described in Ref. [25] is also applied to extract the regions of interest of birds from the images, using the bounding box information available in the dataset. We then use the clipped images as positive training samples, and add the images with the clipped regions blacked out to the negative training samples, as shown in Fig. 6(b) and (c). Each training sample is normalized to 64×64 pixels. For the HLAB method, we directly use the Viola-Jones algorithm available in OpenCV with the default parameters in Ref. [43]. For the HCLS method, we implement it with the Matlab interface of LIBSVM by combining a single linear SVM classifier with the augmented HOG and CS-LBP features. Here the HOG features are extracted as a 1984-D vector using the VLFeat toolbox [44], and the CS-LBP features are computed as a 1024-D vector [26].
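The preprocessing described above (clip the bird region as a positive sample, black out the same region to form a negative sample, and normalize to 64×64 pixels) can be sketched as follows. The sketch works on grayscale images stored as nested lists and uses nearest-neighbour resizing. The box convention `(x0, y0, x1, y1)` with exclusive upper bounds, the resizing scheme and the function names are assumptions rather than details taken from Ref. [25].

```python
# Sketch of the positive/negative sample preparation for HLAB and HCLS.
# Images are lists of rows of grayscale values; boxes are (x0, y0, x1, y1)
# with exclusive upper bounds. Both conventions are assumptions.

def crop(image, box):
    """Return the region of interest inside `box` (positive sample)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def black_out(image, box):
    """Return a copy of `image` with `box` set to 0 (negative sample)."""
    x0, y0, x1, y1 = box
    result = [list(row) for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            result[y][x] = 0
    return result


def resize_nearest(image, size=64):
    """Resize to size x size pixels by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[(i * height) // size][(j * width) // size] for j in range(size)]
        for i in range(size)
    ]
```

A positive sample would then be `resize_nearest(crop(image, box))` and the matching negative sample `resize_nearest(black_out(image, box))`.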

We have conducted three kinds of experiments to compare the detection performance of the proposed method with that of the BSPT, WBP, HISSD, HLAB and HCLS methods.

3.3.1.Detection of flying bird across different species

To validate the effectiveness of the method in handling different bird species, we collect 200 different bird species from the dataset [39], which vary significantly in size, shape, color and texture. Since the BSPT and WBP methods do not take bird appearance into account, no comparison is made with them here. We select the image-based HLAB and HCLS methods for comparison. Fig. 7 shows example detection results on a wild duck and a crow. False detections occur with both the HLAB and HCLS methods, possibly because their features are not sufficiently discriminative.

3.3.2.Detection of flying bird under different poses

To evaluate the effectiveness of the method in handling multiple variable poses, we collected 3 flying bird videos, as shown in Fig. 6(d), which contain 3 typical kinds of flight: flapping, bounding and flap-gliding. In these different motion patterns, a flying bird often undergoes much larger pose changes.

Fig.7 Example of detection results on flying bird across different species.

Fig.8 Example detection results under different flying poses.

In the flapping video, the bird motion pattern is relatively simple, with a linear trajectory and periodic wing flapping. Moreover, the pose changes are very clear, and only the wings-up and wings-down poses appear. Hence, all of these FBD methods achieve satisfactory detection results. Fig. 8(a1), (b1), (c1), (d1), (e1) and (f1) show example detection results on the flapping bird, which typically holds its wings up. All methods can detect this flapping bird. However, the HLAB and HCLS methods produce some false positives, possibly because their local features are not discriminative enough when dealing with low-altitude backgrounds.

In the bounding video, the bird motion pattern is relatively complex. Wing flapping and bounding flight alternate non-periodically in the video sequence, and the flight trajectory changes abruptly. Therefore, the performance of the BSPT and WBP methods is greatly constrained in this scene. Moreover, during bounding flight, a flying bird usually undergoes extreme pose variations, which makes it very challenging to detect. As can be seen in Fig. 8(a2), (b2), (c2), (d2), (e2) and (f2), the bounding bird, whose wings are wrapped tightly around the body, is missed by most of these methods. The HISSD method and the proposed method, however, are robust to these pose variations, since they build the pose-change information using prior knowledge of bird bounding flight.

In the flap-gliding video, the bird flaps its wings in the first several frames and then keeps gliding toward the trees. Since the bird changes its flying direction and position greatly during the flight, there is hardly any trajectory continuity or wing-motion periodicity, and the BSPT and WBP methods do not work well. Additionally, the bird sometimes flies in rear view. The HISSD method produces some false negatives, mainly because it does not consider birds flying in rear pose when building its templates. Fig. 8(a3), (b3), (c3), (d3), (e3) and (f3) show example detection results, which clearly demonstrate the superiority of our method. However, there are still some missed detections in the last three image frames, because the bird flies into the trees and becomes totally occluded. How to deal with occlusion remains to be addressed.

Fig.9 Example detection results under different flying poses.

We also set up a further comparative experiment to evaluate the capability of dealing with occlusion, as shown in Fig. 9. In Fig. 9(a1), (b1) and (c1), the bird is partially covered by shadow. Compared with the HLAB and HCLS methods, which have many missed detections, our algorithm detects the partially obscured object more effectively. However, when the occlusion is caused by the relative positions of several birds, we are still looking for a suitable solution. At this stage, we intend to ignore several frames if the birds are severely occluded, for two reasons. First, our method is based on key-pose changes and can detect a bird within a short time series, so we can skip several complex frames and restart the detection. Second, unlike in the traditional tracking problem, we only need to tell whether there is a bird in the image, without obtaining its motion path.

3.3.3.Detection of flying bird under different views

To evaluate the effectiveness of the method in handling different views, we tested the algorithms on 3 flying bird videos. In the first video, one bird moves quickly from high to low in bounding flight, seen in front view. In the second video, one bird descends quickly, seen in rear view. In the last video, one bird circles in flapping flight, and the observation view changes from front to side. Fig. 10 shows part of the detection results. We select the image-based HLAB and HCLS methods for comparison. Both methods produce false detections, which are caused by distractions in the low-altitude airspace background. The proposed method is robust to these view changes. However, some detections are still missed. For example, in the last few frames of the second video, the bird lands in the tree, and our detection performance decreases significantly due to the loss or occlusion of appearance information.

3.3.4.Robustness against non-bird object

In a real low-altitude scenario, flying bird detection is disturbed by large numbers of non-bird objects. In this experiment, we aim to evaluate the false detection rate of our algorithm. We choose objects that may easily be confused with a flying bird to test the proposed method. As shown in Fig. 11, the test samples are video sequences of a flying helicopter, a human walking on the road, and a rabbit running on the airfield runway.

We select the motion-based BSPT method and the image-based HLAB and HCLS methods for comparison. Fig. 11 shows example detection results, which clearly demonstrate the superiority of our method. The BSPT, HLAB and HCLS methods have a high false detection rate on non-bird objects and have difficulty distinguishing flying birds from non-bird objects.

Fig.10 Detection result in different views.

Fig.11 Example detection results with non-bird videos.

3.3.5.Statistical analysis

Table 3 gives the overall detection performance of the compared FBD methods. It shows that the proposed method achieves better detection performance in handling appearance variations across different species, different views and changing poses. For the motion-based methods, such as BSPT and WBP, the main disadvantage is that, because they rely on pre-assumed motion types, they cannot robustly handle the varying poses of a flying bird. This is because a bird usually changes its pose as its motion changes, as shown in our second experiment. In addition, they are prone to disturbance from other moving objects. For the HLAB and HCLS methods, the great advantage is that they can achieve an acceptable detection rate using only a single image, but they frequently generate false detections. By contrast, the HISSD method, by incorporating shape and shape dynamics information, has far fewer false positives. However, as shown in the second experiment, it fails to detect a bird flying in rear pose, since it has no rear pose template. Note that since the HLAB method is implemented in OpenCV, its DS is quite different from that of the other methods, which are implemented in Matlab. Overall, our method provides more accurate results and requires less computing time than those methods that use appearance or motion features alone.

Table 3 Comparison of detection results among different methods under different conditions.

4.Conclusions and future work

This paper presents a method for detecting flying birds under a wide range of appearance variations across different species and poses. The method primarily uses flying bird skeleton features combined with multiple pose-specific classifiers to perform the classification task on each image. Afterwards, to achieve greater robustness, the two-frame pose-change information and the consistency of the classification results from both images are further used to validate the final detection result. Experiments demonstrate the effectiveness and efficiency of the proposed method in handling varying species and poses.

Some problems remain unaddressed. Because its motion is unconstrained, a free-flying bird may fly in many possible directions, which further extends the pose diversity. Although our skeleton representation in polar coordinates is relatively robust to 2D rotation, it cannot properly handle all direction changes, especially when part-to-part occlusions occur at certain viewing aspects, since a flying bird is actually a 3D object in the real world. In the future, we plan to introduce a 3D skeleton representation to handle wide pose variations more robustly. Moreover, flying birds often appear in low-altitude scenes, and, as mentioned in the second experiment, occlusions by other targets such as trees or buildings also need to be considered.

In addition, deep learning has become a dominant force in the visual detection field. For the detection of typical targets, deep learning based methods have demonstrated superior performance. However, deep learning needs a huge number of training samples to obtain a good result, and our training set is not large enough to meet this requirement. For this reason, we have begun to collect and label data on flying bird targets and have found some methods to expand our datasets. However, in real detection videos, bird targets can be extremely small, sometimes even nearly invisible, and the variation of their features is not obvious. We have tried to use deep learning to detect flying birds, but the results were not superior. It seems that deep learning has difficulty in detecting small birds. We will continue to study deep learning, and we hope to find a more appropriate way to improve deep learning methods and to solve the flying bird detection problem in our future work.

Acknowledgements

This study was co-supported by the National Key Research and Development Program of China (No. 2016YFB1200100) and the National Natural Science Foundation of China (Nos. 61521091, 91538204 and 61425014).