Cost-Sensitive Multi-Granularity Sequential Three-Way Decision for Object Detection

2021-01-08

Journal of Shanxi University (Natural Science Edition), 2020, Issue 4

SUN Yong, LI Huaxiong

(Department of Control and Systems Engineering, Nanjing University, Nanjing, Jiangsu, 210093, China)

Abstract: Many object detection studies aim for a low misclassification error and assume that all misclassification costs are equal. This assumption is unreasonable in many real-world applications, where different errors carry different costs and object information is often insufficient; both factors can increase the overall cost. To address this issue, we propose a cost-sensitive multi-granularity sequential three-way decision method for object detection. The method builds on sequential three-way decision (3WD) with multi-granularity features and develops a decision strategy that minimizes the total cost of the detection process. At each step, it optimizes the misclassification cost and makes a delayed decision if the object information is insufficient. As detection proceeds, the object information moves from rough to precise granularity, which allows a more reasonable decision. Experiments on several object detection databases validate the effectiveness of the proposed method.

Key words: sequential three-way decision; granular computing; cost-sensitive learning; object detection

0 Introduction

The sequential three-way decision (3WD) model extends the two-way decision (2WD) model by providing a boundary region in which a decision can be delayed when information is insufficient. Over the past decade, sequential 3WD has received growing attention. Yao first proposed a sequential 3WD model [1], and a series of 3WD models were subsequently developed in many research fields. These models fall into two main groups: static 3WD models and sequential dynamic 3WD models [2]. The former focus on static conditions. Liang et al. introduced triangular fuzzy decision-theoretic rough sets (TFDTRS) to obtain precise values of the loss function [3]. Li et al. proposed 3WD models based on subset evaluation [4]. Liu et al. presented a 3WD model for incomplete information systems [5]. The latter group comprises sequential dynamic 3WD models. Yang et al. proposed a sequential three-way multiclass model [6]. Hao et al. presented an optimal scale selection three-way model for decision cases in which the number of objects increases [7]. Ju et al. investigated a sequential three-way classifier with justifiable subspace [8]. Many recent studies have introduced multi-granularity methods to address sequential three-way decision problems. Yao discussed 3WD in a wide sense and proposed the Trisecting-Acting-Outcome (TAO) model [9]. Li et al. developed a sequential 3WD method for cost-sensitive face recognition with granular computing [10]. Qian et al. proposed a generalized model of sequential 3WD with multi-granularity [11]. These decision methods have been applied in many real-world scenarios, e.g. classification [12-14], clustering [15-17] and fuzzy sets [18-21].

Image recognition and object detection are also decision problems in real-world scenarios. Object detection has received much attention for many years, and numerous algorithms have been developed, including R-CNN [22], Fast R-CNN [23], Faster R-CNN [24], FPN [25], YOLO [26] and RetinaNet [27]. The main purpose of these methods is to achieve a low recognition error, and they assume that all misclassification costs are the same, which is not the case in many real-world applications. In general, different types of misclassification lead to different costs, and the complexity and cost of a decision also vary when object information is insufficient. For example, in an office guard system based on object detection, misrecognizing an imposter as a gallery subject and misrecognizing a gallery subject as an imposter are both mistakes, but the former may lead to more serious risks than the latter. These differences cannot be ignored. To address the issue, a family of methods called cost-sensitive learning was developed.

Cost-sensitive learning has been developed in recent years to address imbalanced misclassification costs. It focuses not only on classification accuracy but also on achieving a minimum misclassification cost. Existing studies can be divided into the data level and the algorithmic level. At the data level, methods mainly deal with imbalanced data by optimizing the data distribution. Barua et al. proposed the Majority Weighted Minority Oversampling TEchnique (MWMOTE) for imbalanced data [29]. An ordinal over-sampling method was presented to optimize the machine learning procedure [30]. Many similar methods that adjust the data distribution were developed [31-33]. At the algorithmic level, studies mainly adjust the network architecture to achieve better performance. Zhang and Zhou proposed multiclass cost-sensitive k-nearest neighbor (mckNN) and multiclass cost-sensitive kernel logistic regression (mcKLR) for multiclass face recognition [34]. Khan et al. proposed a cost-sensitive (CoSen) deep neural network to learn robust features automatically [28]. Many other related methods were proposed, based on semi-supervised learning [35], Bayesian networks [36] and decision trees [37]. In real-world settings, misclassification may cost more when sufficient object information is not available. It is therefore necessary to introduce a boundary decision into scenarios with insufficient decision information, and many cost-sensitive 3WD models were developed for this purpose [2,10,38-42].

Most existing studies of sequential 3WD focus on minimizing the misclassification cost and do not account for other essential costs, such as the decision time cost, the decision granularity cost and the decision probability cost. To address this issue, we propose a cost-sensitive multi-granularity sequential three-way decision method for object detection. The proposed method seeks an optimal decision strategy that minimizes the total cost in object detection.

The rest of this paper is organized as follows. In Section 1, we formulate the cost-sensitive object detection issue. In Section 2, we propose a cost-sensitive multi-granularity sequential three-way decision method for object detection. In Section 3, we report and analyze the experimental results. Finally, in Section 4, we conclude the paper.

1 Problem formulation

For a cost-sensitive object detection problem, we denote an object by $x$ and its label by $y$. An object detection database contains $M$ classes of gallery objects and $N$ classes of imposters. These are the positive and negative objects, respectively, denoted by $P_1,\dots,P_M$ and $N_1,\dots,N_N$.

In the cost-sensitive three-way decision problem, the set of decision results is $D=\{D_P,D_N,D_B\}$, whose elements represent recognizing an object as a gallery object (positive decision), as an imposter (negative decision) and as boundary (delayed decision), respectively. The costs of these decisions differ, and the decision cases can be divided into four types [10]:

1) False Acceptance: Misrecognizing a negative object as a positive object.

2) False Rejection: Misrecognizing a positive object as a negative object.

3) False Identification: Misrecognizing a positive object as another positive object or misrecognizing a negative object as another negative object.

4) Delayed Decision: Classifying a positive object or a negative object as boundary.

We denote the costs by $\lambda_{PN},\lambda_{NP},\lambda_{PP},\lambda_{NN},\lambda_{BP},\lambda_{BN}$ and arrange them in a 2×3 cost matrix (Table 1), where $\lambda_{ij}$ is the cost of assigning an object of class $j$ to region $i$, $i\in\{P,B,N\}$, $j\in\{P,N\}$. These costs differ and need to be set according to real-world conditions. In general, false acceptance and false rejection incur the highest costs, and false identification costs less than a delayed decision. Therefore, the costs can be set such that $\lambda_{NP}\ge\lambda_{BP}\ge\lambda_{PP}$ and $\lambda_{PN}\ge\lambda_{BN}\ge\lambda_{NN}$ [10].

In real-world scenarios, object detection is a sequential process in which information increases. The strategy becomes more reasonable as the granularity moves from rough to precise [9-10]. In this paper, the granularity is the size of the detection region. With more decision steps, the granularity becomes more precise and the region becomes smaller, so more granularity features and object details are obtained. The granularity feature is formalized in the following definition.

Table 1 Cost matrix of three-way decision

Definition 1 [10] Denote the object set by $S=\{A_1,A_2,\dots,A_S\}$. Let $F=\{f_1,f_2,\dots,f_l,\dots,f_L\}$ be a set of mappings, where $f_l$ extracts features from the object set at the $l$-th step and $F$ is called the granularity function. For $\forall A\in S$, the granularity feature set is denoted by $B=\{B_1,B_2,\dots,B_l,\dots,B_L\}$ with $B=\{f_1(A),f_2(A),\dots,f_l(A),\dots,f_L(A)\}$, where $\preceq$ represents a total order on the features, i.e. $B_1\preceq B_2\preceq\dots\preceq B_l\preceq\dots\preceq B_L$.

The main purpose of the cost-sensitive multi-granularity sequential three-way decision object detection method is to seek a reasonable decision from $D=\{D_P,D_N,D_B\}$ at each decision step, so that the total cost over the sequential decision steps is minimized.

2 Proposed methods

2.1 Sequential three-way decision

Yao proposed the sequential 3WD method from a granular computing perspective [1,9]. It is effective for sequential decisions made with insufficient information. In real-world settings, decision making is a sequential process in which the object granularity features increase. We therefore develop a general sequential 3WD model for object detection and introduce a delayed decision for steps at which the object granularity feature is insufficient. A sequential three-way decision strategy allows the minimum total cost to be found [2,10]. The sequential three-way decision process is defined as follows.

Definition 2 [10] Denote the cost of deciding $B_l$ as $d$ by $\mathrm{cost}(d|B_l)$, where $d\in D=\{D_P,D_N,D_B\}$. The sequential three-way decision is a series of decisions:

$$SD=(SD_1,SD_2,\dots,SD_l,\dots,SD_L)=(\varphi^*(B_1),\varphi^*(B_2),\dots,\varphi^*(B_l),\dots,\varphi^*(B_L)) \tag{1}$$

In the sequential decision process, $\varphi^*(B_l)$ is the decision with the minimum cost at the $l$-th step:

$$\varphi^*(B_l)=\arg\min_{d\in D}\mathrm{cost}(d|B_l) \tag{2}$$

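For the cost-sensitive three-way decision, the decision cost can be minimized with Bayesian decision theory [9-10]. The costs of the decisions $d\in D=\{D_P,D_N,D_B\}$ are as follows.

$$\begin{aligned}
\mathrm{cost}(D_P|B_l)&=\lambda_{PP}\Pr(P|B_l)+\lambda_{PN}\Pr(N|B_l),\\
\mathrm{cost}(D_N|B_l)&=\lambda_{NN}\Pr(N|B_l)+\lambda_{NP}\Pr(P|B_l),\\
\mathrm{cost}(D_B|B_l)&=\lambda_{BP}\Pr(P|B_l)+\lambda_{BN}\Pr(N|B_l).
\end{aligned} \tag{3}$$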

Considering every possible decision at the $l$-th step, the optimal decision is the one with the minimum cost among $\mathrm{cost}(D_P|B_l)$, $\mathrm{cost}(D_N|B_l)$ and $\mathrm{cost}(D_B|B_l)$:

$$SD_l=\varphi^*(B_l)=\arg\min_{d\in\{D_P,D_N,D_B\}}\mathrm{cost}(d|B_l) \tag{4}$$
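The cost comparison in Eqs. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the cost values in `COSTS` and the function names are assumptions chosen only to satisfy the orderings stated in Section 1, and the sketch assumes Pr(N|B_l) = 1 − Pr(P|B_l).

```python
from typing import Dict

# Illustrative cost matrix (assumed values, not taken from the paper).
# Key "XY" is lambda_XY: the cost of deciding region X for an object
# whose true class is Y.
COSTS: Dict[str, float] = {
    "PP": 0.0, "PN": 10.0,  # positive decision
    "BP": 1.0, "BN": 2.0,   # delayed (boundary) decision
    "NP": 4.0, "NN": 0.0,   # negative decision
}


def decision_costs(p_pos: float, lam: Dict[str, float] = COSTS) -> Dict[str, float]:
    """Expected cost of each decision as in Eq. (3), given Pr(P|B_l) = p_pos."""
    p_neg = 1.0 - p_pos  # assumes P and N are complementary
    return {
        "DP": lam["PP"] * p_pos + lam["PN"] * p_neg,
        "DN": lam["NN"] * p_neg + lam["NP"] * p_pos,
        "DB": lam["BP"] * p_pos + lam["BN"] * p_neg,
    }


def optimal_decision(p_pos: float, lam: Dict[str, float] = COSTS) -> str:
    """Minimum-cost decision as in Eq. (4)."""
    costs = decision_costs(p_pos, lam)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for p in (0.95, 0.60, 0.10):
        print(p, optimal_decision(p), decision_costs(p))
```

With these assumed costs, a confident positive probability selects the positive decision, a low one selects the negative decision, and intermediate values fall into the boundary region.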

Under a condition that holds in most scenarios [10], namely $(\lambda_{PN}-\lambda_{BN})(\lambda_{NP}-\lambda_{BP})>(\lambda_{BP}-\lambda_{PP})(\lambda_{BN}-\lambda_{NN})$, the minimum-cost decision at each step can be selected as follows.

(5)

For simplicity, the decision results can be denoted as follows.

(6)

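From references [9-10], the thresholds $\alpha$, $\beta$, $\gamma$ can be represented as follows.

$$\begin{aligned}
\alpha&=\frac{\lambda_{PN}-\lambda_{BN}}{(\lambda_{PN}-\lambda_{BN})+(\lambda_{BP}-\lambda_{PP})},\\
\beta&=\frac{\lambda_{BN}-\lambda_{NN}}{(\lambda_{BN}-\lambda_{NN})+(\lambda_{NP}-\lambda_{BP})},\\
\gamma&=\frac{\lambda_{PN}-\lambda_{NN}}{(\lambda_{PN}-\lambda_{NN})+(\lambda_{NP}-\lambda_{PP})}.
\end{aligned} \tag{7}$$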
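A minimal Python sketch of a threshold-based decision rule follows. It assumes the thresholds take the standard decision-theoretic rough set form obtained by comparing the costs in Eq. (3) pairwise with Pr(N|B_l) = 1 − Pr(P|B_l); for example, cost(D_P|B_l) ≤ cost(D_B|B_l) rearranges to Pr(P|B_l) ≥ α. The cost values and the function names `thresholds` and `three_way_decision` are illustrative assumptions.

```python
from typing import Dict, Tuple

# Assumed cost values that satisfy
# (l_PN - l_BN)(l_NP - l_BP) > (l_BP - l_PP)(l_BN - l_NN).
LAM: Dict[str, float] = {
    "PP": 0.0, "PN": 10.0,
    "BP": 1.0, "BN": 2.0,
    "NP": 4.0, "NN": 0.0,
}


def thresholds(lam: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (alpha, beta, gamma) from pairwise comparisons of the costs in Eq. (3).

    Denominators are assumed non-zero, which holds when the costs are not all equal.
    """
    a_num = lam["PN"] - lam["BN"]
    b_num = lam["BN"] - lam["NN"]
    g_num = lam["PN"] - lam["NN"]
    alpha = a_num / (a_num + (lam["BP"] - lam["PP"]))
    beta = b_num / (b_num + (lam["NP"] - lam["BP"]))
    gamma = g_num / (g_num + (lam["NP"] - lam["PP"]))
    return alpha, beta, gamma


def three_way_decision(p_pos: float, alpha: float, beta: float) -> str:
    """Threshold rule, valid when alpha > beta: accept, reject, or delay."""
    if p_pos >= alpha:
        return "DP"
    if p_pos <= beta:
        return "DN"
    return "DB"


if __name__ == "__main__":
    alpha, beta, gamma = thresholds(LAM)
    print(alpha, beta, gamma)
    print([three_way_decision(p, alpha, beta) for p in (0.95, 0.60, 0.10)])
```

When the stated condition on the costs holds, the thresholds are ordered as α > γ > β, so γ is not needed in the rule and only α and β are passed to `three_way_decision`.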

The optimal decision $SD_l$ of the sequential decision process is determined by the predicted probability $\Pr(P|B_l)$, which can be computed by an object detection network. The granularity feature $B_l$ describes the object information available at step $l$ and is the input from which the network computes $\Pr(P|B_l)$. We next describe how the object granularity features are extracted and how the predicted probability is computed.

2.2 Object detection method

Over the past decade, a series of object detection methods have achieved strong performance in real-world applications [22-27]. In this paper, we select Faster R-CNN for the experiments, since it performs well on popular object detection databases [24] such as PASCAL VOC 2007 [43] and COCO 2014 [44]. It is mainly composed of two modules: a deep convolutional neural network (DCNN) and a detector with a region proposal network (RPN). When an image is input to Faster R-CNN, the first module computes the convolutional feature map and the second module finds detection regions on the map using an attention mechanism. The network then finds all objects in the image and obtains the predicted probability of each object with a classifier [24]. The architecture of Faster R-CNN, adapted and simplified from reference [24], is shown in Fig. 1.

Fig.1 Architecture of Faster R-CNN

In the object detection method, the granularity is the size of the detection region, which becomes smaller as the number of decision steps increases.

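Definition 3 [10] Denote the granularity function $F$ in object detection by $F^{od}$. The object set is

$$S=\{A_1,A_2,\dots,A_S\}.$$

Thus, the object detection mapping set is

$$F^{od}=\{f^{od}_1,f^{od}_2,\dots,f^{od}_l,\dots,f^{od}_L\}.$$

For $\forall A\in S$, the granularity feature set is

$$B=\{B_1,B_2,\dots,B_l,\dots,B_L\}$$

and

$$B=\{f^{od}_1(A),f^{od}_2(A),\dots,f^{od}_l(A),\dots,f^{od}_L(A)\}.$$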

With the classifier, we obtain the predicted probabilities

$$\Pr(P|B_l)=\{\Pr(P_1|B_l),\dots,\Pr(P_m|B_l),\dots,\Pr(P_M|B_l)\},$$

$$\Pr(N|B_l)=\{\Pr(N_1|B_l),\dots,\Pr(N_n|B_l),\dots,\Pr(N_N|B_l)\}$$

at the $l$-th step, where $M$ and $N$ denote the numbers of positive and negative object classes. In our method, $\{P_1,\dots,P_M,N_1,\dots,N_N\}$ is a partition of the universe of discourse, so the detection process does not involve any other class. At each step, we have:

$$\sum_{m=1}^{M}\Pr(P_m|B_l)+\sum_{n=1}^{N}\Pr(N_n|B_l)=1 \tag{8}$$

By computing the predicted probability $\Pr(P|B_l)$, we can make a decision from $D=\{D_P,D_N,D_B\}$ at each step. Considering both the misclassification cost and the decision cost, we can define the total cost and find its minimum over the sequential decisions.
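One plausible way to turn per-class classifier scores into the scalar Pr(P|B_l) used in Eq. (3) is to sum the probabilities of the positive classes. The sketch below is an assumption about that step, not a procedure stated in the text: the function name `aggregate_probabilities` is hypothetical, and the scores are assumed to be ordered with the M positive classes first.

```python
from typing import Sequence, Tuple


def aggregate_probabilities(class_probs: Sequence[float],
                            num_positive: int) -> Tuple[float, float]:
    """Collapse per-class probabilities into (Pr(P|B_l), Pr(N|B_l)).

    class_probs holds M positive-class scores followed by N negative-class
    scores (an assumed layout). Scores are renormalized so that the classes
    form a partition, i.e. all probabilities sum to 1.
    """
    if not 0 <= num_positive <= len(class_probs):
        raise ValueError("num_positive must be between 0 and len(class_probs)")
    total = float(sum(class_probs))
    if total <= 0.0:
        raise ValueError("class probabilities must have a positive sum")
    p_pos = sum(class_probs[:num_positive]) / total
    return p_pos, 1.0 - p_pos


if __name__ == "__main__":
    # Two positive classes followed by two negative classes (illustrative scores).
    print(aggregate_probabilities([0.5, 0.2, 0.2, 0.1], num_positive=2))
```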

2.3 Cost-sensitive multi-granularity sequential three-way decision

Many studies of sequential three-way decision aim to minimize the misclassification cost alone. In real-world applications, however, other costs cannot be ignored. For example, the total cost includes both the misclassification cost and the decision cost, and the latter increases with running time. It is therefore essential to consider all costs in a sequential decision strategy. To address this issue, we propose a cost-sensitive multi-granularity sequential three-way decision method for object detection. We seek the decision step in the sequential strategy that has the minimum total cost.

In real-world scenarios, the total cost can be divided into misclassification cost and decision cost [45]. Denote the misclassification cost and the decision cost by $\mathrm{cost}(d|B_l)$ and $\mathrm{cost}(l,k,\Pr(R|B_l))$, respectively; the total cost is $\mathrm{cost}^*(d|B_l)$, defined below. Here $l$ is the index of the decision step and $k$ is the granularity of that step, i.e. the size of the detection region. $\varepsilon_1$ and $\varepsilon_2$ are cost balance parameters with $\varepsilon_1+\varepsilon_2=1$. $R$ is the set of positive objects $P$ and negative objects $N$, and $H=M+N$ is the total number of positive and negative object classes. $\Pr(R_h|B_l)$ denotes the predicted probability of recognizing the object as the $h$-th class in $R$ given the granularity feature $B_l$.

$$\mathrm{cost}^*(d|B_l)=\varepsilon_1\,\mathrm{cost}(d|B_l)+\varepsilon_2\,\mathrm{cost}(l,k,\Pr(R|B_l)) \tag{9}$$

cost(l,k,Pr(R|Bl))=

(10)

(11)

When the decision probability cost reaches its maximum, a recognition decision can be made. The predicted probability is then as follows, where $h^*$ is the actual class of the object, $h^*\in\{1,\dots,H\}$: $\Pr(R_h|B_l)=1$ for $h=h^*$ and $\Pr(R_h|B_l)=0$ for $h\neq h^*$.

(12)

The decision probability cost grows from its minimum to its maximum as the recognition probability increases; in real-world settings, a higher decision probability leads to a higher decision cost. All three parts of the decision cost increase with more object detection information and more decision steps. The total cost of the decision results is therefore as follows.

(13)

As the granularity varies from rough to precise, the misclassification cost decreases and the decision cost increases. The total cost, which combines the two, therefore first decreases and then increases, so a minimum total cost can be found among the sequential decision steps. The cost-sensitive multi-granularity sequential three-way decision procedure is shown in Table 2. At each step, the optimal result is given by:

$$SD_l=\varphi^*(B_l)=\arg\min_{d\in D}\mathrm{cost}^*(d|B_l) \tag{14}$$

Over the sequential decision steps, we seek the $l_{min}$-th step, at which the optimal strategy has the minimum cost:

$$l_{min}=\arg\min_{l\in\{1,\dots,L\}}\mathrm{cost}^*(\varphi^*(B_l)|B_l) \tag{15}$$

The proposed method provides an optimal decision for object detection as the granularity goes from rough to precise. We design experiments to analyze its effectiveness in object detection.
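The search for the minimum-cost step can be sketched as a loop over decision steps. This is a hedged illustration for a single object: the per-step decision cost is passed in as a plain number because its exact form is not reproduced here, and the probabilities, decision costs, cost matrix and the name `best_step` are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed misclassification cost matrix (not taken from the paper).
LAM: Dict[str, float] = {"PP": 0.0, "PN": 10.0, "BP": 1.0,
                         "BN": 2.0, "NP": 4.0, "NN": 0.0}


def misclassification_costs(p_pos: float, lam: Dict[str, float]) -> Dict[str, float]:
    """Expected cost of each decision, as in Eq. (3)."""
    p_neg = 1.0 - p_pos
    return {
        "DP": lam["PP"] * p_pos + lam["PN"] * p_neg,
        "DN": lam["NN"] * p_neg + lam["NP"] * p_pos,
        "DB": lam["BP"] * p_pos + lam["BN"] * p_neg,
    }


def best_step(p_pos_per_step: Sequence[float],
              decision_cost_per_step: Sequence[float],
              lam: Dict[str, float] = LAM,
              eps1: float = 0.5,
              eps2: float = 0.5) -> Tuple[Tuple[int, str, float], List[Tuple[int, str, float]]]:
    """Return the (step, decision, total cost) with minimum total cost, plus all steps."""
    if len(p_pos_per_step) != len(decision_cost_per_step) or not p_pos_per_step:
        raise ValueError("inputs must be non-empty and of equal length")
    results: List[Tuple[int, str, float]] = []
    for step, (p, c_dec) in enumerate(zip(p_pos_per_step, decision_cost_per_step), start=1):
        costs = misclassification_costs(p, lam)
        decision = min(costs, key=costs.get)          # optimal decision at this step
        total = eps1 * costs[decision] + eps2 * c_dec  # weighted total cost, Eq. (9)
        results.append((step, decision, total))
    return min(results, key=lambda r: r[2]), results


if __name__ == "__main__":
    probs = [0.60, 0.80, 0.95, 0.99]   # Pr(P|B_l) sharpening as granularity refines
    dec_costs = [0.1, 0.2, 0.6, 2.0]   # decision cost growing with each step
    best, trace = best_step(probs, dec_costs)
    print(best)
    print(trace)
```

In this toy run the total cost falls over the first three steps and rises at the fourth, so the third step is selected, which mirrors the decrease-then-increase behaviour described above.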

Table 2 Cost-sensitive multi-granularity

3 Experiments

In this section, experiments are conducted to examine the effectiveness of the proposed cost-sensitive multi-granularity sequential three-way decision method on two popular object detection databases, PASCAL VOC 2007 [43] and COCO 2014 [44]. We selected two detection networks, VGG16 [46] and RES101 [47], in Faster R-CNN to evaluate the performance over the sequential decision steps.

3.1 Object detection databases and networks

The PASCAL VOC 2007 database contains about 10 000 images of 20 classes, including animals, plants and people; the labels provide object positions and classes [43]. The COCO 2014 database is larger, with over 300 000 images and 2.5 million objects from 80 classes, and its scenes are more varied. Images from both databases are resized to 300×300 pixels for training and testing [44].

The VGG16 detection network has 13 convolutional layers and 3 fully connected layers, i.e. 16 weight layers in total. It is built from stacked 3×3 convolutional kernels and 2×2 max-pooling layers. The network, which performs well in object detection, won second place in the classification task of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [46]. The RES101 detection network is a 101-layer residual network, which addresses the degradation problem of deep networks with shortcut connections. It won first place in the classification task of ILSVRC 2015 [47].

3.2 Experimental parameter settings

In cost-sensitive object detection, the experimental parameter settings are shown in Table 3, where $M$ and $N$ are the numbers of positive and negative object classes, which were randomly selected from each database. We randomly selected $Num_P$ and $Num_N$ images from each positive and negative class, respectively. The misclassification costs are specified through cost proportions; considering real-world scenarios, we set the cost proportions as listed in Table 3. The cost balance parameters are $\varepsilon_1=0.5$ and $\varepsilon_2=0.5$. Faster R-CNN generates 300 detection regions. For simplicity, we selected 10 granularities from rough to precise, with sizes {50, 30, 20, 15, 12, 10, 6, 3, 2, 1}, corresponding to decision steps {1, 2, 3, 4, 5, 10, 15, 20, 25, 30}, respectively. The relationship between granularity and decision step is shown in Fig. 3. The object information becomes more sufficient as the granularity decreases.

For the proposed method, images at granularities from rough to precise are shown in Fig. 2, which illustrates how object information increases. With more information and more decision steps, the images become clearer. Thus, the misclassification cost decreases and the decision cost increases, as the following experiments show. All experiments were conducted on a computer with a Quadro P5000 graphics card and 16 GB of GPU memory; the method is implemented in Python (version 2.7).

Fig.2 Granularity from rough to precise on PASCAL VOC 2007 and COCO 2014

Table 3 Experimental setting

3.3 Experimental Analysis

In this subsection, we compare the performance of the proposed sequential three-way decision with that of sequential two-way decision in terms of total cost and error. For each round of experiments, we compare the misclassification cost ($\mathrm{cost}_{mis}$), decision cost ($\mathrm{cost}_{dec}$), total cost ($\mathrm{cost}_{total}$), high-cost error ($\mathrm{err}_{PN}$), low-cost error ($\mathrm{err}_{NP}$), total error ($\mathrm{err}$) and running time. For a fair comparison, a delayed decision, which assigns an object to the boundary region, is counted as a misclassification in sequential three-way decision. As the decision steps increase, the granularity varies from rough to precise and more object information becomes available. The optimal decision step depends on the thresholds $\alpha$, $\beta$, $\gamma$ and the predicted probability $\Pr(P|B_l)$, which are computed as described in Section 2. All experiments were repeated 10 times. The results for cost, error and running time are shown in Figs. 4-7 and Tables 4 and 5.
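The error measures can be illustrated with a short Python sketch. It rests on assumptions not spelled out in the text: err_PN is read as the rate of negative objects accepted as positive, err_NP as the rate of positive objects rejected as negative, each rate is normalized by the total number of objects, delayed decisions count toward the total error, and false identification within the positive or negative group is ignored. The function name `error_rates` is hypothetical.

```python
from typing import Dict, Sequence


def error_rates(labels: Sequence[str], decisions: Sequence[str]) -> Dict[str, float]:
    """Compute high-cost, low-cost and total error rates.

    labels:    true group of each object, "P" or "N".
    decisions: decision for each object, "DP", "DN" or "DB".
    """
    if len(labels) != len(decisions) or len(labels) == 0:
        raise ValueError("labels and decisions must be non-empty and of equal length")
    n = len(labels)
    pairs = list(zip(labels, decisions))
    false_accept = sum(1 for y, d in pairs if y == "N" and d == "DP")  # high-cost error
    false_reject = sum(1 for y, d in pairs if y == "P" and d == "DN")  # low-cost error
    delayed = sum(1 for _, d in pairs if d == "DB")                    # counted as errors
    return {
        "err_PN": false_accept / n,
        "err_NP": false_reject / n,
        "err": (false_accept + false_reject + delayed) / n,
    }


if __name__ == "__main__":
    labels = ["P", "P", "N", "N", "N"]
    decisions = ["DP", "DB", "DP", "DN", "DN"]
    print(error_rates(labels, decisions))
```

Counting delayed decisions as errors explains why a three-way method can show a higher total error while still reducing the high-cost error.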

Fig.3 Decision step versus object granularity

1) Misclassification cost, decision cost and total cost

In Figs. 4, 5 and 6, we compare the misclassification cost, decision cost and total cost of sequential three-way decision and sequential two-way decision. The misclassification cost in Fig. 5 decreases as the decision steps increase. At first, with insufficient information, most objects are assigned to the boundary region, which causes a higher misclassification cost. As the decision steps increase, the granularity changes from rough to precise and more object information becomes available, so more objects are assigned to their correct classes and the misclassification cost falls. The results show that sequential three-way decision performs better than sequential two-way decision, because the delayed decision places some objects in the boundary region, which costs less than misclassifying them. Thus, the boundary decision is an optimal strategy for object detection when the available information is insufficient.

In Fig. 4, the decision cost rises as the object information increases, since more running time and more decision processing are required. The decision cost, which includes the decision time cost, decision granularity cost and decision probability cost, therefore increases. The decision costs of two-way and three-way decision are similar, since they depend mainly on the decision step and granularity rather than on the decision method.

In Fig. 6, the total cost first decreases and then increases. The reason is that, as the granularity varies from rough to precise and the available information becomes more sufficient, the misclassification cost decreases while the decision cost increases. In the sequential three-way decision process, the optimal decision step with the minimum total cost can be found; it is marked in the figure. The experiments show that sequential three-way decision has a lower total cost than two-way decision and that its minimum is also lower, which demonstrates the effectiveness of the proposed method.

Fig.4 Decision cost comparison between two-way and three-way decisions on PASCAL VOC 2007 and COCO 2014

2) Misclassification error rate

For the misclassification error, we compare three-way and two-way decision in Fig. 7 and Table 4. Table 4 compares the two decision methods at the optimal decision step in terms of high-cost error ($\mathrm{err}_{PN}$), low-cost error ($\mathrm{err}_{NP}$) and total error ($\mathrm{err}$). By assigning some high-cost objects to the boundary region and delaying the decision, sequential three-way decision achieves a lower high-cost error than two-way decision, which leads to a lower misclassification cost. For the low-cost error, three-way decision performs similarly to two-way decision. The total error of three-way decision is higher than that of two-way decision, since the proposed method assigns objects with insufficient information to the boundary region, and these are counted as errors. As the decision steps and the available information increase, all errors decrease, which is reasonable in real-world scenarios. This indicates that the proposed method is better at reducing the high-cost error, which suits the cost-sensitive object detection problem.

Table 4 Comparisons on misclassification cost (costmis), decision cost (costdec), total cost (costtotal),

Fig.5 Misclassification cost comparison between two-way and three-way decisions on PASCAL VOC 2007 and COCO 2014

Fig.6 Total cost comparison between two-way and three-way decisions on PASCAL VOC 2007 and COCO 2014

Fig.7 High-cost error comparison between two-way and three-way decisions on PASCAL VOC 2007 and COCO 2014

3) Running time

Table 5 compares the running time. The results show that three-way decision requires more running time than two-way decision, because assigning objects with insufficient information to the boundary region may add decision running time. In the experiments, the increase in running time is small and acceptable, which further supports the effectiveness of the proposed method in object detection.

Table 5 Running time (in seconds) comparison

In our experiments, the proposed method minimizes the total cost, and the optimal decision step with the minimum total cost can be found. The proposed sequential three-way decision achieved a lower high-cost error than two-way decision without a substantial increase in running time. However, we did not run experiments on more object detection databases, and some parameters need further analysis; we will address these issues in future work.

4 Conclusion

We proposed a cost-sensitive multi-granularity sequential three-way decision method for object detection to address the issue that some essential costs of the decision process are ignored. Experimental results demonstrate that an optimal decision step minimizes the total cost of the whole decision process. Three-way decision achieves a lower total cost and a lower high-cost error than two-way decision. Experiments on several popular object detection databases validate the effectiveness of the proposed method.
