
A Self-Adapting and Efficient Dandelion Algorithm and Its Application to Feature Selection for Credit Card Fraud Detection

2024-03-01 Honghao Zhu, MengChu Zhou, Yu Xie, and Aiiad Albeshri

IEEE/CAA Journal of Automatica Sinica, 2024, Issue 2

Honghao Zhu, MengChu Zhou, Yu Xie, and Aiiad Albeshri

Abstract—A dandelion algorithm (DA) is a recently developed intelligent optimization algorithm for function optimization problems. Many of its parameters must be set by experience, and such settings might not be appropriate for all optimization problems. A self-adapting and efficient dandelion algorithm is proposed in this work to lower the number of DA’s parameters and simplify DA’s structure. Only the normal sowing operator is retained, while the other operators are discarded. An adaptive seeding radius strategy is designed for the core dandelion. The results show that the proposed algorithm achieves better performance on the standard test functions with less time consumption than its competitive peers. In addition, the proposed algorithm is applied to feature selection for credit card fraud detection (CCFD), and the results indicate that it can obtain higher classification and detection performance than the state-of-the-art methods.

I. INTRODUCTION

OPTIMIZATION problems exist in daily life and can be solved by using mathematical algorithms. When their scales grow, though, they become incredibly challenging to solve. Meta-heuristic algorithms have been found to be an efficient class of algorithms for such complex problems. In the past decades, numerous meta-heuristic algorithms were proposed, such as genetic algorithms (GA) [1], artificial bee colony algorithm (ABC) [2], particle swarm optimizer (PSO) [3], artificial fish swarm algorithm (AFSA) [4], firefly optimization algorithm (FA) [5], cuckoo optimization algorithm (COA) [6], bat algorithm (BA) [7], grey wolf optimization algorithm (GWO) [8], evolution strategy with covariance matrix adaptation (CMA-ES) [9], and fireworks algorithm (FWA) [10]. Although they have achieved success in some fields, the no-free-lunch theorem [11] indicates that no algorithm can perform optimally on all problems, and thus many researchers propose new algorithms to face new challenges.

A dandelion algorithm (DA) is a recently proposed swarm intelligence algorithm inspired by dandelion sowing [12]. Due to its excellent performance in function optimization, DA has attracted the attention of many scholars, who have proposed many variants to further improve its performance. Gong et al. [13] propose an improved dandelion algorithm called MDA, in which Levy mutation is replaced by the mean position of all seeds. They also give a proof of convergence and an analysis of parameters. Zhu et al. [12] propose a dandelion algorithm with probability-based mutation, in which three probability models, namely linear (DAPML), binomial (DAPMB), and exponential (DAPME) models, are designed to select whether to utilize Levy or Gaussian mutation. Han and Zhu [14] propose a fusion dandelion algorithm-based adaptive selection strategy (FSS-DA), in which a distance-aware selection strategy (DSS), which jointly considers the function fitness value and the individual position distance, and other well-known selection strategies are put into a fusion pool together. The selection strategy with the best overall reward is chosen to select suitable dandelions for the next generation. To determine whether a seed is excellent or not, an improvement of the evolution process of DA with an extreme learning machine (ELMDA) is proposed [15]. A training model is first built based on the extreme learning machine (ELM) [16] with a training set constructed from excellent and poor dandelions and their corresponding labels (i.e., +1 if excellent or -1 if poor), and then the model is applied to classify seeds into excellent or poor ones. Excellent seeds are chosen to participate in the evolution process, while poor ones are abandoned to save function evaluations. Han et al. [17] propose competition-driven dandelion algorithms in which a novel competition mechanism is designed to find the loser, and then three historical information models are designed by exploiting historical information with an estimation-of-distribution algorithm [18] to generate a new individual to replace the loser. In addition, these algorithms have been applied in many fields, such as traffic flow prediction [19], credit card fraud detection (CCFD) [20], and multi-skill resource-constrained project scheduling [21]. Although they perform better than their peers, they introduce new parameters. All these methods have complex structures and a large number of parameters to tune.

Filter, wrapper, and embedded methods are commonly used in feature selection. The filter method is completely independent of any machine learning algorithm. It ranks data features according to certain criteria and selects the best feature subset. The wrapper method relies on a specific classification model to evaluate the selected feature subset. Because it directly optimizes the target, the wrapper method tends to perform better. The embedded method builds feature selection into the classification model and combines the advantages of filter and wrapper methods [22]. In this paper, a random-search wrapper method is used for feature selection, which mainly uses an intelligent optimization algorithm to iteratively add and delete features to establish feature subsets.

Sun et al. [23] proposed using a genetic algorithm to select good feature subsets, with a support vector machine (SVM) as the base classifier, to improve the detection rate. Xue et al. [24] proposed three new PSO initialization strategies and three new personal-best and global-best update mechanisms for feature selection, with the goals of maximizing classification performance, minimizing the number of features, and reducing computation time. The base classifier used in this method is K-nearest neighbor (KNN). Masood et al. [25] proposed a new rank-based incremental search strategy, Wrank-ELM, which adopted a wrapper-based feature ranking method and used the ELM classifier to evaluate the selected features. Xue et al. [26] proposed an adaptive particle swarm optimization algorithm for feature selection, especially large-scale feature selection. KNN is used as the base classifier, and the experimental results show that the method is effective. Sagban et al. [27] combined the bat algorithm and the ant colony algorithm into a feature selection algorithm for the classification of cervical cancer datasets, but this method uses only accuracy as the evaluation index. Currently, existing intelligent optimization algorithms primarily focus on feature selection in the context of balanced data classification. Therefore, designing an improved intelligent algorithm for feature selection that enhances the classification performance on imbalanced data is a worthwhile research problem.

According to the literature analysis, existing work uses intelligent optimization algorithms to select features for balanced data, and intelligent optimization algorithms for feature selection under imbalanced classification are lacking. Among the existing intelligent optimization algorithms, the dandelion algorithm, as a newly proposed algorithm, has superior performance. However, it requires many parameters to be set. Once they are fixed, it may perform well on some problems but not on others. Therefore, it is worthwhile to reduce the number of parameters and simplify the structure of DA such that it can select more discriminative features and minimize feature redundancy and feature noise to improve the classification results on imbalanced data. In addition, many researchers have used intelligent algorithms to optimize the extreme learning machine and achieved excellent results, but they only deal with balanced data and do not consider imbalanced data. The motivation of this paper stems from the aforementioned problems. Finding solutions to these issues is important for helping researchers and financial professionals recognize the role of feature selection in detecting credit card fraud in imbalanced classification scenarios.

In order to address these issues, this work proposes a self-adapting and efficient dandelion algorithm, namely SEDA. Specifically, basic DA consists of normal sowing, mutation sowing, and a selection strategy, but only normal sowing is retained to generate seeds in SEDA, i.e., mutation sowing is removed. Meanwhile, the population of SEDA has only one dandelion, i.e., the core dandelion, which differs from the basic DA, whose population includes a core dandelion and assistant ones. In addition, an adaptive seeding radius strategy is designed to replace two parameters (the withering factor and the growth factor). Finally, a greedy selection strategy is applied to choose the best individual in the population for the next generation. With this design, all unnecessary mechanisms of DA are removed in SEDA. The proposed algorithm has the following advantages:

1) It is easy to implement.

2) It has only one parameter, i.e., the number of seeds.

3) It reduces time consumption.

Since these advantages are beneficial for solving real-world problems, the proposed algorithm is used to select proper features for credit card fraud detection. This work aims to make the following new contributions:

1) Proposing a simple and efficient dandelion algorithm in which all unnecessary mechanisms of DA are removed.

2) Developing an adaptive seeding radius strategy to reduce the number of DA’s parameters.

3) Applying the proposed algorithm successfully to feature selection for credit card fraud detection.

The paper is organized as follows. Section II provides a brief introduction to basic DA. Section III gives the proposed algorithm in detail, together with its application to feature selection. Section IV presents the experimental results, including those on credit card fraud detection. Finally, Section V draws the conclusion.

II. DANDELION ALGORITHM

DA is divided into four parts: initialization, normal sowing, mutation sowing, and selection strategy [12], [28]:

1) Initialization: The first-generation population of N dandelions is generated randomly within the search range.

Fig. 1. Variation trend: (a) γ = 1 and (b) γ ≠ 1.

2) Normal Sowing: DA solves a minimization problem where the number of seeds is calculated based on the fitness value, i.e., the smaller the fitness value, the more seeds are produced. Each dandelion produces seeds within a certain seeding radius that is dynamically adjusted.

3) Mutation Sowing: The best dandelion, i.e., the dandelion with the minimum fitness, jumps out of a local optimum by Levy mutation.

4) Selection Strategy: The best dandelion is always selected and retained for the next generation. Then N-1 dandelions are selected from the rest according to a selection operator.

The sowing and selection steps are repeated until the given termination condition is satisfied.

In DA, the number of dandelion seeds is calculated as

where S_M controls the maximum number of produced seeds, S_m controls the minimum number of produced seeds, f_M and f_m are respectively the maximum and minimum fitness values of all dandelions, ϵ is machine epsilon, used to avoid division by zero, and f(x_i) is the fitness value of the i-th dandelion.
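As an illustration of the seed-count rule, the following minimal sketch assumes the commonly used linear form in which the count scales with (f_M - f(x_i) + ϵ)/(f_M - f_m + ϵ) and is floored at S_m. This form, the rounding step, and the function name `seed_count` are assumptions for illustration and should be checked against [12].

```python
import sys

EPS = sys.float_info.epsilon  # machine epsilon, guards against division by zero


def seed_count(f_i, f_min, f_max, s_max, s_min):
    """Number of seeds for a dandelion with fitness f_i (minimization).

    Assumed form: the best dandelion (f_i == f_min) gets about s_max seeds,
    and no dandelion gets fewer than s_min seeds.
    """
    raw = s_max * (f_max - f_i + EPS) / (f_max - f_min + EPS)
    return max(s_min, int(round(raw)))


# Hypothetical population fitness values, used only to exercise the function.
fitness_values = [1.0, 3.0, 5.0]
counts = [seed_count(f, min(fitness_values), max(fitness_values), 100, 10)
          for f in fitness_values]
```

With these hypothetical inputs, the best dandelion receives S_M seeds and the worst receives S_m, which matches the statement that a smaller fitness value produces more seeds.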

Dandelions are divided into the best dandelion and the other dandelions according to their fitness values. The sowing radius of the best dandelion is calculated as

where L_b and U_b are respectively the lower and upper bounds of the search space, t denotes the t-th generation, α and β are respectively the withering factor and the growth factor, and γ reflects the growth trend and is calculated as follows:

The sowing radius of the other dandelions is calculated as

where δ is a weight factor calculated as

where T̂ is the maximum number of function evaluations, and T is the current number.
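The radius rule for the best dandelion can be sketched as follows. The sketch assumes that γ is the ratio of the best fitness values of two consecutive generations, so γ = 1 means no improvement, that the initial radius spans the search range, and that the radius is multiplied by α when γ = 1 and by β otherwise. The default values α = 0.95 and β = 1.05 and both function names are hypothetical and are not taken from this paper.

```python
import sys

EPS = sys.float_info.epsilon


def growth_trend(f_best_now, f_best_prev):
    """Assumed definition of gamma: ratio of consecutive best fitness values."""
    return (f_best_now + EPS) / (f_best_prev + EPS)


def best_radius(r_prev, t, lower, upper, gamma, alpha=0.95, beta=1.05):
    """Sowing radius of the best dandelion at generation t (assumed form)."""
    if t == 1:
        return upper - lower      # first generation: span the search range
    if gamma == 1:
        return r_prev * alpha     # no improvement: wither (shrink the radius)
    return r_prev * beta          # improvement: grow the radius


r1 = best_radius(None, 1, -100.0, 100.0, gamma=1.0)
r2 = best_radius(r1, 2, -100.0, 100.0, gamma=growth_trend(3.0, 3.0))
r3 = best_radius(r2, 3, -100.0, 100.0, gamma=growth_trend(2.0, 3.0))
```

The two tunable constants in this rule are exactly the parameters that the adaptive strategy of Section III is designed to remove.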

The best dandelion is sown with mutation operation, i.e.,

where Levy() is a random number following the Levy distribution.
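One common way to draw a Levy-distributed step is Mantegna's algorithm, sketched below. Whether the basic DA uses this generator is not stated here, and the multiplicative update x' = x(1 + Levy()), the stability index 1.5, and the out-of-range handling are all assumptions made for illustration.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """One heavy-tailed step via Mantegna's algorithm (beta is the stability index)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_mutation(x_best, lower, upper, rng=random):
    """Assumed mutation: scale each coordinate by (1 + Levy())."""
    mutated = []
    for x in x_best:
        candidate = x * (1.0 + levy_step(rng=rng))
        if not lower <= candidate <= upper:
            # Assumption: an out-of-range coordinate is resampled uniformly.
            candidate = rng.uniform(lower, upper)
        mutated.append(candidate)
    return mutated


rng = random.Random(0)
mutant = levy_mutation([1.0, -2.0, 3.0], lower=-5.0, upper=5.0, rng=rng)
```

Most steps are small, but the heavy tail occasionally produces a large jump, which is what lets the best dandelion leave a local optimum.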

III. PROPOSED ALGORITHMS

A. A Self-Adapting and Efficient Dandelion Algorithm

In order to simplify DA’s structure, a self-adapting and efficient dandelion algorithm (SEDA) is presented in this paper. In it, an adaptive seeding radius strategy is designed to reduce the number of parameters used when the sowing radius is calculated.

In DA, dandelions are divided into two types, the optimal dandelion and the other dandelions, whose sowing radii are calculated via (2) and (3), respectively. These show that the optimal dandelion uses two parameters in the seeding process: the withering and growth factors. To simplify the structure of the algorithm and reduce its parameter count, SEDA keeps only one dandelion, i.e., the core dandelion, and removes the withering and growth factors. The seeding radius of the core dandelion is calculated as

From (7), it can be seen that the withering and growth factors have been replaced by an adaptive radius strategy that is updated according to the number of function evaluations. The resulting trends are shown in Fig. 1.

As can be seen from Fig. 1, when γ = 1, the seeding radius becomes smaller as t increases, showing a downward trend; when γ ≠ 1, the seeding radius becomes larger as t increases, showing an upward trend. Both trends match the original intent of the underlying dandelion algorithm.

The proposed SEDA has only one dandelion in the initial population, which can be considered as the best dandelion. This means that only the best dandelion generates seeds and needs its sowing radius calculated. It is evident that SEDA is simpler than the basic DA and its variants. In SEDA, the sowing radius of the best dandelion is first calculated by (7), and then N_i seeds are generated within the sowing radius. If new seeds are out of bounds, they are replaced by random ones generated in the search space. Finally, the best individual is selected from the population, which includes the best dandelion and all seeds, by a greedy selection strategy. If the termination condition is not satisfied, the above steps are repeated.

In SEDA, all additional mechanisms (Gaussian mutation, Levy mutation, and seeding of common dandelions) have been removed, as shown in Algorithm 1.

Algorithm 1 Self-Adapting and Efficient Dandelion Algorithm
Input: X_b and N_i // X_b is the current best dandelion, N_i is the number of seeds.
Output: the best dandelion
repeat
    for i = 1 to N_i do
        Calculate R_b by (7)
        for k = 1 to d do
            X_i^k = X_b^k + rand(0, R_b)
            if X_i^k is out of range then
                Generate a random location in the search space
            end if
        end for
    end for
    Select the best dandelion in the current population
until the termination condition is satisfied

Fig. 2. Flowchart of SEDA.

To express the proposed algorithm more clearly, the flowchart of SEDA is given in Fig. 2. The computational complexity of Algorithm 1 is O(T̂ × M), where M is the total number of seeds and T̂ is the maximum number of generations.
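The loop of Algorithm 1 can be sketched in a few lines. This is an illustrative sketch and not the authors' implementation: the radius update below is a hypothetical stand-in for (7) that only mirrors its described behaviour (shrink when the best fitness did not improve, grow otherwise, driven by the share of function evaluations used). The per-coordinate resampling of out-of-range values and the names `seda_minimize` and `sphere` are likewise assumptions.

```python
import random


def seda_minimize(objective, dim, lower, upper, n_seeds=300, max_evals=30_000, seed=0):
    """Single-dandelion search loop in the spirit of Algorithm 1 (illustrative only)."""
    rng = random.Random(seed)
    core = [rng.uniform(lower, upper) for _ in range(dim)]
    f_core = objective(core)
    evals = 1
    radius = upper - lower      # initial radius spans the search range
    improved = False
    history = [f_core]          # best fitness after each generation

    while evals < max_evals:
        # Hypothetical stand-in for (7): the step factor depends only on the
        # share of the evaluation budget already used, not on tuned constants.
        progress = evals / max_evals
        if len(history) > 1:
            radius *= (1.0 + progress) if improved else (1.0 - progress)
            radius = min(max(radius, 1e-12), upper - lower)

        best_seed, f_best_seed = None, float("inf")
        for _ in range(n_seeds):
            seed_pos = []
            for k in range(dim):
                # rand(0, R_b), as written in Algorithm 1
                value = core[k] + rng.uniform(0.0, radius)
                if not lower <= value <= upper:
                    value = rng.uniform(lower, upper)  # out of range: resample
                seed_pos.append(value)
            f_seed = objective(seed_pos)
            evals += 1
            if f_seed < f_best_seed:
                best_seed, f_best_seed = seed_pos, f_seed

        # Greedy selection between the core dandelion and its best seed.
        improved = f_best_seed < f_core
        if improved:
            core, f_core = best_seed, f_best_seed
        history.append(f_core)

    return core, f_core, history


def sphere(x):
    """Simple test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best, f_best, history = seda_minimize(sphere, dim=5, lower=-100.0, upper=100.0,
                                      n_seeds=20, max_evals=2_000)
```

Because the core dandelion is replaced only by a strictly better seed, the recorded best fitness never increases, which is the property that Theorem 1 later relies on. The number of seeds is the only setting a user has to choose.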

B. An Imbalanced Classification Method Based on SEDA Feature Selection

Credit card fraud is an important issue and is quite costly for banks and card-related companies. Billions of dollars are lost each year due to credit card fraud. Due to confidentiality issues, there is a lack of research on analyzing real-world credit card data. Therefore, it is extremely important to detect whether a credit card transaction is fraudulent. Credit card fraud detection is a classic imbalanced classification task [29], [30]. Feature selection is considered an effective way to handle it [31]. In this paper, the proposed SEDA is applied to feature selection for solving it.

In SEDA, the dimension of each dandelion is set to the number of features in a dataset. If the value of a dimension is greater than 0.5, the corresponding feature is selected. Otherwise, it is removed. For example, the dandelion (0.4, 0.1, 0.6, 0.7, 0.2) selects the third and fourth features from the dataset to form the new dataset. In addition, the aims are to obtain better classification performance and fewer features, and thus the objective function is designed as

where λ and μ are weight factors that trade off the classification performance against the number of features. They are set to 0.99 and 0.01 in this paper since the classification performance is more important than the number of features. G is the performance indicator for imbalanced classification (G-mean) [20]. NF is the number of features. The smaller the value of f, the better the classification performance. Moreover, some state-of-the-art methods that are often used for comparison, namely DA, MDA, DAPML, DAPMB, DAPME, BA, and IPSO, are selected to compare with SEDA. The maximum number of evaluations is set to 1000. Note that the weighted extreme learning machine (WELM) [32] is selected as the base classifier for two reasons: 1) it has a fast learning speed; and 2) it is suitable for imbalanced classification problems.
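The encoding and the trade-off can be sketched as below. The thresholding rule follows the text directly. The objective is an assumed form, λ(1 - G) + μ·NF/D with D the total number of features, chosen only because it is minimized when G-mean is high and few features are kept; the exact expression in (8) may differ, and both function names are hypothetical.

```python
def selected_features(dandelion, threshold=0.5):
    """Indices (0-based) of the features whose dimension value exceeds the threshold."""
    return [i for i, value in enumerate(dandelion) if value > threshold]


def objective(g_mean, n_selected, n_total, lam=0.99, mu=0.01):
    """Assumed objective: smaller is better, mixing (1 - G-mean) and feature share."""
    return lam * (1.0 - g_mean) + mu * n_selected / n_total


# The example dandelion from the text selects the third and fourth features.
mask = selected_features((0.4, 0.1, 0.6, 0.7, 0.2))

# Hypothetical G-mean values, used only to show the direction of the trade-off.
better = objective(g_mean=0.90, n_selected=2, n_total=5)
worse = objective(g_mean=0.80, n_selected=2, n_total=5)
```

With λ = 0.99 and μ = 0.01, a change of 0.01 in G-mean outweighs removing one feature whenever the dataset has two or more features, which reflects the stated priority on classification performance.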

G-mean and AUC (area under curve) are selected as the evaluation indicators. In the following, experiments are first conducted on 10 imbalanced classification datasets, and then on three credit card fraud detection datasets.

In summary, the imbalanced classification method based on SEDA feature selection (SEDA-W) is shown in Algorithm 2.

Algorithm 2 Imbalanced Classification Algorithm Based on SEDA Feature Selection
Input: Training dataset, maximum number of evaluations.
Output: G-mean.
1: Initialize the parameters of SEDA and WELM
2: Randomly initialize the population, calculate the fitness value of the population, and evaluate the population according to (8)
3: repeat
4:     Use (7) to generate seeds and evaluate the seeds
5:     Select individuals from the existing population for the next generation through a selection strategy
6: until the stopping conditions are met
7: Find the best dandelion and its location
8: Get G-mean based on the best dandelion and its location
9: return G-mean

In Algorithm 2, Lines 1 and 2 are the initialization phase, which requires N function evaluations. Lines 4-6 require M function evaluations, where M is the total number of seeds. The computational complexity of WELM classification is O(N̂mL + N̂²L), where N̂ and m denote the number of training instances and the dimension of each training instance, respectively [33], and L denotes the number of hidden nodes in WELM. If the maximum number of generations is set to T̂, the computational complexity of Algorithm 2 is O(N(N̂mL + N̂²L) + T̂M(N̂mL + N̂²L)).

C. Convergence Analysis of SEDA-W

The convergence of SEDA-W is mathematically analyzed. We assume that X is a dandelion in SEDA-W’s population. Since the initial population has only one dandelion, it can be considered as the best one.

Theorem 1: The evolution direction of SEDA-W is monotonically nonincreasing, i.e., f(X(t+1)) ≤ f(X(t)), ∀t ≥ 1.

Proof: Through the selection strategy, the best dandelion is always retained, and its fitness value is monotonically nonincreasing. Thus, f(X(t+1)) ≤ f(X(t)) holds. ■

Theorem 2: {X(t), t ∈ Z+} of SEDA-W forms a homogeneous Markov chain.

Proof: X(t) changes to X(t+1), denoted as

From (9), we can see that X(t+1) is only related to the present state X(t). Thus X(t) of SEDA-W, t > 0, forms a Markov chain. Also, we can observe that T depends on X(t) but not on t [15], so the chain is homogeneous.

Theorem 3: {X(t), t ∈ Z+} of SEDA-W converges to the global/local optimal solution set W∗ with probability 1, i.e.,

Proof: Based on Theorems 1 and 2, {X(t)} is a monotonically nonincreasing homogeneous Markov chain, and the transition probability from X(t-1) to X(t) is [15]

According to the total probability formula, we have

Let δ(t) = P(X(t) ∈ W∗) and u(t) = P(X(t) ∈ W∗ | X(t-1) ∉ W∗). Then (12) can be changed to

From (13), we have

Taking the limit on both sides of (14), we obtain
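The chain of steps in this proof can be written out as follows. This is a sketch consistent with the definitions of δ(t) and u(t) above, not a transcription of the paper's equations: it assumes that W∗ is absorbing (by Theorem 1 a dandelion that has reached W∗ is never replaced by a worse one) and that u(t) is bounded below by some constant c > 0, which is an added assumption.

```latex
\begin{align*}
\delta(t) &= P\bigl(X(t)\in W^{*}\mid X(t-1)\in W^{*}\bigr)\,\delta(t-1)
             + u(t)\,\bigl(1-\delta(t-1)\bigr) \\
          &= \delta(t-1) + u(t)\,\bigl(1-\delta(t-1)\bigr), \\
1-\delta(t) &= \bigl(1-u(t)\bigr)\,\bigl(1-\delta(t-1)\bigr)
             = \bigl(1-\delta(1)\bigr)\prod_{k=2}^{t}\bigl(1-u(k)\bigr), \\
0 \le \lim_{t\to\infty}\bigl(1-\delta(t)\bigr)
          &\le \bigl(1-\delta(1)\bigr)\lim_{t\to\infty}(1-c)^{\,t-1} = 0
          \quad\Longrightarrow\quad \lim_{t\to\infty}\delta(t) = 1 .
\end{align*}
```

The first line is the total probability formula, the second uses the absorbing property, and the product form follows by applying the recursion repeatedly.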

IV. EXPERIMENTAL RESULTS AND THEIR ANALYSIS

A. Experimental Settings

1) Benchmark Functions: In order to evaluate the performance of the proposed algorithm, 28 benchmark functions from CEC2013 are used in our experiments. The CEC2013 test suite is listed in [34].

In this paper, each algorithm is run 51 times, the dimension is set to 30, and the maximum number of function evaluations is set to 300 000. Note that no additional parameters need to be set in SEDA except the number of seeds.

2) Impact of the Number of Seeds on SEDA Performance: The number of seeds is set to 100, 200, 300, 400, and 500, respectively. The mean error and Friedman ranking are shown in Table I. The best mean error on each function is marked in bold. From Table I, it can be seen that SEDA performs the best in terms of Friedman ranking (2.5000) when the number of seeds is set to 300. Thus, it is set to 300 in the following experiments.
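A Friedman ranking of this kind is obtained by ranking the configurations on each function by mean error and averaging the ranks over all functions. The sketch below shows that computation with tied values sharing the average of their ranks; the error table is hypothetical and is not data from Table I.

```python
def average_ranks(row):
    """Rank the values in one row (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in order[pos:end + 1]:
            ranks[j] = shared
        pos = end + 1
    return ranks


def friedman_mean_ranks(errors):
    """Mean rank of each column (configuration) over all rows (functions)."""
    n_cols = len(errors[0])
    totals = [0.0] * n_cols
    for row in errors:
        for j, rank in enumerate(average_ranks(row)):
            totals[j] += rank
    return [total / len(errors) for total in totals]


# Hypothetical mean errors: rows are test functions, columns are configurations.
errors = [
    [0.10, 0.30, 0.20],
    [0.05, 0.05, 0.40],
    [1.00, 2.00, 3.00],
]
mean_ranks = friedman_mean_ranks(errors)
```

The configuration with the smallest mean rank is the one reported as best in terms of Friedman ranking.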

TABLE I MEAN ERROR AND FRIEDMAN RANKING OF SEDA FOR DIFFERENT NUMBERS OF SEEDS

B. Comparison of SEDA With Other Methods

1) Comparison of SEDA With DA and Its Variants: In this section, the proposed SEDA is compared with the basic DA and its variants, including MDA, DAPML, DAPMB, and DAPME. Their parameter settings follow their corresponding references [12]. The mean error and Friedman rankings are shown in Table II. The best mean error and Friedman ranking are marked in bold.

From Table II, it can be seen that the six algorithms have the same performance on f1 and f8, while SEDA has a better mean error than the other five algorithms on the other functions. Judging from the Friedman ranking, it can be concluded that the proposed SEDA is the best among them. The convergence curves of f8, f9, f13, and f14 are shown in Fig. 3, while the other 24 convergence curves are in the Supplementary File. From Fig. 3, it can be seen that our method can jump out of local optima during the run.

In addition, the Wilcoxon rank-sum test is conducted to verify the performance of all compared algorithms, in which “+” indicates that SEDA performs significantly better than its peer, “-” indicates that SEDA performs significantly worse than its peer, and “=” indicates that there is no significant difference between SEDA and its peer. The results, presented at the bottom of Table II, show that SEDA is significantly better than DA and its variants.

Finally, the average time consumption of the six algorithms on the 28 test functions is shown in Fig. 4. From it, it can be concluded that the proposed SEDA can achieve better performance with less time consumption.

Fig. 3 shows the results of DA, DAPML, DAPMB, DAPME, IPSO, MDA, and SEDA on some benchmark functions.

2) Comparison of SEDA With Other Swarm Intelligence Algorithms: In order to further verify the performance of the proposed SEDA, a comparison with ABC, IPSO, DE, and CMA-ES is conducted on CEC2013, and the results of these four algorithms are taken from their references [35]-[38]. The comparison results among ABC, DE, CMA-ES, IPSO, and SEDA are presented in Table III. The best result is marked in bold.

TABLE II MEAN ERROR OF THE TEST FUNCTION AND FRIEDMAN RANKING

As shown in Table III, CMA-ES performs extremely well on unimodal functions but suffers from premature convergence on some complex functions. ABC performs best on 10 functions. Judging from the Friedman ranking, SEDA is the best among all the compared algorithms, which indicates that SEDA is more stable across the test functions.

C. Comparison on 13 Imbalanced Datasets

In order to verify the performance of SEDA, a comparison with DA, MDA, DAPML, DAPMB, DAPME, BA, and IPSO, each combined with WELM, is conducted on 13 imbalanced classification datasets in this section, as shown in Table IV. The datasets can be found in the UCI and KEEL databases, where column “Abbr.” lists the assigned code of each dataset, “#N” is the number of samples, “#mi/#Ma” represents the numbers of minority and majority samples, “#A” is the number of features in a sample, and “#R” is the ratio of the number of minority samples to the number of majority samples in a dataset.

In this simulation, the regularization coefficient and the bandwidth of the radial basis function kernel in WELM are set to 1 and 100, respectively. Each algorithm runs 10 times. The results of G-mean [20] and AUC [39] are shown in Tables V and VI.

In addition, AUC and G-mean are evaluation indexes that measure the quality of binary classifiers. AUC is defined as the area under the ROC (receiver operating characteristic) curve. F1 represents the harmonic mean of Recall and Precision. G-mean is the geometric mean of the prediction accuracies on positive and negative samples.

Fig. 3. Convergence curves of DA, DAPML, DAPMB, DAPME, MDA, and SEDA on some benchmark functions.

Fig. 4. Average time cost of 6 algorithms on 28 tested functions.

where TP, TN, FP, and FN are respectively the numbers of true positive cases, true negative cases, false positive cases, and false negative cases.
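These metrics can be computed from the four confusion-matrix counts using their standard definitions, as sketched below. The counts in the example are hypothetical, and the function name and the convention of returning 0 for an empty denominator are assumptions for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, tn, fp, fn):
    """Precision, Recall, F1, and G-mean from confusion-matrix counts."""
    recall = _ratio(tp, tp + fn)           # accuracy on positive samples
    specificity = _ratio(tn, tn + fp)      # accuracy on negative samples
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)  # harmonic mean
    g_mean = math.sqrt(recall * specificity)                 # geometric mean
    return {"precision": precision, "recall": recall, "f1": f1, "g_mean": g_mean}


# Hypothetical counts for a dataset with 10 positives and 100 negatives.
metrics = imbalance_metrics(tp=8, tn=90, fp=10, fn=2)
```

G-mean drops to zero if either class is entirely misclassified, which is why it is preferred over plain accuracy for imbalanced data.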

From Table V, it can be seen that SEDA performs the best among them on S1 and S4-S13, and it is slightly worse than IPSO on S2 and S3. DAPML and SEDA have the same performance on S13. The other algorithms perform poorly on the remaining datasets. In addition, the results of the T-test, shown at the bottom of Table V, indicate that SEDA is significantly better than its peers in terms of G-mean values. From Table VI, it can be observed that SEDA is the best in terms of AUC on 12 datasets, i.e., all except S2.

Based on the results of G-mean and AUC, we can conclude that SEDA can pick out a better feature subset from all features and thus achieve better classification performance than its peers.

D. Comparison on Three Datasets With Large Imbalance Ratios

In order to investigate the performance of the proposed SEDA on datasets with large imbalance ratios, we have selected three datasets from [40], as shown in Table VII. Dataset S16 contains a total of 284 807 samples, with only 492 samples belonging to the minority class, accounting for only 0.172% of the total. The results of G-mean and AUC are presented in Tables VIII and IX, and the best results on each dataset are marked in bold. As can be seen from Tables VIII and IX, on all three datasets with large imbalance ratios, our proposed method outperforms the other methods on G-mean, while its AUC value on S15 is slightly lower than that of MDA-W.
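The size of this imbalance can be checked with a short calculation using only the two counts quoted above. The variable names are illustrative, and the minority-to-majority ratio is computed following the definition of “#R” given in Section IV-C.

```python
total_samples = 284_807
minority_samples = 492
majority_samples = total_samples - minority_samples

# Share of the minority class in the whole dataset.
minority_share = minority_samples / total_samples

# Ratio of minority samples to majority samples.
minority_to_majority = minority_samples / majority_samples

print(f"minority share: {minority_share:.4%}")
print(f"minority/majority ratio: {minority_to_majority:.6f}")
```

The share is about 0.1727%, which the text reports as 0.172%. At this level a classifier that labels every sample as the majority class would still be more than 99.8% accurate, which illustrates why G-mean and AUC are used instead of accuracy.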

E. Comparison via Three CCFD Datasets

Three real-world transaction datasets are selected for this experiment to validate the performance of SEDA. They are LoanPrediction (https://github.com/Paliking/ML_examples/blob/master/LoanPrediction/train_u6lujuX_CVtuZ9i.csv), Creditcardcsvpresent (https://github.com/gksj7/creditcardcsvpresent), and Default of Credit Card Clients (http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients). For convenience, they are abbreviated as D1-D3, respectively, and are listed in Table X.

TABLE III MEAN ERROR AND FRIEDMAN RANKING OF THE FIVE ALGORITHMS

TABLE V G-MEAN

TABLE VI AUC

To reflect the classification performance comprehensively, in addition to G-mean and AUC, three evaluation metrics, Precision, Recall, and F1, are added to measure the performance of each algorithm, and the experimental results are shown in Tables XI-XIII.

The experimental results are analysed as follows.

1) IPSO has the best Precision on dataset D1; SEDA has better results than the other algorithms on Recall, F1, AUC, and G-mean. On the key metric G-mean, SEDA outperforms the second-best method IPSO-W and the worst one BA-W by 4.50% and 9.06%, respectively.

2) On dataset D2, SEDA has the best results on all five evaluation metrics. On the key metric G-mean, SEDA outperforms the second-best method DA-W and the worst one DAPME-W by 0.85% and 12.08%, respectively.

3) IPSO has the best Precision and F1 on dataset D3; SEDA has the best results on Recall, AUC, and G-mean. On the key metric G-mean, SEDA outperforms the second-best method IPSO-W and the worst one DAPME-W by 1.44% and 9.41%, respectively.

From the analysis of the above experimental results, it can be seen that the method proposed in this paper outperforms the compared methods on most of the indicators, and on the key indicator G-mean it is better than all other methods.

TABLE VII THE DATASETS OF LARGE IMBALANCE RATIO

TABLE VIII G-MEAN OF THE LARGE IMBALANCE RATIO DATASETS

TABLE IX AUC OF THE LARGE IMBALANCE RATIO DATASETS

TABLE X PUBLIC DATA

TABLE XI COMPARISON OF RESULTS FOR D1

Finally, we examine the features selected by each algorithm with the largest G-mean values on datasets D1-D3, as shown in Figs. 5-7. The red parts are selected and the white ones are unselected. As can be seen in Fig. 5, in D1, the feature in column 10 is selected by all the algorithms in the experiment, and therefore this feature contributes more to distinguishing categories than the rest, while the features in columns 1 and 7 contribute less. As can be seen in Fig. 6, in D2, the feature in column 5 contributes more to distinguishing categories than the rest, while the features in columns 1, 4, 8, 11, 16, 17, and 19 contribute less. As can be seen in Fig. 7, in D3, the feature in column 5 contributes more to distinguishing categories than the rest, while the feature in column 3 contributes less.

V. CONCLUSION

This paper has presented a simple and efficient dandelion algorithm (SEDA) by removing all unnecessary mechanisms of basic DA. An adaptive seeding radius strategy is designed to further reduce the number of DA’s parameters. Experimental results show that our proposed algorithm outperforms the basic DA and its variants, as well as other swarm intelligence algorithms, on CEC2013 benchmark functions. In addition, it is found that the proposed algorithm consumes less time than its peers. Note that SEDA has been combined with WELM for the classification of imbalanced datasets. The experiments are conducted on 13 public datasets, as well as 3 datasets with large imbalance ratios. Experimental results show the superiority of our proposed method. Finally, it is applied to feature selection for credit card fraud detection, and the results show its effectiveness. All experimental data in this paper can be obtained from https://github.com/bbxyzhh/Experimental-data.

TABLE XII COMPARISON OF RESULTS FOR D2

TABLE XIII COMPARISON OF RESULTS FOR D3

Fig. 5. D1 selected features.

Fig. 6. D2 selected features.

Fig. 7. D3 selected features.

Our future work aims to analyze other optimization technologies [41]-[49] and combine them with SEDA to further improve its performance. More real-world datasets [42], [50]-[52] should be used to test the proposed algorithm and its peers. In addition, in practice, credit card data may be incomplete due to physical constraints. Hence, how to deal with incomplete data efficiently should be a focus of our future work [53]-[55].