
Sparse Reconstructive Evidential Clustering for Multi-View Data

Chaoyu Gong and Yang You

IEEE/CAA Journal of Automatica Sinica, 2024, Issue 2 (published 2024-03-01)

Abstract—Although many multi-view clustering (MVC) algorithms with acceptable performance have been presented, to the best of our knowledge, nearly all of them need to be fed with the correct number of clusters. In addition, these existing algorithms create only hard and fuzzy partitions for multi-view objects, which are often located in highly overlapping areas of the multi-view feature space. The adoption of hard and fuzzy partitions ignores the ambiguity and uncertainty in the assignment of objects, likely leading to performance degradation. To address these issues, we propose a novel sparse reconstructive multi-view evidential clustering algorithm (SRMVEC). Based on a sparse reconstructive procedure, SRMVEC learns a shared affinity matrix across views and maps multi-view objects to a 2-dimensional human-readable chart by calculating two newly defined mathematical metrics for each object. From this chart, users can detect the number of clusters and select several objects existing in the dataset as cluster centers. Then, SRMVEC derives a credal partition under the framework of evidence theory, improving the fault tolerance of clustering. Ablation studies show the benefits of adopting the sparse reconstructive procedure and evidence theory. Besides, SRMVEC delivers effectiveness on benchmark datasets by outperforming some state-of-the-art methods.

I. INTRODUCTION

MULTI-view clustering (MVC) aims to categorize n multi-view objects into several clusters so that objects in the same cluster are more similar than those from different ones [1], [2]. Objects are described by several views of data in an MVC problem, e.g., documents originally written in English as one view and their translations to French, German, Spanish and Italian as four other views, and MVC algorithms often provide improved performance compared to single-view clustering algorithms [3], [4]. Therefore, MVC has been successfully applied to various applications including computer vision [5], [6], social multimedia [7] and so on. Based on different philosophies, many MVC methods have been proposed; they can be classified into several categories [4], such as generative methods [8], [9], subspace-clustering-based methods [10], deep-learning-based methods (e.g., a novel deep sparse regularizer learning model that learns data-driven sparse regularizers is proposed in [11] to cluster multi-view data), spectral-clustering-based methods [12], [13], graph-learning-based methods [14], [15] and so on. Broadly speaking, the above methods are concerned with solving the MVC problem in a way that improves either efficiency or clustering performance, assuming that the correct number of clusters is known.

Motivations: However, estimating the number of clusters for an MVC problem can be substantially more challenging and urgent than grouping the objects [21]. Many state-of-the-art MVC algorithms suffer from performance degradation when not given the correct cluster number, which can be illustrated quantitatively through the following example. We run five MVC algorithms on the widely used YALE three-view dataset [22] consisting of 15 clusters. The results in terms of normalized mutual information (NMI) [23] are shown in Fig. 1. As can be seen, all the counted algorithms have lower NMI values when they are not fed with the correct cluster number, compared to the results shown in the dotted circle. To date, there are several techniques based on specific criteria to estimate the number of clusters, such as the KL statistic [24], the elbow method [25], and the gap statistic [26]. Nevertheless, these criteria serve single-view data. Extending these techniques to MVC problems is not feasible, even though multiple views can be concatenated into a single high-dimensional vector. The first reason is that such concatenation results in very high-dimensional feature vectors. Without an appropriate metric learning process, the calculation of distances between these high-dimensional vectors is inevitably affected by the curse of dimensionality [27], significantly reducing the effectiveness of cluster number estimation. Another reason is that the estimation result may be biased toward a view (or views) that yields a dominantly large number of features.

Moreover, each cluster has one most representative object (known as the cluster center), as discussed in [28]. These cluster centers provide a variety of useful information in addition to directly guiding the clustering process. For instance, in electrical consumption clustering analysis, the power supply strategy for clients within the same cluster is frequently designed based on the consumption habits of one typical customer (the cluster center) [29]. In this case, an overall solution to the issues affecting all objects in the same cluster may be found through the analysis of a cluster center [30]. Hence, the second motivation, to identify the cluster center in each cluster of multi-view data, is prompted. Besides, the hard/fuzzy partition that conventional MVC algorithms generate is not fine enough, especially for multi-view objects in heavily overlapping regions. To enhance fault tolerance and clustering performance, it is also a motivation to use a more fine-grained partition to cluster multi-view data.

Fig. 1. Average NMI values of five MVC algorithms with different numbers of clusters. The considered algorithms, including AE2Net [16], MLRSSC [17], MvWECM [18], SDMVC [19] and DeepNMF [20], are run 5 times.

Technical Issues: As shown in [17], the common technical challenge in MVC is to learn the affinity matrix, which is usually learned through the self-expression procedure [31], i.e.,

under the constraints d iag(W(s))=0, ∀s∈{1,...,p}, wherepis the number of views, X(s)∈RDs×nis the data matrix ofsth view,nis the number of objects,Dsis the number of dimensions ofsth view,αandβare two regularization parameters,and W(s)∈Rn×nis the affinity matrix ofsth view to be learned.After learning the affinity matrix W(s)for all views,the affinity matrix W∗of the entire multi-view data is calculated as the element-wise average [17], [32].The similarity between intra-view dimensions can not be held consistent before and after self-expression, because these self-expression techniques only concentrate on maintaining comparable objects with similar self-expression outcomes and disregard relevant information between various dimensions.Besides,calculating W∗as the element-wise average of all W(s)is too arbitrary to allow all views to share a consistent affinity matrix.Thus, the first technological challenge ishow to create a new self-expression objective function that can preserve the relationships between intra-view dimensions and allow the inter-view data to share the same affinity matrix.The second technical challenge ishow to theoretically design a mapping that visualizes the properties of each multi-view object in order to discover the cluster centers after learning the affinity matrix.
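For intuition, the per-view self-expression step of problem (1) can be tackled with a standard proximal-gradient (ISTA) scheme. The sketch below is illustrative only: it covers a single view and only the Frobenius and ℓ1 terms of (1), and the function name and solver settings are our own choices, not the paper's algorithm.

```python
import numpy as np

def sparse_self_expression(X, alpha=0.1, n_iter=500):
    """Minimal ISTA sketch of min_W ||X - X W||_F^2 + alpha * ||W||_1
    subject to diag(W) = 0, for one view X of shape (D_s, n)."""
    n = X.shape[1]
    W = np.zeros((n, n))
    G = X.T @ X                                   # Gram matrix of the view
    lr = 1.0 / (2 * np.linalg.norm(G, 2))         # step size from the Lipschitz constant
    for _ in range(n_iter):
        W -= lr * 2 * (G @ W - G)                 # gradient of ||X - XW||_F^2
        W = np.sign(W) * np.maximum(np.abs(W) - lr * alpha, 0.0)  # l1 proximal step
        np.fill_diagonal(W, 0.0)                  # enforce diag(W) = 0
    return W
```

Each iteration takes a gradient step on the reconstruction term, soft-thresholds the entries (the ℓ1 proximal operator), and zeroes the diagonal, which is exactly the projection the constraint requires.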

The credal partition is a novel form of partition proposed by the authors of [33], [34], where the memberships of objects in clusters are defined using mass functions [35] in the framework of evidence theory [36]. Its formalization allows for a detailed description of the ambiguity and uncertainty in clustering membership, making the credal partition particularly appropriate for grouping multi-view objects in overlapping regions. Nevertheless, as almost all of the current methods are created for single-view data, there is still a significant gap between credal partition methods and MVC [28]. The third technical challenge is thus how to structure the learning of a credal partition that can be nested with the affinity matrix acquired from the multi-view self-expression.

Contributions: According to the above discussion, a novel MVC algorithm, named sparse reconstructive multi-view evidential clustering (SRMVEC), is proposed to simultaneously solve the three problems rarely addressed in other MVC research, namely, estimating the cluster number, identifying the cluster centers and deriving the fine-grained credal partition. The contributions of this paper are three-fold:

1) Differently from problem (1), we formulate a new objective function associated with the related information between intra-view dimensions, the view weights and the reconstruction error, enabling multiple views to learn a consistent affinity matrix directly;

2) Based on the learned affinity matrix, the multi-view objects are mapped to a 2-dimensional human-readable chart using two mathematical metrics that we define for each multi-view object. Users can easily obtain the cluster number from this chart and recognize the objects that can be chosen as the cluster centers;

3) The derivation of a credal partition is reformulated with the help of the discovered cluster centers as an optimization problem integrated with the learned affinity matrix, completely reflecting the (dis)similarity between any two objects and enhancing SRMVEC performance.

II. RELATED WORK

Multi-View Clustering: Almost all existing MVC algorithms are designed to improve clustering performance or efficiency after prespecifying the cluster number. Existing MVC algorithms can generally be divided into the following families [4]. The first consists of generative algorithms, which learn generative models that generate the data from clusters, such as the multi-view CMM [37] and its weighted version [9]. Another research line, called discriminative algorithms, proposes to optimize an objective function to seek the clustering result directly. As shown in a recent survey [4], most MVC algorithms are discriminative. They can be further divided into several groups, including subspace-clustering-based methods (e.g., PGSC [10] and CDMSC² [38]), spectral-clustering-based methods (e.g., SMSC [12], SMSCNN [13] and CGL [39]), NMF-based methods (e.g., MCLES [40]), kernel-clustering-based methods (e.g., MKCSS [41]) and so on. Recently, researchers have paid increasing attention to using deep learning (e.g., a differentiable network-based method named differentiable bi-sparse multi-view co-clustering [42] and a novel differentiable bi-level optimization network for multi-view clustering [43]) or graph learning methods (e.g., clustering multi-view data based on the contrastive consensus graph learned by a convolutional network [1]) in the MVC problem, and to improving the scalability and efficiency of MVC algorithms, such as OPLFMVC [44].

Estimating the Cluster Number: The single-view clustering problem is the main focus of almost all estimation techniques. The literature offers a wide variety of estimation techniques (see, for instance, [45], [46]). These techniques share the same fundamental idea: they first establish a basic clustering method and a clustering criterion, experiment with various cluster numbers, and then choose the cluster number u that best matches the ideal clustering criterion on the target dataset. The gap statistic [26], KL statistic [24], Dunn metric [47], etc., are some of the often-utilized criteria. However, the criteria that are employed are usually established using a metric between single-view objects.
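As a concrete instance of this try-many-u recipe, the sketch below applies the elbow heuristic with k-means as the base method. It is a generic single-view illustration, not one of the cited statistics, and the curvature-based knee rule is our own simplification.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_estimate(X, u_max=10):
    """Run a base clustering method for each candidate cluster number
    and pick the knee of the criterion curve (single-view elbow heuristic)."""
    inertias = []
    for u in range(1, u_max + 1):
        km = KMeans(n_clusters=u, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)      # within-cluster sum of squares
    curv = np.diff(inertias, 2)           # second differences approximate curvature
    return int(np.argmax(curv)) + 2       # curv[0] corresponds to u = 2
```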

Multi-View Learning Methods Based on Evidence Theory:

To the best of our knowledge, there are few evidence-theory-based methods for multi-view data that focus on finding the cluster number in the MVC problem. Evidence theory has been used in multi-view learning scenarios to improve the performance of algorithms [48]. The author of [49] proposes to use evidence theory to classify multi-view medical images and manage the partial volume effect. In [50], an architecture based on generalized evidence processing for data fusion is presented. To fuse multi-view information, an architecture based on a weighted fuzzy evidence theory is proposed to assign evidence obtained from various classification methods [51]. Differently from the above methods using evidence theory to combine information, recent papers focus more on formulating uncertainty in the learning process. In the problem of classifying multi-view samples, the authors of [52] use evidence theory to model the uncertainty of the Dirichlet distribution obtained from a certain view, aiming to flexibly integrate multiple Dirichlet distributions. In the classification process, the mass values correspond to the probabilities of different classes, and the overall uncertainty is also modeled. Dempster's rule [36] is used to combine the information from each view to boost the classification performance. In [53], researchers tackle the cross-modal retrieval problem by assigning a belief mass [36], [54] to each query and an overall uncertainty mass based on the collected cross-modal data. In [18], the authors propose an evidential c-means multi-view algorithm but ignore the detection of the cluster number.

III. PRELIMINARIES

Evidence Theory: Let us consider a variable ω taking values in a finite set called the frame of discernment Ω = {ω₁, ω₂, ..., ω_u}. A mass function m^Ω (also called a piece of evidence) is defined as a mapping from 2^Ω to [0, 1] such that

$$\sum_{A \subseteq \Omega} m^{\Omega}(A) = 1$$

where 2^Ω is the power set of Ω. The subsets A satisfying m^Ω(A) > 0 are called the focal sets of m^Ω. The value of m^Ω(A) represents a share of a unit mass allocated to focal set A, which cannot be allocated to any strict subset of A. In particular, the vacuous mass function such that m^Ω(Ω) = 1 corresponds to total ignorance about the value of ω. In this case, the Ω in the brackets is a focal set called the ignorance focal set.

Another equivalent representation of a given m^Ω, named the plausibility function, is defined as

$$Pl^{\Omega}(A) = \sum_{B \cap A \neq \emptyset} m^{\Omega}(B) \qquad (2)$$

for all A ⊆ Ω, where B also denotes the focal sets. Assume that there are two mass functions m₁^Ω and m₂^Ω on the same frame of discernment Ω. Dempster's rule [36], [55] (noted as ⊕) is defined as

$$(m_1^{\Omega} \oplus m_2^{\Omega})(A) = \frac{1}{1-\kappa} \sum_{B \cap C = A} m_1^{\Omega}(B)\, m_2^{\Omega}(C), \quad \kappa = \sum_{B \cap C = \emptyset} m_1^{\Omega}(B)\, m_2^{\Omega}(C) \qquad (3)$$

where κ is the degree of conflict between the two mass functions.
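To make these two definitions concrete, here is a small sketch of the plausibility function and Dempster's rule, with mass functions stored as dicts from frozenset focal sets to masses; the representation and the toy frame are our own illustration.

```python
from itertools import product

def plausibility(m, A):
    """Pl(A): total mass of focal sets that intersect A."""
    return sum(v for B, v in m.items() if B & A)

def dempster(m1, m2):
    """Combine two mass functions on the same frame via Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (B, v1), (C, v2) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            combined[inter] = combined.get(inter, 0.0) + v1 * v2
        else:
            conflict += v1 * v2              # mass falling on the empty set
    k = 1.0 - conflict                       # normalization constant
    return {A: v / k for A, v in combined.items()}

# Two pieces of evidence on Omega = {w1, w2, w3}
m1 = {frozenset({"w1"}): 0.6, frozenset({"w1", "w2", "w3"}): 0.4}  # partly vacuous
m2 = {frozenset({"w1", "w2"}): 0.7, frozenset({"w3"}): 0.3}
print(dempster(m1, m2))  # most mass flows to {w1}
```

In this example the conflict κ = 0.18, and after normalization the combined mass on {w1} rises to about 0.51, illustrating how agreeing evidence reinforces a focal set.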

TABLE I NOTATION TABLE

TABLE II A CREDAL PARTITION ON Ω={ω1,ω2,ω3}

IV. METHOD: SRMVEC

The SRMVEC algorithm consists of two parts: identifying the cluster centers and creating a credal partition for the remaining objects, which are introduced in Sections IV-A and IV-B, respectively.

A. Identifying the Cluster Centers

Basic Idea: We must first determine the “possibility” that each object is a cluster center before we can detect the number and locations of “cluster centers”. Such a “possibility” of a “cluster center” should be high, and its “separation” from other “cluster centers” should be as large as possible. In light of this, we can choose as the “cluster centers” those objects that have a high “possibility” and large “separation”. It is worth noting that a “cluster center” refers to the most representative object in a cluster, rather than the object spatially located around the geometric centroid of the cluster.

Specific Procedure: To mathematically propose the two definitions of “possibility” and “separation”, the affinity matrix of multi-view objects should be learned first. Following the idea that data from various views come from the same latent space, we define the learning of affinity matrix W as

Remark 1 (Some explanations about (7)): We use the nonlinear function exp(·) because it has been widely used in other literature about evidence theory (e.g., [56], [57]), and satisfies

Fig. 2. An illustrative workflow of SRMVEC. The dataset has p = 3 views colored green, yellow and red. The affinity matrix Ŵ is learned using the two-step iterative algorithm, and then cluster centers are selected from the upper-right corner of the Sep-Pos chart. Using the gradient-based algorithm presented in Section V-B, the stress function S_SRMVEC(v) (12) is minimized and the credal partition M^Ω is created. Concretely, the mass functions of n objects belonging to various clusters (i.e., M^Ω) are determined to agree as closely as possible with the calculated distance matrix D = (d_ij). Each mass m_i^Ω(A) is interpreted as a degree of believing the proposition “the true cluster of object x_i is A”. The corresponding hard partition is also obtained using the plausibility-probability transformation defined in (5).

By combining all the |P(x_i)| mass functions using (3), we give the following definition.

Definition 1: For each x_i, its possibility (Pos) of becoming a cluster center is defined as

We can choose the objects with high Pos as the potential cluster centers based on Definition 1 and the fundamental idea of SRMVEC. Next, the “separation” between one object and the others must be measured using another metric.

Definition 2: The separation (Sep) between x_i and other objects is defined as

if x_i does not have the highest Pos in the dataset; otherwise,

The two semantic terms “possibility” and “separation” have now been formulated as two metrics. All the multi-view objects can be mapped to a 2-dimensional Sep-Pos chart, as shown in the middle of Fig. 2, by using Sep and Pos as the horizontal and vertical coordinates, respectively. The cluster centers (colored red), which are clearly separated from the other objects because of their high Pos and Sep, can be visually identified by humans. As a result, the cluster number is also provided.
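A minimal sketch of rendering such a chart is given below, assuming the Pos and Sep arrays have already been computed from Definitions 1 and 2. The optional pos*sep ranking used to highlight candidates is a heuristic borrowed from density-peaks-style decision graphs, not part of SRMVEC, which relies on visual inspection.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_sep_pos(pos, sep, top_k=None):
    """Render the Sep-Pos chart: Sep on the horizontal axis, Pos on the
    vertical axis, as in Fig. 2. Cluster centers appear as the isolated
    points in the upper-right corner."""
    plt.scatter(sep, pos, s=12, c="gray", label="objects")
    if top_k is not None:
        score = pos * sep                    # simple ranking heuristic
        idx = np.argsort(score)[-top_k:]     # k largest products
        plt.scatter(sep[idx], pos[idx], s=40, c="red", label="candidate centers")
    plt.xlabel("Sep")
    plt.ylabel("Pos")
    plt.legend()
    plt.show()
```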

Remark 2: Why do we use the evidence combination (3) instead of simple addition or multiplication to fuse the information provided by x_j ∈ P(x_i)?

Remark 3: Why do we use a human intervention method through the Sep-Pos chart instead of an automatic method to detect the cluster centers? Because the clustering task is unsupervised, there may not be a perfectly correct number of clusters in some real-world application scenarios. Users can obtain direct guidance on the cluster centers through the Sep-Pos chart, and can subjectively choose different numbers of clusters for different applications. In addition, as shown in Section VI-B, the correct numbers of cluster centers are always easily distinguished on the Sep-Pos charts for the commonly used multi-view benchmark datasets.

B. Deriving a Credal Partition

Extension to Graph-Based Multi-View Learning and Other Evidential Multi-View Learning Methods: First, we highlight that the performances of our method SRMVEC and the graph-based methods depend significantly on the learned similarity coefficients between multi-view objects. In other words, the technique of learning graphs for multi-view data can also be used in SRMVEC to learn the affinity matrix Ŵ, and the possibility of each object x_i becoming a cluster center can then still be calculated according to (7) and (8). Besides, evidence theory is used in SRMVEC both to fuse information from multiple sources (fusing the degree of support provided by x_j for x_i to become a cluster center) and to describe the uncertainty in the learning process (e.g., the mass on Θ in (7) denotes the degree of uncertainty for x_i to become a cluster center, and the mass on Ω in the credal partition M^Ω denotes the uncertainty in clustering membership). This means that the way we use evidence theory can be extended to measure the uncertainty of the Dirichlet distribution for each view as done in [52] and to fuse multi-view information as done in [51].

V. OPTIMIZATION AND COMPLEXITY ANALYSIS

The optimization problem (6) and the minimization of function (12) are solved in Sections V-A and V-B, respectively. The complexity analysis of SRMVEC is presented in Section V-C.

A. Solving Problem (6)

We design the following two-step iterative algorithm.

Update W With Fixed a_s: We first rewrite the objective function L(W, a_s) of problem (6) as

B. Minimizing Function S_SRMVEC(M^Ω, ρ₁, ρ₂) in (12)

We adopt the following gradient-based method to minimize S_SRMVEC(M^Ω, ρ₁, ρ₂) with respect to M^Ω, ρ₁ and ρ₂.

Initialize the mass functions concerning the u cluster centers as

If S_SRMVEC(v) has increased between iterations τ−1 and τ, all χ_e are decreased

and v_e is updated by
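This adapt-and-backtrack schedule resembles the classic “bold driver” heuristic. A generic sketch under that reading is shown below; the growth/shrink factors, function names and the rejection of uphill moves are our assumptions, not the paper's exact update rules for S_SRMVEC.

```python
import numpy as np

def bold_driver_minimize(grad_fn, loss_fn, v0, chi0=1e-2,
                         up=1.1, down=0.5, n_iter=200):
    """Gradient descent with adaptive step sizes chi_e: steps grow while
    the objective keeps decreasing and shrink (with the move undone)
    when it increases."""
    v = v0.copy()
    chi = np.full_like(v, chi0)          # one step size per coordinate
    prev_loss = loss_fn(v)
    for _ in range(n_iter):
        v_new = v - chi * grad_fn(v)
        loss = loss_fn(v_new)
        if loss <= prev_loss:
            chi *= up                    # reward: enlarge the step sizes
            v, prev_loss = v_new, loss
        else:
            chi *= down                  # penalty: shrink steps, reject the move
    return v

# Toy usage: minimize ||v - 3||^2 from the origin
v = bold_driver_minimize(lambda v: 2 * (v - 3.0),
                         lambda v: np.sum((v - 3.0) ** 2),
                         v0=np.zeros(5))
```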

We summarize the SRMVEC algorithm in Algorithm 1 and provide an illustrative flow chart in Fig. 2. In Algorithm 1, we divide the whole algorithm into 4 steps, i.e., 1) learning the affinity matrix Ŵ; 2) calculating the Sep and Pos of each object based on the obtained Ŵ; 3) selecting the cluster centers after mapping objects onto the Sep-Pos chart using Sep and Pos as horizontal and vertical coordinates; and 4) minimizing the stress function S_SRMVEC to obtain the credal partition M^Ω. In Fig. 2, the first 3 steps are included in the “Identify the cluster centers” part and the 4th step is shown in the “Create a credal partition M^Ω” part.

Algorithm 1 SRMVEC Algorithm
Input: Data objects x₁, x₂, ..., x_n, α and β (δ = 2)
Output: Credal partition M^Ω
1: Learn the affinity matrix Ŵ by solving problem (6);
2: Calculate Pos and Sep for each object according to (8) and (9), respectively;
3: Map the objects to the Sep-Pos chart and detect u cluster centers;
4: Minimize S_SRMVEC(M^Ω, ρ₁, ρ₂) and obtain M^Ω.

C. Complexity Analysis

Remark 5 (Tuning α and β): We can see from problem (6) that the setting of α and β influences the learning of Ŵ. As a result, the object distribution in the Sep-Pos chart is in turn influenced by their setting. When the cluster centers are not immediately distinguishable, users can alternately tune α and β until the cluster centers can be visually recognized among the other objects in the Sep-Pos chart. In other words, the tuning of α and β is guided by the chart: the tuning changes the distribution of the objects in the chart, and users can use the chart to progressively discover the cluster centers. In Section VI-C, we provide an example of how the tuning of α and β works.

VI. EXPERIMENTS

This section consists of 3 subsections. The comparison between SRMVEC and other SoTA MVC algorithms is shown in Section VI-A, and detailed ablation studies are conducted in Section VI-B to show the contribution of each component in SRMVEC. The empirical convergence analysis, the tuning of hyperparameters and the visualization of clustering results are shown in Section VI-C. The ACC [61] and NMI are used to evaluate the clustering performances. The credal partition created by SRMVEC is converted into a hard partition according to (5), and then we calculate the ACC and NMI. In SRMVEC, the power exponent δ is fixed to 2, and α and β vary in {10⁻⁵, 10⁻⁴, ..., 10⁰}. As summarized in Table III, the benchmark datasets used include BBCSport, Reuters_s, COIL-20, WebKB2, Digit, 3Source, Wikipedia, Yale, Caltech101, Reuters, Animal10 and CIFAR10, which are also widely used in other related references, e.g., [17], [44], [62]. The small version of Reuters is denoted as Reuters_s. The experiments on the 4 large datasets, i.e., {Caltech101, Reuters, Animal10, CIFAR10}, are performed on an Intel(R) Xeon(R) Gold 6230 CPU @ 2.10 GHz with 256 GB RAM, while the experiments on the other datasets are performed on an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz with 16 GB RAM. The experiments concerning the deep-learning algorithms are performed on an NVIDIA Tesla A100.
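For reference, a sketch of this evaluation step is given below: each object's credal membership is reduced to the singleton cluster with maximal plausibility, and NMI is then computed with scikit-learn. The helper and its mass-function representation are hypothetical and only approximate the transformation (5).

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def credal_to_hard(masses, singletons):
    """Assign each object to the singleton cluster with maximal plausibility.
    `masses` is a list of dicts {frozenset: mass}, one per object;
    `singletons` lists the cluster labels."""
    labels = []
    for m in masses:
        pl = [sum(v for B, v in m.items() if w in B) for w in singletons]
        labels.append(int(np.argmax(pl)))    # most plausible singleton
    return np.array(labels)

# NMI against ground-truth labels y_true (M is the learned credal partition):
# nmi = normalized_mutual_info_score(y_true, credal_to_hard(M, ["w1", "w2", "w3"]))
```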

TABLE III BENCHMARK DATASETS. WE RUN SRMsim INSTEAD OF SRMVEC ON {COIL-20, YALE, CALTECH101, CIFAR10, ANIMAL10}, WHICH HAVE A LARGER NUMBER OF CLUSTERS, AS DISCUSSED IN REMARK 4. ON CALTECH101, ONLY THE SINGLE CLUSTERS AND Ω ARE CONSIDERED

A. Comparison Experiment

Comparing Algorithms: SRMVEC is compared with 14 SoTA MVC algorithms, including 2 subspace clustering algorithms (MSSC [63] and MLRSSC [17]), MvWECM [18] based on evidence theory, CGL [39] based on spectral clustering, MKCSS [41] based on kernel clustering, MCLES [40] based on non-negative matrix factorization, and 8 deep-learning algorithms (AE2Net [16], DeepNMF [20], MvDSCN [64], DEMVC [65], DMJCT [66], MFLVC [60], SAMVCH [67] and SDMVC [19]).

Experimental Set-Up: We adopt the provided code of these algorithms, and their hyperparameters are tuned as suggested in the original papers to generate the best results. We feed the correct number of clusters to the comparing algorithms.

In Table VII, the ACC and NMI values are reported. Compared with the 2 subspace clustering algorithms (MSSC and MLRSSC), SRMVEC has significantly better performance on most datasets. This is because SRMVEC can learn a more precise affinity matrix than they do, as discussed in the ablation study. The better performance achieved by SRMVEC compared to CGL, MKCSS and MCLES may be due to the guidance from the detected cluster centers in the partitioning process. Because MvWECM performs a weighted average calculation of the partition matrix of each view to obtain the final partition matrix, and the correlation between views is not reasonably considered, SRMVEC has better performance. One can also find that SRMVEC achieves statistically higher ACC/NMI values than the 8 deep-learning algorithms (AE2Net, DeepNMF, MvDSCN, DEMVC, DMJCT, MFLVC, SAMVCH and SDMVC) in 57.2% (110/192) of cases, and SRMVEC is defeated by these deep-learning algorithms in 3.12% (6/192) of cases. This suggests that the credal partition created by SRMVEC has better fault tolerance than the hard partitions, although the deep-learning algorithms may be able to obtain a more reasonable representation of multi-view data. In summary, SRMVEC is statistically superior to the other algorithms in 70.2% (118/168) and 68.5% (115/168) of cases in terms of ACC and NMI, respectively.

Because the deep-learning algorithms are run on GPU, we only report the running time of SRMVEC and the remaining comparing algorithms in Table VIII. MvWECM and SRMVEC consider focal sets of the same form when deriving a credal partition. For a fair comparison, the time spent on detecting the cluster centers through human intervention is not counted for SRMVEC. Compared to {MSSC, MLRSSC, MCLES, MvWECM}, SRMVEC consumes more time on the Reuters and Yale datasets, whose views are high-dimensional, due to the dimension-dependent complexity. SRMVEC is more efficient on larger datasets (e.g., Animal10) that have fewer dimensions, as its complexity of O(n²) is lower than the O(n³) complexity of the other algorithms. Besides, SRMVEC requires slightly more time than the lower-complexity c-means-based MvWECM algorithm on the {Yale, Reuters, CIFAR} datasets.

Comparison Between SRMVEC and Its Simplified Version SRMsim: As shown in the caption of Table III, we run SRMsim instead of SRMVEC on the {COIL-20, Yale, Caltech101, CIFAR10, Animal10} datasets, which have {20, 15, 101, 10, 10} clusters, to avoid the heavy complexity with respect to O(2^u) (as discussed in Remark 4). In this section, we compare SRMsim and SRMVEC on the other 7 datasets. The comparison results are shown in Fig. 5.

One can see that SRMVEC has better performance than SRMsim (in terms of ACC and NMI) on all the datasets except WebKB2 with 2 clusters, because SRMVEC degenerates to SRMsim when the dataset has only 2 clusters. In addition, the simplified SRMsim consumes less time than the SRMVEC algorithm. On the Digit dataset with 10 clusters, SRMsim costs nearly one-third of the time of SRMVEC. On the Reuters dataset, SRMVEC (1958.4 s) consumes more than 3 times the running time of SRMsim (563.2 s), but only obtains a 0.007 improvement in terms of ACC/NMI. This shows that even if we follow the strategy provided in Remark 4 to simplify the form of the credal partition, i.e., retaining only the clusters with a cardinality lower than 3 and the ignorance cluster, the clustering performance does not degrade significantly, but much runtime is saved. On datasets with many clusters in real-world scenarios, SRMsim allows users to obtain a fast clustering result.

TABLE IV NUMBER OF CLUSTERS ESTIMATED BY DIFFERENT SRMVEC VARIANTS. THE INCORRECT CLUSTER NUMBERS ARE SHOWN IN PARENTHESES. THE ACCURACY SHOWN IN THE LAST COLUMN IS THE PROPORTION OF DATASETS ON WHICH THE CORRECT CLUSTER NUMBERS ARE FOUND

TABLE V ACC AND NMI (MEAN ± STD. DEVIATION) VALUES OF SRMmco1, SRMSMP, SRMMFL AND SRMSDM, WHICH CALCULATE THE AFFINITY MATRIX BASED ON THE METHODS PROPOSED IN [19], [58]-[60]. THE BEST RESULTS ARE BOLD AND UNDERLINED

B. Ablation Studies

Benefits of the Sparse Reconstructive Procedure (6) and Evidence Combination (8): We compare the performance of estimating the cluster numbers between SRMVEC and its 14 variants. In each variant, we finely tune α and β using the strategy provided in Remark 5. Concretely, we consider

● 2 variants SRMSSC and SRMLR replacing the learning of the affinity matrix Ŵ through solving problem (6) with the methods used in [63] (i.e., solving problem (1)) and [17], respectively;

● 4 variants where the affinity matrix Ŵ is replaced with the similarity matrices calculated as in [58], [59] (denoted as SRMmco1 and SRMSMP, respectively), or calculated based on the features learned in [19] (SRMSDM) and [60] (SRMMFL);

● 2 variants SRMg1 and SRMg2 adopting the graph-based adjacency matrix according to the methods used in [68] and [69], respectively;

● The variant SRMnos adopting the objective function in problem (6) but ignoring the term α‖W‖₁;

● The variant SRMnod ignoring the related information between intra-view dimensions in problem (6);

● The variant SRMa ignoring the learning of view weights in problem (6);

● 2 variants SRM+ and SRM× using simple addition and multiplication, respectively, instead of the evidence combination (3) to fuse the information provided by x_j ∈ P(x_i);

TABLE VI ACC AND NMI (MEAN ± STD. DEVIATION) VALUES OF DIFFERENT SRMVEC VARIANTS. EACH OF THE VARIANTS IS RUN 20 TIMES. SRMSC AND SRMKKM DERIVE HARD PARTITIONS. SRMKFC CREATES A FUZZY PARTITION. SRMNOC INITIALIZES THE CREDAL PARTITION RANDOMLY INSTEAD OF USING THE WAY SHOWN IN (15). SRMNOR IGNORES THE LEARNING OF COEFFICIENTS ρ1 AND ρ2 IN FUNCTION (12)

● The variant SRMdim using the strategy to control the complexity discussed in Remark 4, i.e., choosing only half of the total dimensions in the sth view to perform the similarity calculation, for each view.

The cluster numbers estimated by the different SRMVEC variants are shown in Table IV, from which one can see that SRMVEC can estimate the true number of clusters on all 12 datasets. Comparing SRMVEC with SRMSSC and SRMLR, we find that these two variants only achieve Accuracy = 66.7% and Accuracy = 75%, respectively, whereas SRMVEC detects the correct cluster numbers on all 12 datasets. This indicates that the affinity matrix learned from the sparse reconstructive procedure is more precise than those from the standard self-expression methods used in [17] and [63]. Compared to SRMg1 and SRMg2, which use the graph-based adjacency matrices, SRMVEC is still the best because it considers the related information between dimensions within each view when learning the affinity matrix. In addition, the affinity matrix learned by SRMVEC is not obtained by element-wise averaging of the affinity matrix of each view, but through a joint learning process.

After ignoring the related information between dimensions in problem (6), SRMnod only finds the correct number of clusters on 5 datasets. Thus, considering the related information between intra-view dimensions in the learning of the affinity matrix can substantially improve the performance of estimating the cluster numbers. One can also find that the ℓ₁ term in the objective function of problem (6) is essential, because SRMnos only estimates the true cluster number on 7 datasets. Focusing on SRMa, this variant also shows weaker performance than SRMVEC, indicating the critical role of view weight learning.

Besides, we can see that SRMVEC outperforms both SRM+ and SRM×. This demonstrates that combining the information supplied by x_j ∈ P(x_i) using the evidence combination is preferable to utilizing the additive/multiplicative methods. This observation supports the finding outlined in Remark 2. On 10 datasets, the SRMdim algorithm can identify the true number of clusters. This finding suggests that, in the majority of situations, reducing the computation of the dimension similarity matrices G^(s), as demonstrated in Remark 4, does not significantly degrade the performance of estimating the cluster number. Focusing on the 4 variants {SRMmco1, SRMSMP, SRMMFL, SRMSDM}, only SRMSDM can find the correct number of clusters on all datasets. This suggests that calculating the affinity matrix based on the methods proposed in [58]-[60] is not suitable for probing cluster centers using the definitions of Sep and Pos.

We next further explore whether the affinity matrix learned by the proposed sparse reconstruction method is more appropriate for creating the credal partition. The ACC and NMI values of the 4 variants are shown in Table V, where the matrix D calculated by (11) is replaced with the affinity matrices calculated in [19], [58]-[60]. Note that the same initialization strategy shown in Section V-B is used in these 4 variants. As can be seen, SRMVEC has better performance than these 4 variants except on the Yale and Reuters datasets. In particular, SRMVEC has higher ACC and NMI on all datasets than SRMSDM, which has the same performance in finding cluster centers. This suggests that the affinity matrix learned through solving problem (6) is more suitable for generating a credal partition.

Benefits of Identifying the Cluster Centers and Adopting the Credal Partition Computed by Minimizing Function (12): In this experiment, we compare the performance of grouping objects between SRMVEC and its 5 variants. Treating the distance matrix D = (d_ij) (calculated based on (11)) and the true cluster number as input, the variants include

● SRMKFC, which derives a fuzzy partition through kernel fuzzy c-means (KFCM);

● SRMSC and SRMKKM, which derive hard partitions through spectral clustering (SC) and kernel k-means (KKM), respectively;

TABLE VII ACC, NMI (MEAN ± STD. DEVIATION) AND RUNNING TIME (IN SECONDS) OF DIFFERENT ALGORITHMS. THE ●/○ INDICATES WHETHER SRMVEC IS STATISTICALLY SUPERIOR/INFERIOR TO A CERTAIN COMPARING ALGORITHM BASED ON THE PAIRED T-TEST AT A 0.05 SIGNIFICANCE LEVEL. THE STATISTICS OF WIN/TIE/LOSS ARE SHOWN IN THE LAST COLUMN OF THE FIRST 2 SUB-TABLES. TO SAVE SPACE, EACH DATASET IS REPRESENTED BY THE FIRST 3 LETTERS OF ITS NAME

● SRMnoc, which initializes the credal partition randomly instead of using the way shown in (15), meaning that the generation of a credal partition in SRMVEC is no longer guided by the information provided by the detected cluster centers;

● SRMnor, which ignores the learning of coefficients ρ₁ and ρ₂ in (12). Thus, SRMnor cannot reduce the magnitude gap between the conflict value and d_ij through linear variation. In SRMnor, the detected cluster centers are the same as those in SRMVEC.

We report the average ACC and NMI values of these algorithms in Table VI, where the best results are bold and underlined. First, SRMVEC achieves the highest ACC and NMI values on all 12 datasets. More specifically, the performance differences between SRMVEC and the two hard-partitioning variants (SRMSC and SRMKKM) are more pronounced on Reuters_s and Reuters, which have high-dimensional views and more multi-view objects located in the overlapping areas. This result confirms that the credal partition improves the fault tolerance of SRMVEC, i.e., it describes the ambiguity and uncertainty in the cluster memberships of multi-view objects more appropriately. As discussed in Section III, the credal partition allows the objects to be contained in composite clusters rather than only in single clusters. The benefit of using the credal partition is also demonstrated when comparing SRMVEC with the fuzzy-partitioning SRMKFC. One can find that SRMVEC always has the best performance when compared with SRMnoc. This demonstrates that the performance of the credal partition can be directly improved under the guidance of the information in the detected cluster centers. Furthermore, the comparison between SRMVEC and SRMnor also shows that the learning of the coefficients ρ₁ and ρ₂ is critical. This is because the magnitude difference between d_il and the conflict value becomes large after losing the linear variation provided by ρ₁ and ρ₂, and then the gradient-based algorithm used to minimize function (12) often falls into a local minimum.

Fig. 3. An example of tuning α and β on the 3Source dataset and one other dataset. From the leftmost Sep-Pos charts to the rightmost Sep-Pos charts, the cluster centers become easier to distinguish from the other objects.

C. Specific Behaviors of SRMVEC

An Example of Tuning α and β: As discussed in Remark 5, we give an example of tuning the hyperparameters α and β in Fig. 3. In the leftmost sub-figures, the objects in the upper-right corners of the Sep-Pos charts are not easily distinguishable from the other objects, so the choice of cluster centers appears challenging and unrealistic. We then fix β = 1×10⁻³ and gradually increase α from 1×10⁻⁵. Observing the Sep-Pos charts (shown in the 2nd and 3rd columns) generated by each group of {α, β}, some objects (the cluster centers) are gradually separated from the other objects. For a given hyperparameter configuration, fixing one of {α, β} and tuning the other always allows users to estimate the correct cluster numbers.

Empirical Convergence Analysis: We first show the convergence plots of solving problem (6) in Figs. 4(a.1)-4(a.4), where the value of |L_{τ+1}(W, a_s) − L_τ(W, a_s)| between the (τ+1)th and τth iterations is presented. As can be seen, the convergence of the two-step iterative algorithm proposed in Section V-A is illustrated by the gradual decrease in the value of |L_{τ+1}(W, a_s) − L_τ(W, a_s)| between two adjacent iterations. We also provide the convergence plots of minimizing S_SRMVEC(M^Ω, ρ₁, ρ₂) in Figs. 4(b.1)-4(b.4), where the value of S_SRMVEC(M^Ω, ρ₁, ρ₂) in every iteration is shown. It is apparent that the difference in S_SRMVEC(M^Ω, ρ₁, ρ₂) between 2 adjacent iterations gradually decreases to 0, illustrating the convergence of the gradient-based algorithm proposed in Section V-B. As the optimization procedure proceeds, the ACC of SRMVEC also generally increases. On the BBCSport, Reuters and COIL-20 datasets, S_SRMVEC(M^Ω, ρ₁, ρ₂) always decreases slowly, then rapidly, then slowly again, indicating that the solution of minimizing S_SRMVEC(M^Ω, ρ₁, ρ₂) goes from a flat region to a sharp region, and then to a flat region.

Visualizing the Clustering Results: In Fig. 6, we visualize the clustering results on the 4 datasets using the t-SNE method [70], where the Euclidean distances between objects are replaced by the distances calculated according to (11). After visualization, the objects clearly form reasonable clusters in the 2-dimensional figures, and the selected cluster centers are also in reasonable locations. This illustrates that one cluster can indeed be represented by a representative cluster center selected according to the basic idea proposed in this paper.
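Reproducing this kind of figure mainly requires feeding t-SNE the precomputed distances instead of letting it compute Euclidean ones; a sketch with scikit-learn is below, where D stands for the n×n matrix from (11) and is assumed available.

```python
from sklearn.manifold import TSNE

def embed_with_tsne(D, seed=0):
    """Embed objects in 2-D from a precomputed n x n distance matrix D."""
    tsne = TSNE(n_components=2, metric="precomputed",
                init="random", random_state=seed)  # "random" init is required
                                                   # for precomputed metrics
    return tsne.fit_transform(D)                   # n x 2 coordinates for plotting
```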

VII. CONCLUSION

Fig. 4. The value of |L_{τ+1}(W, a_s) − L_τ(W, a_s)| in each iteration (a.1)-(a.4). As the optimization algorithm proceeds, the value of |L_{τ+1}(W, a_s) − L_τ(W, a_s)| decreases gradually. The S_SRMVEC(M^Ω, ρ₁, ρ₂) value and ACC of SRMVEC in each iteration (b.1)-(b.4). The S_SRMVEC(M^Ω, ρ₁, ρ₂) and ACC are colored red and green, respectively. As the gradient-based algorithm proceeds, the S_SRMVEC(M^Ω, ρ₁, ρ₂) value decreases gradually and the ACC increases.

Fig. 5. Comparison between SRMsim and SRMVEC in terms of ACC, NMI and running time. On the 7 datasets, SRMsim has slightly worse clustering performance than SRMVEC but consumes less running time. Each dataset is abbreviated by its first 3 letters.

This paper proposes a novel sparse reconstructive evidential clustering algorithm named SRMVEC. By using a 2-dimensional chart and human interaction, the number of clusters can be quickly determined. This may be the first attempt, as far as we know, to estimate the number of clusters in the MVC problem. Moreover, SRMVEC improves the clustering by creating a credal partition that takes into account more of the ambiguity and uncertainty in the assignment of multi-view objects. The sparse reconstructive technique, evidence theory, and the discovered cluster centers are shown to be advantageous for SRMVEC. The strong clustering performance of SRMVEC is demonstrated by experiments on 12 benchmark datasets. When other MVC algorithms need to detect the number of clusters but have no prior knowledge of the data being studied, SRMVEC can also be used as a precursor step.

Fig. 6. Clustering results visualized by using t-SNE on 2 datasets. The found cluster centers are marked by red crosses and are in suitable locations.