APP下载

Efficient Network Dismantling via Node Explosive Percolation∗

2019-07-16ShaoMengQin秦绍萌XiaoLongRen任晓龙andLinYuan吕琳媛

Communications in Theoretical Physics 2019年6期

Shao-Meng Qin(秦绍萌),Xiao-Long Ren(任晓龙),and Lin-Yuan Lü(吕琳媛)

1College of Science,Civil Aviation University of China,Tianjin 300300,China

2Computational Social Science,ETH Zürich,Clausiusstraße 50,8092 Zürich,Switzerland

3Institute of Fundamental and Frontier Sciences,University of Electronic Science and Technology of China,Chengdu 610065,China

4Alibaba Research Center for Complexity Sciences,Hangzhou Normal University,Hangzhou 310027,China

AbstractThe network dismantling problem asks the minimum separate node set of a graph whose removal will break the graph into connected components with the size not larger than the one percentage of the original graph.This problem has attracted much attention recently and a lot of algorithms have been proposed.However,most of the network dismantling algorithms mainly focus on which nodes are included in the minimum separate set but overlook how to order them for removal,which will lead to low general efficiency during the dismantling process.In this paper,we reformulate the network dismantling problem by taking the order of nodes’removal into consideration.An efficient dismantling sequence will break the network quickly during the dismantling processes.We take the belief-propagation guided decimation(BPD)dismantling algorithm,a state-of-the-art algorithm,as an example,and employ the node explosive percolation(NEP)algorithm to reorder the early part of the dismantling sequence given by the BPD.The proposed method is denoted as the NEP-BPD algorithm(NBA)here.The numerical results on Erdös-R´enyi graphs,random-regular graphs,scale-free graphs,and some real networks show the high general efficiency of NBA during the entire dismantling process.In addition,numerical computations on random graph ensembles with the size from 210to 219exhibit that the NBA is in the same complexity class with the BPD algorithm.It is clear that the NEP method we used to improve the general efficiency could also be applied to other dismantling algorithms,such as Min-Sum algorithm,equal graph partitioning algorithm and so on.

Key words:network dismantling,explosive percolation,network robustness,feedback vertex set

1 Introduction

In a hyper-connected world,[1]the functions of many complex systems,like power grid or transportation networks,depend on their structures and scales.The breaking of structural connectivity may result in functional failure.[2−4]For example,to con fine the epidemic spreading,an effective way is vaccinating the people who can dismantle the infection network as far as possible.[5−7]These scenarios equal to the network dismantling problem:For a graph G with N nodes,how to find the minimum separate set of nodes whose removal will break the graph into numerous connected components with subextensive size,which means the size of the largest connected component cannot be larger than the threshold 0.01N.[8−9]The studies on this problem are of both theoretical importance[10−12]and practical signi fi cant.[13−15]

It has been proved that the dismantling problem belongs to the NP-hard class,which means,in the general case,we cannot find the minimum separate set by a polynomial algorithm.However,many heuristic strategies have been developed based on various graph properties.For example,we can use many centrality measures to quantify the importance of nodes,[16−19]and repeatedly deleting the most important one,such as the node with the highest degree[14]or with the highest betweenness.[20]Previous study showed that comparing with the deletion of the highest degree node in the original graph,a better choice is deleting the highest degree node in the 2-core of the graph — the largest sub-graph where the nodes’residual degree are no less than 2 after repeatedly removing all leaf nodes.[21]Another development of the algorithm in the same category is to quantify the collective influence of nodes through optimum percolation on networks.[22]The removal of nodes with the highest collective influence will dismantle the network fastly.In addition,the study of the dismantling problem on weighted network showed that some of the state-of-the-art algorithms are of low efficiency,even worse than random attacking method.[9,23]

Recently,Mugisha and Zhou pointed out that the network dismantling problem relates to the spanning forest problem,which is also referred as the feedback vertex set(FVS)problem.[24]Therefore,a near-optimum solution of network dismantling problem can be found through belief-propagation guided decimation(BPD)algorithm(see Sec.2 for more details).Along similar lines,Braunstein et al.also developed a Min-Sum algorithm to decycle and dismantle networks.[8]The FVS class algorithms have proved itself in finding the near minimum separate set,but there is still improvement space during the dismantling process.Because these algorithms remove nodes in the purpose of decycling instead of network dismantling,which results in the size of the largest connected component decreasing slowly before its abrupt drop at the threshold value.

Different from only asking the minimum separate set of a dismantling problem,we also concern about the order of how they are removed from the graph,which will decide the general efficiency of a dismantling algorithm.If we attack a graph according to a dismantling sequence with high general efficiency,the largest connected component in the remaining graph will break as quickly as possible.In this paper,we propose a practicable framework to improve the general efficiency of a dismantling algorithm via the node explosive percolation(NEP)algorithm.[25−27]Take the BPD algorithm as an example,we introduce an NEPBPD algorithm,called NBA for short,which will reorder the first a few nodes of a dismantling sequence given by the BPD algorithm.Experimental results on both modeled and real networks indicate that the new method greatly improves the general efficiency of the early part of the original dismantling sequence,and does not lead to any degeneration at the later part.

This paper is organized as follows:In Sec.2,we firstly introduce two criteria to evaluate the dismantling efficiency and the general efficiency of a dismantling algorithm.Then we review the BPD and the NEP algorithms in detail.Finally,we show how to link two methods together to form the NBA algorithm.In Sec.3,we measure the ability of the NBA in improving the general efficiency numerically on modeled networks,including the Erdös-R´enyi(ER)graphs,random-regular(RR)graphs,and scale-free(SF)graphs.In order to reduce the computational complexity of NBA,we further investigate the relationship between the number of reordered node and the general efficiency of the NBA.The result of this study helps us implement the fast NBA—a compromise algorithm under limited computing resources.Finally,we apply the fast NBA in all random graph ensembles and some real networks.In the last section,we conclude the new method,summarize the main results,and discuss possible extensions of the method.

2 Method

For a graph G with N nodes,after removing ρN(0≤ρ≤1)nodes and their adjacent edges from the graph,the relative size of the largest connected component in the remaining graph is denoted as g(ρ).There are two aspects to evaluate the efficiency of a dismantling algorithm:One aspect considers the size of the minimum separate set.A wildly applied criterion is the dismantling efficiency ρc,which is defined as the smallest value of ρ where g(ρ)≤ 0.01.[8,22,24,27]The other is the removing order of the removed nodes.A good dismantling algorithm will give a dismantling sequence in which every node removal leads to the maximum damage in the remaining graph.In the present paper,we introduce the average size fraction of the largest connected components R to evaluate the general efficiency of a dismantling algorithm:[25−26]

We can see that,an algorithm with higher general efficiency algorithm gives a smaller R.

In order to highlight the advancement of the NBA compared with the BPD,we define the relative improvement of the R,which reads

where RBPDand RNBAare the value of R given by the BPD and NBA respectively.In the following subsections,we will firstly review the BPD and NEP algorithms,and then explain how to implement the NBA in detail.

2.1 The BPD Algorithm for Dismantling Problem

Dismantling algorithms based on the FVS attacking strategy,such the BPD algorithm and the Min-Sum algorithm,are composed of three main stages:Firstly, finding the minimum FVS of graph G by the minimum FVS algorithm.After all nodes in the FVS are removed from the graph,the remaining graph becomes acyclic.Secondly,continue breaking the remaining forest in the most efficient way.In the above two steps,the algorithm builds a dismantling sequence(xi)based on the order of the node removal.The first Nρcnodes in the sequence form the separate set.In the last step,in order to minimize the separate set,we can move some nodes in the separate set which will not lead to the increase of the size of the largest connected component to the end of the sequence.

In the first stage,the BPD algorithm translates the FVS problem with global constraint to a spin-glass model containing only local constraint.[28−29]For each node i,it defines a state Ai∈{0,i,j∈∂i},which means the node i is not in the FVS,the root of a tree or the child of its neighboring node j respectively.Then the long-range constraint of no loop in the graph can be replaced by a series of local constraints on edges:

where δyxis the Kronecker symbol such that δyx=1 only when x=y,otherwise δyx=0.

Therefore,only when Aiand Ajsatisfy the constraint on edge(i,j),C(i,j)(Ai,Aj)=1.If a microscopic configuration≡(A1,A2,...,AN)satisfies all constraints in G,we refer it as the legitimate configuration.If we remove all nodes with Ai=0 in a legitimate configurationfrom G,the remaining graph contains only trees and a few cycle-trees,which is a kind of subgraph with only one loop.Therefore,the set of nodes with Ai=0 in a legitimate configuration can be regarded as an FVS approximately.The energy ofcorresponding to the size of the FVS,is defined as the number of nodes with Ai=0:

If the spin-glass system follows the Boltzmann distribution,the probability of observing a legitimate configurationis

where β is the inverse temperature and Z(β)is the partition function

In order to find a ground state of this spin-glass model,the BPD algorithm follows the standard steps of the cavity method by defining a pair of messages(probability distribution)on each edge(i,j)∈G:and,whereis the marginal probability of node i taking the state A in the absence of node j.Considered the constraint Eq.(3),all messages in the graph G should ful fi ll the following belief-propagation equations:

where∂ij means the vertex set obtained by deleting vertex j from ∂i and the normalization factor zi→jis

Therefore,the marginal probability of node i taking the state 0 in G is,

The BPD algorithm canfigure out which node should take the state A=0 by repeating the following two steps until there are only trees or cycle-trees left in the remaining graph:(i)Iterate Eqs.(7)–(9)with large β on graph G to propagate these messages as far as possible and then compute q0ifor each node by Eq.(11).(ii)Delete a small fraction of nodes with the largest q0ifrom the graph.Then,continue breaking cycle-trees by attacking any one node on the cycle.In the following discussion,we use Tαto denote the tree α in the remaining forest F.

In the second stage,the F will be broken as quickly as possible.To achieve this purpose,it defines another pair of messages on each edge(i,j) ∈ F:ni→jand nj→i.ni→jsaves the size of the remaining tree containing node i after cutting edge(i,j).It is obvious that ni→j=1 if i is a leaf node.The iteration equations of these messages read

These equations can be solved easily by starting from all leaf nodes and then propagating to the whole tree gradually.

If a node i∈ Tαis attacked,Tαwill break into a few smaller trees with the size{nj→i}j∈∂i,in which the size of the largest one is denoted by

Therefore,the most efficient way of breaking Tαis to attack the node with the smallest ni.Moreover,in order to quickly decrease the size of the largest connected component,we should firstly dismantle the largest tree in F.By repeatedly removing the node with the minimum niin the largest tree,a forest with giant trees will break into many tiny trees with the size no larger than the threshold value 0.01N.The remaining forest can be broken further until all nodes are removed.

If we dismantle the graph according to the sequence(xi),after Nρcnodes are removed,there will be a mass of connected components whose size are much smaller than the threshold value.In order to find out the minimum separate set,we can recover some removed nodes back to the remaining graph in the last step,which will not lead to the increase of the size of the largest connected component.At the same time,the recovered nodes are moved to the end of the dismantling sequence.

The fact that the BPD algorithm performs good in giving a small ρcindicates its high dismantling efficiency.However,as discussed in Sec.1,this kind of algorithms,including BPD algorithm,Min-Sum algorithm,equal graph partitioning algorithm,[30]etc.,su ff er from a low general efficiency problem related to the fact that the value of g(ρ)decreases slowly before its abrupt drop at ρc.Therefore,the general efficiency of these algorithms can be further improved by reordering their dismantling sequences.

2.2 The NEP Algorithm for Dismantling Problem

different from most traditional dismantling algorithms,the NEP algorithm solves this problem using a backward thinking.It starts from a dismantled graph(i.e.,a residual graph after the removing of a specific part of nodes)and then recovers the node with its attached edges.During the recovering process,some small connected components merge to a bigger one.The NEP restores the node who can largely prevent increase of the largest connected component.Finally,the NEP gives a dismantling sequence in the reverse order of the restoring process,which reduces the size of the largest connected component as quickly as possible.

In the NEP algorithm,a score σimeasuring the ability of node i merging neighbouring connected components is defined.The node with the minimum σishould be recovered to the dismantled graph with priority.If more than one node have the minimum score,we randomly select one in them.After all removed nodes are recovered,we assemble the original graph back and generate a complete dismantling sequence.

The performance of the NEP algorithm depends on the definition of σi,which considers all neighbouring connected components of node i.For example,in the simplest way,[25]σican be defined as the size of the connected component built by restoring node i to the remaining graph.We denote the above recovering targeting algorithm as the NEP with definition 1(NEP1),namely

where Mi,jis the size of the j-th largest neighbouring connected component of node i.

Although the NEP1prevents the increase of the size of the largest connected component in the present,it may lead to an unexpected great increase when two large connected components join together in the future.In order to overcome the drawback in NEP1,we also adopt the second score definition(NEP2),[27]

where Niis the number of neighbouring connected components of node i,Mi,2is the size of the second largest neighbouring connected component of node i,ϵ is a very small positive number(ϵ≪ 1/N).Therefore,NEP2prefers to recover the node with fewer neighbouring connected components.If more than one node have the same minimum Ni,the node with the smallest Mi,2will be selected.

2.3 The NEP-BPD Algorithm

A useful property in dismantling problem is that after a part of nodes are removed from a graph,the structure of the remaining graph is merely decided by the set of the removed nodes while independent of the order of the removal process.Moreover,the NEP algorithm can reorder the if rst T elements of a dismantling sequence,which might improve the general efficiency of the first T steps without any negative impact on the rest part.Therefore,we can take this advantage of the NEP algorithm to optimize a dismantling sequence generated by the BPD algorithm according to the following steps.Firstly,we use BPD algorithm to find a dismantling sequence.Then,we split(into two parts at T:the early part(and the later part(.We remove the nodes in the early part from G and keep other nodes unchanged in the remaining graph.Next,we use NEP1or NEP2on the remaining graph to recover and reorder the nodes in the early part.At last,we obtain a complete dismantling sequence with higher general efficiency by reconnecting the reordered early part and the later part together.We use NBA1and NBA2to denote the NBA with NEP1and NEP2,respectively.

The last question in the NBA is how to find out the best point T where we split the original sequenceAccording to the criteria ρcand R,which are used to evaluate the performance of a dismantling algorithm,we should choose the split point T where one or both criteria can be optimized.However,in most cases,ρcand R cannot be minimized simultaneously.Therefore in this paper we mainly focus our attention on the general efficiency and minimizing R with due consideration of the ρc.Sometimes,merely optimizing R might lead to a certain degeneration of ρc.When dismantling efficiency is more important in some problems,we can also choose T with the minimum R on the condition of

3 Results

In this section,we evaluate the performance of the NBA on three different random graph ensembles:ER graph,RR graph,and SF graph with power-law exponent γ=3.0.In the present paper,we generate the SF networks by a static method explained in Ref.[31].Without being specific,the results in the following discussion are obtained by averaging over 16 different graph instances with N=216.Comparing with the results of the original BPD algorithm,we find that the general efficiency can be improved conspicuously by NBA.

Figure 1 presents the relative size of the largest connected component g as a function of the fraction of the removed nodes ρ,given by various algorithms on an ER graph instance with average degree ⟨k⟩=4.0 and another RR graph instance with degree k=4.We can observe that the BPD algorithm gives a very small ρBPDc=0.2162 in the ER graph and ρBPDc=0.3346 in the RR graph.For comparison,the NEP1and NEP2give ρNEP1c=0.2468,=0.2625 in the ER graph and ρNcEP1=0.3435,=0.3538 in the RR graph,respectively.Although the NEP algorithms are not good at giving a small dismantling set,they are superior to the BPD algorithm in the region ρ < ρBcPD.The NEP2has a better general efficiency than the BPD.The BPD algorithm gives RBPD=0.1852 and RBPD=0.2777 for the ER and RR graph,respectively,compared with those of the NEP2:RNEP2=0.1773 and RNEP2=0.2397.However the general efficiency R of the NBA is the smallest among all tested algorithms,which reaches RNBA1=0.1708,RNBA2=0.1611 on the ER graph and RNEP1=0.2570 and RNBA2=0.2351 on the RR graph.Note that the improvement of the general efficiency in NBA might be accompanied with the degeneration of ρc.However,in our experiments we only observe a little increase of ρc(0.03%)in the RR graph for the NBA2compared with ρBPDc.Therefore,the negative impacts of improving general efficiency in NBA is negligible.

Fig.1 (Color online)The relative size of the largest connected component g in an ER graph with average degree⟨k⟩=4.0(a)and an RR graph with k=4(b)with the fraction of removed node ρ.In these fi gures,we present the result of BPD algorithm(solid line),the NEP1and NEP2algorithm(dashed line and dotted line),and the NBA1and NBA2(dashed-dotted line and dashed-dotted-dotted line).

Fig.2 (Color online)The general efficiency R for the ER graph(a),RR graph(b)and for SF graph with power-law exponent γ =3.0(c)as the function of average degree ⟨k⟩.We present the results of BPD algorithm,NBA1,NBA2,the fast NBA1and fast NBA2.(d),(e),and(f)are the relative improvement r of these algorithms.

In order to highlight the advantage of the NBA,we investigate the value of R and r on ER,RR,and SF graph ensembles with various degree distributions systematically,and present the results in Fig.2.On all graph ensembles,the BPD gives the worst general efficiency in all tested algorithms and the NBA raises general efficiency in different degree.Although the relative improvement r decreases with growing connectivity in most cases except when we apply the NBA2sparse SF graphs(⟨k⟩<7),there still exist conspicuous improvement in the NBA.We also find that NBA2is more powerful than NBA1in improving the general efficiency except for the SF graph when ⟨k⟩<3.

Based on the description of the NBA in the above section,it seems that we should repeat the NEP algorithm N times to find the best T∈[1,N]where R reaches its minimum.In that case,the NBA will be at least N times slower than the NEP.Considered the cost of the computation time,the price of the improvement earned by NBA seems to be too expensive.In order to avoid the expanding of the computation cost,it is necessary to reduce the searching space of T.For this purpose,we study the dependence of RNBAon the number of reordered nodes and present the results in Fig.3.It shows that the value of R of NBA2decreases with growing T monotonically when T

Fig.3(Color online)The comparison of the general efficiency R between the BPD algorithm and NBA1(a)and NBA2(b)in the function of the number of reordered nodes T on ER,RR,and SF graphs with different degrees.The power-law exponent γ=3.0 for the SF graph.

The value of R and r of the fast NBA are presented in Fig.2.It is easy to understand that the general efficiency of the fast NBA is worse than that of the ordinary NBA who search the whole space T∈[0,N].In the case of NBA1,the difference of r between the fast NBA and the ordinary NBA is obvious,so it is worth searching the best T in the whole space when computation resource is abundant.On the other hand,for the NBA2,considering that R reaches its minimum before NρBPDcand searching a larger space cannot improve the general efficiency conspicuously,we can use the fast NBA to save the computation resources.

In order to fi gure out the computation complexity of the fast NBA,we investigate the computation time of the fast NBA with the size of the graph.As the fast NBA is composed of BPD and NEP stages,we present the computation time of two stages separately in Fig.4.The total computation time of the NBA is the sum of two stages.Generally speaking,for a random graph ensemble with fixed average degree,the computation complexity of the NEP stage is in the same order as that of the BPD stage.Therefore,the computation complexity of the fast NBA is also in the same order as that of the BPD algorithm.

Fig.4 (Color online)The computation time of two stages(the BPD and NEP algorithms)in the fast NBA on ER graph with ⟨k⟩=3.0 and nodes number from N=210to 219(Intel Xeon E5450,3.00 GHz,2 GB memory).

At last,we apply the fast NBA to dismantle some realworld networks,which contain plenty of communities,local loops,and hierarchical levels.We list the value of R of the BPD algorithm and that of the fast NBA1and fast NBA2in Table 1.The fast NBA improves the general efficiency of the BPD remarkably from r=8.0%(fast NBA1in the Citation network)to r=81.9%(fast NBA1in the RoadTX network).different from the case of the random graph ensembles,for the real-world networks,the fast NBA1seems to be more suitable for reordering the dismantling sequence than fast NBA2.Although the fast NBA1has a better general efficiency than fast NBA2only in 7 of 12 real-world networks,which is not an overwhelming number,the average relative improvement of the fast NBA1=30.1%is almost twice of fast NBA2=17.2%.The degree distributions of these real-world networks in Fig.5 show that these networks contain the feature of the SF graph more or less.

Fig.5 (Color online)The degree distribution of all real-world networks discussed in the Table 1.

The results in Fig.2 also reveal that the fast NBA1is better than fast NBA2in extremely sparse SF graph.The RoadEU,Grid and RoadTX networks,where the fast NBA1overwhelms fast NBA2,share a common feature that the average degree of them are very small(⟨k⟩<3)and there is no node with massive connections.We believe that is one of the reasons why fast NBA1is more fit for dismantling the three networks.The reader may argue that the average degree of the Email network is also smaller than 3,but the two NBA algorithms give similar results.The reason is both fast NBA algorithms give a very small R≈0.0008,which is also the smallest one in all real-world networks and the fast NBA1cannot behave better than fast NBA2because the dismantling sequence given by the latter is already very close to the global optimum.

Table 1 Comparative results of the BPD algorithm,the fast NBA1and fast NBA2on twelve real-world networks.The number of nodes and edges of these networks are listed in the 2nd and 3rd column.The 4th column is the average degree of these networks.The general efficiency R of the BPD algorithm,the fast NBA1and fast NBA2 are listed in the 5th,6th,and 7th column,correspondingly.We use boldface to highlight the minimum R in the three algorithms.The relative improvement of the fast NBA1and fast NBA2,denoted as r1and r2in the table,are listed in the 8th and 9th column,respectively.

4 Conclusion

Many classical dismantling strategies,including the BPD algorithm,the Min-Sum algorithm,the equal graph partitioning algorithm,etc.focus on finding the minimum separate set while ignore the general efficiency of the removing order.The purpose of this paper is to further improve the general efficiency of an existed dismantling algorithm.We take the BPD algorithm as an example and explain how to use NPE algorithm to reorder the first T nodes in the dismantling sequence given by the BPD.The BPD algorithm is good at giving a very small separate set and the NEP algorithm is more efficient during the dismantling process.The combination of the two algorithms gives birth to the NBA with the merit of both of them.Large-scale numerical computations on ER,RR,and SF graphs and twelve real-world networks reveal that although the NBA gives a dismantling threshold ρcas small as or a little larger than that of the BPD,it greatly improves the general efficiency of the BPD algorithm.An algorithm compromising the best general efficiency and limited computation resources is introduced by setting T=NρBPDc,which keeps the same computation complexity with the BPD.

Following the similar lines,we can also apply the framework of this kind of solution to other classical algorithms,such as the Min-Sum algorithm,the equal graph partitioning algorithm,etc.As long as we can find proper split points,the new algorithm will inherit multiple advantages from all ancestor algorithms and will not be in a higher complexity class than the slowest ancestor algorithm.

Acknowledgement

We thank Pan Zhang,Salomon Mugisha for helpful discussions.The computations in this paper are carried out in the HPC Cluster of ITP-CAS.