Restricted-Faults Identification in Folded Hypercubes under the PMC Diagnostic Model

2014-03-24TzuLiangKung

Journal of Electronic Science and Technology 2014年4期

Tzu-Liang Kung

Tzu-Liang Kung

——System-level fault identification is a key subject for maintaining the reliability of multiprocessor interconnected systems. This task requires fast and accurate inferences based on big volume of data, and the problem of fault identification in an unstructured graph has been proved to be NP-hard (non-deterministic polynomial-time hard). In this paper, we adopt the PMC diagnostic model (first proposed by Preparata, Metze, and Chien) as the foundation of point-to-point probing technology, and a system contains only restricted-faults if every of its fault-free units has at least one fault-free neighbor. Under this condition we propose an efficient method of identifying restricted-faults in the folded hypercube, which is a promising alternative to the popular hypercube topology.

Index Terms——Diagnosability, fault tolerance, PMC model, folded hypercube, reliability.

1. Introduction

The design of interconnected systems determines how one connects processors in some distributed-memory multiprocessor machine, and it has a tremendous impact upon the efficiency of computation[1],[2]. However, there is no one class of interconnected systems that is always better than all of the others, for the quality of a class of interconnected systems depends upon some properties that happen to be of most relevance to a particular scenario[3]. Essentially, a distributed-memory multiprocessor interconnected system should be able to tolerate a limited number of processor and/or link failures, and the reliability of each computing and/or storage unit is crucial to technical capabilities of a multiprocessor system because even a few malfunctions may make system service unreliable. The process of identifying all the faulty units in a system is known as system-level diagnosis. Preparata, Metze, and Chien[4]defined that a system is one-stept-diagnosable if all of its faulty units can be precisely pointed out by one application of some diagnostic process, provided that thetotal number of faulty units does not exceedt. The maximum number of faulty units that can be correctly identified is known as the one-step diagnosability of a system.

One classic approach to system-level diagnosis is the PMC diagnostic model, which was proposed by Preparataet al.[4]and has been widely addressed by many researchers[3],[5]-[14]. In particular, Dahbura and Masson[6]presented anO(N2.5) fault identification algorithm, whereNdenotes the total number of units in a system, and recently Liet al.[10]proposed a more efficient fault-identification algorithm for bijective connection graphs under the PMC diagnostic model. In general, the one-step diagnosability of a systemGis upper bounded by the minimum degree ofGintuitively because it is unlikely to determine whether or not a unit is faulty if all its neighboring units happen to be faulty simultaneously. Therefore, it has long been an intriguing issue to discover more effective measure that can better reflect fault patterns in real systems. Particularly, Laiet al.[9]proposed the so-called conditional diagnosability by requiring that for each unit in a system, all its neighboring units do not fail simultaneously; that is, every unit has at least one fault-free neighbor. Since then, the problem of determining conditional diagnosability has really received the attention of many researchers. Under the PMC diagnostic model, for example, Zhu[14]studied the conditional diagnosability of bijective connection network; Xuet al.[13]investigated the conditional diagnosability with respect to a subclass of matching composition networks. More recently, Chang and Hsieh[5]addressed the conditional diagnosability of augmented cubes; Stewart[3]developed a general technique to establish the asymptotic conditional diagnosability of an interconnection network. However, these works did not tell readers how to identify faulty units in a system.

The topological structure of an interconnected system is typically modeled as a graph, whose nodes and edges represent processors and communication links, respectively. Among the many kinds of network topologies, the binaryn-cube[15](for short, hypercube) is one of the most popular networks for parallel and distributed computation. However, a severe drawback of the hypercube is its largest diameter among the cube family. To compensate for the hypercube’s disadvantage, many researchers have tried to fashion hypercube into one with a lower diameter. One such network topology is the folded hypercube, which was first proposed by El-Amawy and Latifi[16]. In this paper, wepresent an algorithm for identifying restricted-faults from a folded hypercube interconnected system. In contrast with that condition imposed by Laiet al.[9], a graph contains only restricted-faults if every of its fault-free nodes has at least one fault-free neighbor. The definition of folded hypercubes will be presented in the next section.

The rest of this paper is organized as follows. Section 2 provides a preliminary background for system-level diagnosis with necessary graph-theory terminology. The fault identification algorithm is presented in Section 3. Finally, Section 4 concludes the paper and draws some comments about ongoing work.

2. Folded Hypercubes and the PMC Diagnostic Model

The underlying topology of a multiprocessor interconnected system is modeled as an undirected graph, whose node set and edge set represent the set of all processors and the set of all communication links between processors, respectively. Throughout this paper graphs are finite, simple, and unless specified otherwise, undirected. Some important graph-theory definitions and notations have to be introduced in advance. For those not defined here, however, we follow the standard terminology given in [17].

An undirected graphGis an ordered pair (V,E), whereVis a nonempty set, andEis a subset of {{u,v} | {u,v} is a 2-element subsets ofV}. The setVis called the node set, and the setEis called the edge set. For convenience, we denote the node set and the edge set ofGbyV(G) andE(G), respectively. Two nodes,uandv, in a graphGare adjacent if {u,v}∈E(G); equivalently,uandvare neighbors of each other. The degree of a nodevinG, denoted by degG(v), is the number of edges incident withv. The neighborhood of nodev, denoted byNG(v), is the set of nodes adjacent tov. For a setS⊂V, the notationG-Srepresents the graph obtained by removing every node inSfromGand deleting those edges incident with at least one node inS. A graphHis a subgraph of a graphGifV(H)⊆V(G) andE(H)⊆E(G). The components of a graphGare its maximal connected subgraphs. A component is trivial if it has no edges; otherwise, it is nontrivial.

For the sake of clarity, let a boldface letter u denote ann-bit binary stringun…ui…u1. For 1 ≤i≤n, we use (u)iand (u)*to denote the binary stringun…ūi…u1andūn…ūi…ū1, respectively. Moreover, we use (u)ito denote theith bituiof u. The Hamming weight of u, denoted bywH(u), is |{i| (u)i=1, 1 ≤i≤n}|. Then-dimensional folded hypercube FQncontains 2nnodes, each of which is labeled by ann-bit binary string. For the consistency of notation, its nodes are also denoted by boldface letters in the rest of this paper. Two nodes u and v of FQnare adjacent if and only ifwH(u⊕v)=1 orwH(u⊕v)=n, where ⊕ denotes the bitwise XOR (exclusive or) operation. The edge {u, (u)i} is ani-dimensional edge for anyi, 1 ≤i≤n, whereas the edge {u, (u)*} is a complementary edge. Fig. 1 illustrates FQ1, FQ2, FQ3, and FQ4. In fact, ann-dimensional hypercube, denoted byQn, has the same node set with FQn, and it can be obtained from FQnby removing all complementary edges.

Fig. 1. Illustrating folded hypercubes: (a) FQ1, (b) FQ2, (c) FQ3, and (d) FQ4.

The PMC diagnostic model[4]is based on the end-to-end probing technology. A probe is a test transaction, in which a testing unitutransmits some sequence of probing signals to a tested unitvand receives a response sequence of signals fromv. The testing unituoutputs its probing resultσ(u,v)=1 if the response received fromvmismatches the expected one; otherwise,uoutputsσ(u,v)=0. Table 1 summarizes the invalidation rule of the PMC diagnostic model.

Table 1: Invalidation rule of the PMC model

LetG= (V,E) be a graph,v∈Vbe any node, andkbe an integer greater than or equal to 1. According to [7] and [8], an extending star of orderkrooted at nodevis a subgraph ofGdenoted byΛG(v;k) = (V(v;k),E(v;k)), whereV(v;k)={v}∪{vi,j|1 ≤i≤k, 1 ≤j≤ 2} ⊆Vwith |V(v;k)| = 2k+1, andE(v;k) = { {v,vi,1}, {vi,1,vi,2} | 1 ≤i≤k}⊆E. The underlying topology ofΛG(v;k) is a tree as illustrated in Fig. 2 (a). For example, Fig. 2 (b) shows a systematic framework to generate one extending star of ordern+1 rooted at any node in then-dimensional folded hypercube FQnforn≥ 4. With any extending star as an input argument, Kunget al.[8]provided a general decision criteria of fault identification (Algorithm 1), and Theorem 1 concludes its validity.

Theorem 1.[8],[11]LetΛG(v;k) be an extending star of orderkrooted at nodevin a graphG. The procedure FID(ΛG(v;k)) (Algorithm 1) correctly identifies the fault status of nodevif the total number of faulty nodes inΛG(v;k) does not exceedk.

Fig. 2. Illustrating extending stars: (a) the underlying topology of extending starΛG(v;k) and (b)ΛFQn(u;n+1), where u can be any node of FQn.

Fig. 3. A general decision criteria of fault identification based on an extending star.

3. Restricted-Fault Identification in Folded Hypercubes

Suppose thatGis a graph whose one-step diagnosability is equal tot. Equivalently,Gis one-stept-diagnosable but not (t+1)-diagnosable. As stated in Section 1, the one-step diagnosability is upper bounded by the minimum degree. For manyt-diagnosable interconnected systems, the only case stopping them from being (t+1)-diagnosable is usually the scenario that some node happens to have no fault-free neighbor. For example, members in the cube family are so. A system is known to be stronglyt-diagnosable if it is one-stept-diagnosable and can achieve (t+1)-diagnosability, except for the case where a node’s neighbors are all faulty. Clearly, if an injured network remains connected, then every fault-free node has at least one neighboring unit. So the intrinsic nature of strongly diagnosable systems raises a more challengeable question: How to identify every faulty unit in a network system where every fault-free node has at least one fault-free neighbor? To be dedicated, the fault identification of folded hypercubes is presented below.

A graphGcontains only restricted-faults if every fault-free node ofGhas at least one fault-free neighbor. To identify restricted-faults in folded hypercube FQn, we propose a diagnostic architecture to organize probing results reported out from FQn. For the clarity of presentation, we introduce the proposed architecture first, and then we show it can be embedded in FQnforn≥ 8.

Suppose that v is any node of FQn. LetI*[n] = {1, 2,…,n, *}, where * denote a specific element inI*[n], and

Clearly, we have |V1| =n+1 and |V2| =n(n+1)/2. For any two distinct elementsi,jofI*[n], letv'i,jbe a node adjacent to ((v)i)jsuch that |{v'i,j|i,j∈I*[n],i≠j}| =n(n+1). It is noticed that ((v)i)j= ((v)j)ibutv'i,j≠v'j,i. Furthermore, letv''i,jbe a node adjacent tov'i,jsuch that |{v''i,j|i,j∈I*[n],i≠j}| =n(n+1). For convenience, let

and

For any node v of FQn, let ΔFQn(v) denote the graphsuch thatV0,V1,V2,V3, andV4are pairwise disjoint. Fig. 4 illustrates ΔFQ8(00000000), in which each node is labeled by a two digit hexadecimal digits.

Lemma 1.Let v be any node of FQn. If FQncontains ΔFQn(v) as a subgraph, thennis greater than 7.

Proof. Suppose that FQncontains ΔFQn(v) as a subgraph. Since |ΔFQn(v)|=1+(n+1)+n(n+1)/2+2n(n+1)＜2n= |V(FQn)| forn≥ 8, this lemma holds.

Suppose that u is any node ofQn. LetI[n] = {1, 2,…,n},V'0={u} ,V'1={(u)i|i∈I[n]}, andV'2= { ((u)i)j|i,j∈I[n],i≠j}.

Clearly, we have |V'1| =nand |V'2|=n(n-1)/2. For any two distinct elementsi,jofI[n], letu'i,jbe a node adjacent to ((u)i)jsuch that |{u'i,j|i,j∈I[n],i≠j}|=n(n-1). It is noticed that ((u)i)j= ((u)j)ibutu'i,j≠u'j,i. Moreover, letu''i,jbe a node adjacent tou'i,jsuch that |{u''i,j|i,j∈I[n],i≠j}| =n(n-1). For convenience, let

and

Fig. 4. An example of (v)i, ((v)i)j,v'i,j, andv''i,jfor v=00000000 in FQ8, wherei∈ {1, 2,…, 8,*} andj∈ {1, 2,…, 8,*}-{i}.

For any node u ofQn, let ΔQn(u) be the graphsuch thatV'0,V'1,V'2,V'3, andV'4are pairwise disjoint. It is easy to see that those white nodes in Fig. 4 exactly form ΔQ8(00000000), which can be produced by the branch-of-tree (BOT) procedure proposed in [12]. In other words, ΔFQ8(00000000) is obtained by appending those black nodes depicted in Fig. 4 to ΔQ8(00000000). In the same way, we can build ΔFQn(0n) for anyn≥ 8. Because FQnis node transitive, ΔFQn(v) will be built from applying any automorphism operation on ΔFQn(0n) for each node v of FQn.

We follow the methodology of local diagnosis developed in [12] to design Algorithm 2 and conclude its correctness in the next theorem.

Theorem 2.Let v be any node of FQn. Suppose that v has at least one fault-free neighbor if v itself is fault-free. Then, the procedure restricted-fault-identification (RFID) correctly identifies the fault status of v if the total number of faulty nodes in ΔFQn(v) does not exceed 2n+1. The time complexity of RFID isO(n2).

Liet al.[10]proposed an effective method HP-DIAG of identifying all possibly faulty nodes based on a given Hamiltonian path of the bijective connection graph. Because then-dimensional folded hypercube FQncontains ann-dimensional hypercubeQnas a spanning subgraph, their result remains applicable to folded hypercubes. Then, FQnRFID (Algorithm 3) can point out restricted-faults in FQn.

Theorem 3.The algorithm FQnRFID runs in linear time, provided that the total number of faulty nodes does not exceed 2n+1 and every fault-free node has at least one fault-free neighbor.

Proof.By the result proposed by Liet al.[10], we have |U|=O(n2), and the time complexity of HP-DIAG isO(N), whereN=2n. By Theorem 2, the time complexity of RFID isO(n2). Therefore, the for-loop runs inO(n2) times and spendsO(n2)×O(n2) time totally. So the time complexity of FQnRFID isO(N).

Fig. 5. The restricted-fault identification algorithm.

Fig. 6. Identifying all restricted-faults in the folded hypercube.

4. Conclusions

This paper proposes an effective and efficient restricted-fault identification algorithm, RFID, for folded hypercubes. Using it as fault identification kernel FQnFID runs in linear time. The main contribution of this paper is to bridge the partial gap between theoretical and practical ends of studies on diagnosability theory. With regard to future studies, one can envisage how to identify restricted-faults in another network, for example, crossed cubes, twisted cubes, and so on.

[1] W. J. Dally and B. Towles,Principles and Practices of Interconnection Networks, San Francisco: Morgan Kaufmann, 2004.

[2] J. Duato, S. Yalamanchili, and L. M. Ni,Interconnection Networks: An Engineering Approach, San Francisco: Morgan Kaufmann, 2003.

[3] I. A. Stewart, “A general technique to establish the asymptotic conditional diagnosability of interconnection networks,”Theoretical Computer Science, vol. 452, pp. 132-147, Sep. 2012.

[4] F. P. Preparata, G. Metze, and R. T. Chien, “On the connection assignment problem of diagnosis systems,”IEEE Trans. on Electronic Computers, vol. 16, no. 12, pp. 848-854, 1967.

[5] N.-W. Chang and S.-Y. Hsieh, “Conditional diagnosability of augmented cubes under the PMC model,”IEEE Trans. on Dependable and Secure Computing, vol. 9, no. 1, pp. 46-60, 2012.

[6] A. Dahbura and G. Masson, “AnO(N2.5) fault identification algorithm for diagnosable systems,”IEEE Trans. on Computers, vol. 33, no. 6, pp. 486-492, 1984.

[7] G.-H. Hsu and J. J. M. Tan, “A local diagnosability measure for multiprocessor systems,”IEEE Trans. on Parallel and Distributed Systems, vol. 18, no. 5, pp. 598-607, 2007.

[8] T.-L. Kung, H.-C. Chen, and J. J. M. Tan, “On the faulty sensor identification algorithm of wireless sensor networks under the PMC diagnosis model,” inProc. of the 6th Int. Conf. on Networked Computing and Advanced Information Management, Seoul, 2010, pp. 657-661.

[9] P.-L. Lai, J. J. M. Tan, C.-P. Chang, and L.-H. Hsu,“Conditional diagnosability measures for large multiprocessor systems,”IEEE Trans. on Computers, vol. 54, no. 2, pp. 165-175, 2005.

[10] T.-K. Li, C.-H. Tsai, and H.-C. Hsu, “A fast fault-identification algorithm for bijective connection graphs using the PMC model,”Information Sciences, vol. 187, pp. 291-297, Mar. 2012.

[11] C.-K. Lin, T.-L. Kung, and J. J. M. Tan, “Conditional-fault diagnosability of multiprocessor systems with an efficient local diagnosis algorithm under the PMC model,”IEEE Trans. on Parallel and Distributed Systems, vol. 22, no. 10, pp. 1669-1680, 2011.

[12] C.-K. Lin, T.-L. Kung, and J. J. M. Tan, “An algorithmic approach to conditional-fault local diagnosis of regular multiprocessor interconnected systems under the PMC model,”IEEE Trans. on Computers, vol. 62, no. 3, pp. 439-451, 2013.

[13] M. Xu, K. Thulasiraman, and X.-D. Xu, “Conditional diagnosability of matching composition networks under the PMC model,”IEEE Trans. on Circuits and Systems-II: Express Briefs, vol. 56, no. 11, pp. 875-879, 2009.

[14] Q. Zhu, “On conditional diagnosability and reliability of the BC networks,”The Journal of Supercomputing, vol. 45, no. 2, pp. 173-184, 2008.

[15] Y. Saad and M. H. Schultz, “Topological properties of hypercubes,”IEEE Trans. on Computers, vol. 37, no. 7, pp. 867-872, 1988.

[16] A. El-Amawy and S. Latifi, “Properties and performance of folded hypercubes,”IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 1, pp. 31-42, 1991.

[17] G. Chartrand and O. R. Ollermann,Applied and Algorithmic Graph Theory, New York: McGraw-Hill, 1993.

Tzu-Liang Kung was born in Kaohsiung in 1975. He received the B.S. degree in industrial administration from the National Taiwan University, Taipei in 1997, the M.S. degree in statistics from the National Chiao Tung University, Hsinchu in 2001, and the Ph.D. degree in computer science from the National Chiao Tung University, Hsinchu in 2009. From 2001 to 2004, he served as a senior engineer at the Behavior Design Corporation. He is currently an associate professor with the Department of Computer Science and Information Engineering, Asia University. His research interests include multivariate data analysis, machine translation, natural language processing, interconnected systems, fault-tolerant computing, algorithm design, and wireless networks.

Manuscript received December 12, 2013; revised March 12, 2014. This work was supported in part by the NSC under Grand No. NSC102-2221-E-468-018.

T.-L. Kung is with the Department of Computer Science and Information Engineering, Asia University, Taichung (Corresponding author e-mail: tlkung@asia.edu.tw).

Digital Object Identifier: 10.3969/j.issn.1674-862X.2014.04.016

Journal of Electronic Science and Technology

2014年4期