APP下载

A Matrix Approach to the Modeling and Analysis of Networked Evolutionary Games With Time Delays

2018-07-31GuodongZhaoYuzhenWangandHaitaoLi

IEEE/CAA Journal of Automatica Sinica 2018年4期

Guodong Zhao,Yuzhen Wang,and Haitao Li

Abstract—Using the semi-tensor product method,this paper investigates the modeling and analysis of networked evolutionary games(NEGs)with finite memories,and presents a number of new results.Firstly,a kind of algebraic expression is formulated for the networked evolutionary games with finite memories,based on which the behavior of the corresponding evolutionary game is analyzed.Secondly,under a proper assumption,the existence of Nash equilibrium of the given networked evolutionary games is proved and a free-type strategy sequence is designed for the convergence to the Nash equilibrium.Finally,an illustrative example is worked out to support the obtained new results.

I.INTRODUCTION

I N the last two decades,a great deal of interest has gone into the research of evolutionary games on graphs,namely,networked evolutionary games(NEGs)[1]-[4].NEGs can be applied to investigate some practical problems,where each individual just plays game with some specific players(for example,trading partners)other than all players on a network.The network,whose nodes and edges represent,respectively,players and interaction relationship among players,depicts the topological structure of the corresponding NEG.Hence,in an NEG,a player’s reward,or payoff,depends on the strategies taken by both his neighbors and himself.

One of the most important issues,among the investigation of NEGs,is to analyze each player’s behavior when the evolution proceeds.Many excellent works focus on this problem.The original work[5]considered the spatial game and studied the cooperation emergence and persistence of Prisoner’s Dilemma Game on two-dimensional lattices.The networked adaptive dynamics of Prisoner’s Dilemma Game was investigated by[6].Recently[7]has proposed a new theoretical framework for NEGs by the semi-tensor product(STP)of matrices,which was established in[8]and[9].It is worth noting that this method has been successfully applied to game theory and networked evolutionary game theory,and many essential results have been obtained.[10]gave a neat algorithm to calculate potential functions for potential games.Evolutionarily stable strategy of NEGs was considered by[11].In addition to that,[12]and[13]investigated the optimization and control for the NEGs,respectively.[14]provided a method to convert the payoff functions of the given game to logical functions with structural matrices.

It is noticed that the above results just considered the NEGs with one memory,that is,in the NEGs,each individual determines its own strategy choice of the next move only based on its neighbors’strategies at the last step.However,many economic activities imply an obvious fact that every participator can remember more than one action of its neighbors.In this case,all the players make their strategy choices in the next move according to their neighbors’strategies in the last finite steps(τ steps),where 1≤ τ< ∞.Therefore,the hypothesis that all the players in the NEGs can remember their neighbors’strategies in the past τ steps is reasonable.Without the network[15],[16]investigated the optimization and identification,separately,in evolutionary games with τmemory.Although there were some excellent results on the study of NEGs with one memory and evolutionary games with τ-memory,there are,to our best knowledge,no results available on NEGs with τ memories.

In this paper,we model the NEGs with τ memories and construct an algebraic formulation for the given systems based on τ-memory version of fictitious play process.The main contributions of this paper are as follows:1)The STP method is firstly applied to the study of the NEGs with τ memories,and a new theoretical framework is established via this method;2)A general procedure is proposed to design a free-type strategy sequence for the convergence of the Nash equilibrium,which can be easily realized with the help of MATLAB toolbox established in[8].

The remainder of the paper is organized as follows.Section II contains some necessary preliminaries on the semi-tensor product and game theory.Section III presents the main results of this paper,and Section IV gives an illustrative example to show the effectiveness of our main results,which is followed by a brief conclusion in Section V.

Notations:Rm×ndenotes the set of m×n real matrices.denotes the set of m × n nonnegative real matrices.whereis the i th column of the identity matrix In.An n×t matrix M is called a logical matrix,ifand M is briefly denoted by M= Δn[i1i2...it].Define the set of n×t logical matrices as Ln×t.C oli(L)(Rowi(L))is the i th column(row)of matrix L.For a set E,|E|denotes the number of elements in E.

II.PRELIMINARIES

In this section,we present some necessary preliminaries,which will be used in the sequel.

Definition 1[8]:The semi-tensor product of two matrices A∈Rm×nand B ∈Rp×tis defined as Awhere α=lcm(n,p)is the least common multiple of n and p,and⊗is the Kronecker product.

It is noted that the semi-tensor product is a generalization of the ordinary matrix product,and thus we can simply call it“product”and omit the symbol“x”without confusion.

Definition 2:Let M ∈ Rq×sand N ∈ Rp×s.Define the Khatri-Rao product of M and N,denoted by M∗N,as M∗N=[C ol1(M)C ol1(N)C ol2(M)Col2(N)...C ols(M) C ols(N)]∈ Rpq×s.

The semi-tensor product of matrices has the following fundamental properties.

Lemma 1[8]:1)Let X∈Rmand Y∈Rnbe two column vectors.Then,W[m,n]X Y=Y X,where W[m,n]is called the swap matrix.Especially,W[n,n]=W[n].2)(Pseudocommutative property)Let X ∈Rtand A∈Rm×n.Then,X A=(It⊗A)X holds.

Lemma 2[8]:Let X=(xi1,i2,...,ik)be a column vector with its elements arranged by the ordered multi-index I d(i1,...,ik;n1,...,nk)(Definition 2.3 on Page 23 of[8]).Then

is a column vector consisting of the same elements,arranged by the ordered multi-index I d(i1,...,it+1,it,...,ik;n1,...,nt+1,nt,...,nk).

Lemma 3[8]:Define the retrievers as

then x2=Ψl,kx holds,where is the base-k power-reducing matrix satisfying z2=Mr,kz,z∈Δk,and 0k∈Rkis a zero vector.

Lemma 5[13]:Assume X ∈ Δpand Y ∈ Δq.Define two dummy matrices,named by“front-maintaining operator(FMO)”and “rear-maintaining operator(RMO)”respectively,as

An n-ary pseudo-logical(or logical)function f(x1,x2,...,xn)is a mapping fromto R(or fromto Δm).The following result shows how to express a pseudo-logical(or logical)function into its algebraic form.

In the following,we recall some notation in game theory.

A normal finite game(N,S,P),considered in this paper,consists of three factors[17]:1)n players N={1,2,...,n};2)Player i has strategy set Si,is the set of strategy profiles,i∈N;3)Player i has its payoff function piS→R,P={p1,p2,...,pn}is the set of payoff functions,i∈N.

Definition 3[17]:In an n-player normal form finite game G={S1,...,Sn;p1,...,pn},the strategy profile(,...,)is a Nash equilibrium(NE)if,for each player i,is(at least tied for)player i’s best response to the strategies specified for the n-1 other players,that is,

for every feasible strategy si∈Si,where Siis the set of strategies of player i and piis the corresponding payoff function.

Definition 4[13]:1)A normal game with two players is called a fundamental network game(FNG),if S1=S2=S0= {1,2,...,k}and player i’s payoff function is ci=ci(x,y),where x is player 1’s strategy,y is player 2’s strategy,and i=1,2.Namely,N={1,2},S=S0×S0,and P={c1,c2};2)An FNG is symmetric,if c1(x,y)=c2(y,x),∀x,y∈ S0.

III.MAIN RESULTS

This section firstly describes the definitions of NEGs with τ memories and τ-memory version of fictitious play process.Then,based on the above definitions,a kind of algebraic expression is formulated for the given NEGs.Finally,by adding a pseudo-player to the game,a free-type strategy sequence is designed to guarantee the convergence of Nash equilibrium.

A.Modeling of NEG With τ Memories

Without loss of generality,we assume that the FNG of the given corresponding NEG is symmetric and the game can repeat in finitely.Consider a networked evolutionary game with τ memories,which consists of the following three ingredients:

1)A network:its topological structure is a connected undirected graph(N,E),where N={1,2,...,n}is the set of all players,and E={(i,j)|there exists interaction between players i and j}is the set of edges.

2)An FNG:if(i,j)∈E,then i and j play the FNG in the network with strategies xi(t)and xj(t)at time t,respectively.

3)Players’strategy updating rules:these rules can be expressed as

where xj(l)∈S0is the strategy of player j at time l,l=t- τ+1,t-τ+2,...,t,Niis the neighborhood of player i,that is,j∈Niif and only if(i,j)∈E,i∈N.Obviously,i/∈Niand j∈Ni⇔i∈Nj.

In the given NEG,at each time,player i only plays with its neighbors,and its aggregate payoff pi:Sn0→R is the sum of payoffs gained by playing with all its neighbors,i.e.,

where x-i=(x1,...,xi-1,xi+1,...,xn)and c is the payoff function of the FNG.

The evolutionary process of the game proceeds according to some specialized strategy adjustment rule,which is used to describe how a player chooses a proper strategy in the next step.The rest of this subsection depicts that how to design the strategy adjustment rule step by step.

In this paper,we consider the τ-memory version of fictitious play process[18].

For player i∈N,define the empirical frequency,qji(t),as the percentage of stages at which player i has chosen the strategy j∈S0from time t-τ+1 to time t,i.e.,

where xi(l) ∈ S0is player i’s strategy chosen at time l=t- τ+1,t-τ+2,...,t,and I{·}is indicator function.Now,define the empirical frequency vector for player i at time t as

The strategy chosen by player i at time t+1 is based on the presumption that other players are playing randomly and independently according to empirical frequency vector qj(t),where j=1,...,i-1,i+1,...,n.Under this presumption,the expected payoff function for the strategy xi∈S0of player i is

where q-i(t)=(q1(t),...,qi-1(t),qi+1(t),...,qn(t))and x-i=(x1,...,xi-1,xi+1,...,xn).In the τ-memory version of fictitious play process,player i used the expected payoff(4)to select a strategy at time t+1 from the set

which is called player i’s best response to q-i(t).Based on this,we have

It is noted that player i may have more than one best responses,that is,|E Pi(q-i(t))|>1.In this case,we define a priority for the strategy choice as follows:for x,y∈S0,x has priority over y,if and only if x>y.Then,player i chooses its strategy according to the following way:

Thus,(5)leads to a pure strategy dynamics.

In addition to the idea of setting priority,there exists another option.Namely,when|E Pi(qi(t))|>1 holds,we randomly choose one with equal probability.That is,

Thus,(6)leads to a mixed strategy dynamics.However,this paper concentrates on the study of the pure strategy dynamics.

There are two cases for the final dynamical behavior of pure strategy dynamics.One is that all players’strategies remain stationary at one strategy profile,which is called a fixed point,and the other is that several strategy profiles are chosen periodically with period s≥2,which is called a cycle with length s.

The aim of this paper is to study the algebraic formulation and Nash equilibrium of the given NEG as a pure strategy dynamics.

B.Algebraic Formulation of NEGs With τ Memories

This subsection investigates the algebraic formulation of the NEG given in Section III-A described as a pure strategy dynamics.First,we convert the dynamics of the NEG into an algebraic form.

From(5),the dynamics of the NEG with τ memories can be regarded as a logical network,based on which the dynamics can be converted into an algebraic form by using the STP.According to the τ-memory version of fictitious play process depicted in Section III-A,the strategy of player i at time t+1 is the best response against its empirical frequency vector(3)generated from its neighbors at time t.Thus,to obtain the algebraic formulation of the NEG with τ memories,one can take the following three steps:1)convert the payoff function of each player into an algebraic form;2)convert the expected payoff function(4)of each player into an algebraic form;3)identify the best strategy for every player by the algebraic form of its expected payoff function(4).

In Step 1,using the vector form of logical variables,we identify S0∼ Δk,where“∼”denotes that the strategy j∈S0is equivalent to∈ Δk,j=1,2,...,k.Then,by Lemma 1,Lemma 5,Lemma 6,and(1),the payoff function of player i can be expressed as

where Mc∈ R1×k2is the structural matrix of the FNG’s payoff function and Mi∈ R1×kn is the structural matrix of pi,xi(t)∈Δkis the strategy of player i at time t,and x-i(t)=x1(t) ··· xi-1(t) xi+1(t) ···xn(t)∈ Δkn-1.

In Step 2,using a vector form,we define

where zi(t)∈ Δkτ,z(t)∈ Δknτ,and z-i(t)∈ Δk(n-1)τ.

The following proposition gives the formulation of expected payoff function(4)for player i and constructs its corresponding structural matrix,i∈N.

Proposition 1:For∀i∈ N,let Mi∈ R1×kn be the structural matrix of the payoff function of player i.Then,the expected payoff function(4)can be rewritten as Ui(xi,q-i(t))=Miq-i(t)xi=MiC z-i(t)xi:=Mciz-i(t)xi,where Mciis the structural matrix of expected payoff function(4)for player i,

Proof:By Lemma 3 and(8),one gets

and

which denote all the strategies adopted by player j from time t-τ+1 to time t,where j=1,...,i-1,i+1,...,n.Thus,we have

and

where j=1,...,i-1,i+1,...,n.By virtue of Lemma 1,Lemma 4,(12),and(13),we obtain

Thus,by(7)and(14),(4)is rewritten as

In Step 3,we construct the structural matrices for the updating rules based on the τ-memory version of fictitious play process for all players.

By virtue of Proposition 1,for player i∈N,divideinto k(n-1)τequal blocks as

where the elements in the l th block ofre all the possible benefits of player i under other players’strategy profile

Next,we search for the best response of player i making its benefit maximum,that is, find the column index of the largest element for each block of.For all l=1,2,...,k(n-1)τ,let ξi,lbe the column index such that

If there are more than one maximum columns,one can pick out the unique column index ξi,laccording to the priority of strategy choice given in Section III-A,that is,choose the largest column index.

Letting˜Li=δk[ξi,1,ξi,2,...,ξi,k(n-1)τ],by Lemma 1 and Lemma 5,we can obtain the algebraic form of the updating rule for player i as

where i∈N.By virtue of Lemma 1,Lemma 5,Lemma 4,(8),and(15),we get

Based on the above analysis,we have the following algorithm to construct the algebraic form of NEGs with τ memories.

Algorithm 1:This algorithm contains three steps:

1)Calculate the structural matrix,Mi,of the payoff function of player i∈N by

2)Construct the structural matrix of expected payoff function of player i∈N,=MiC,where C is defined as in Proposition 1.Then,divide the matrixinto k(n-1)equal blocks as

and for all l=1,2,...,k(n-1)τ, find the column index ξi,lsuch that

3)Construct the algebraic form of NEGs with τ memories as

where

and i∈N.

Remark 1:It is important to note that the initial state z(0)∈ Δknτ,which represents the strategies adopted by all players from time-τ+1 to 0,is prior knowledge for NEGs with τ memories.

It is obvious that the matrix L can reveal all the characteristics of the given NEG completely.Thus,the dynamics of the NEG with τ memories is equivalent to the algebraic form(17).

However,unlike the definitions of logical variable in[8],

we define logical variables in this paper as(8).The following proposition shows that these two definitions are equivalent so that we can use the results in[8]to investigate the properties of L.

Proposition 2:Define logical variableswhere xij∈Δk,i=1,2,...,n,and j=1,2,...,m.Then,there exists a nonsingular matrixsuch that

holds.

Proof:We prove it by induction.When n=1 and m=1,it is obvious that z=x11=Ikz,namely,T1,1G=Ik.

Suppose(19)is true for n=s and m=v,then for n=s+1 and m=v we have

Using induction assumption for(20)and Lemma 2,we have

Similarly,for n=s and m=v+1,by Lemma 2,we have

Using induction assumption for(22),we have

Therefore,by(21)and(23),when n=s,m=v+1 and n=s+1,m=v,(19)holds.

Remark 2:With the help of Proposition 2,(8)can be turned into(18)by a proper coordinate transformation.So,the high-order logical network theory in[8]is also applicable to investigate the given NEG with algebraic formulation(17)in this paper.This is the main advantage to study the dynamical behavior of the game via the algebraic formulation.For example,using the results in[8],one can obtain all the final behaviors of NEGs with τ memories,including the fixed points and cycles.

1)The number of cycles of length s for the dynamics of NEGs with τ memories,denoted by Ns,is inductively determined by

where P(s)denotes the set of proper factors of s,the proper factor of s is a positive integer k<s satisfying s/k∈Z+,and Z+is the set of positive integers.

2)The set of elements on cycles of length s,denoted by Cs,is

where Da(L)is the set of diagonal nonzero columns of L.

C.Existence and Convergence of NE for NEGs With τ Memories

In this subsection,we design a free-type strategy sequence for a pseudo-player in the given NEG to guarantee the convergence of NE.Without loss of generality,we assume that the first player is the pseudo-player who can take strategies freely.Let x1(t)∈Δkbe the strategy of the pseudo-player at time t,and xi(t)∈Δkbe the strategy of the player i∈{2,3,...,n}at time t.

First of all,using(16),one has

where y(t)=z2(t)z3(t) ··· zn(t),v(t)=z1(t),Lv=L2∗L3∗ ···∗Ln,and y(0)∈ Δk(n-1)τ.In addition to that,by Lemma 5 and(8),v(t)satisfies

Thus,by(26)and(27),we can obtain the algebraic form of the given NEG as

The following is a basic assumption for the given NEG.

Assumption 1:In the given NEG,the FNG has an NE(k,k),namely,

Remark 3:For convenience of calculation,Assumption 1 picks out(k,k)as the FNG’s NE.There is no difficulty to generalize it to the case that the NE is not(k,k).

The following result reveals the relationship between the FNG’s NEs and the corresponding NEG’s NEs.

Theorem 1:Under Assumption 1,the corresponding NEG has an NE

Proof.With(1)and(29)in hands,since the FNG under study is symmetric,for player i,one hasThus,x∗is the given NEG’s NE.

In the following,we present a result on the fixed point of(17).

Theorem 2:Under Assumption 1,the algebraic form(17)of the given NEG has a fixed pointnamely,

and i∈N.By virtue of Theorem 1 and(30),one has

where i∈N.Thus,by(5)and(31),one has

where i∈N.Then,it is easy to see that xi(t+k)=Δkk,k>1,and

where i∈N.Because of(32),we get

where i∈N.So,by(8)and(33),it reaches

Corollary 1:Consider the given NEG with its algebraic form(28).Under Assumption 1,

holds.

Proof:From Theorem 1,substituting the fixed point v(t)into(28),we have

Finally,we study how to design a free-type strategy sequence to guarantee the convergence of the Nash equilibrium and present the following result.

Theorem 3:Consider the NEG with the algebraic form(28),and assume that Assumption 1 holds.Then,the evolutionary dynamics of the game globally converges to the NE=by a free-type strategy sequence,if there exist the integersµ>0 and 1≤α≤kµ,such that

and

Proof:Firstly,using Lemma 1 and(28),one has

However,(27)implies that{v(t)}t≥-τ+1is not a free-type sequence.Then,define u(t)=x1(t),by Lemma 1 and Lemma 4,v(µ -1) v(µ -2) ··· v(0)can be rewritten as

Thus,by virtue of(37),we rewrite(36)as

Because of(34),(41),and(38),for∀y(0)∈ Δk(n-1)τ,we have

By(35),(41)and the definition of v(t),one has

Assuming that Assumption 1 holds,using(39),(40),and Corollary 1,one can see that for∀y(0)∈ Δk(n-1)τ,

and

that is,by the definition of v(t)and y(t),it is easy to prove that the evolutionary process of the game globally converges to the NEIn addition to that,by Lemma 3,we get the free-type strategy sequence(41).

From the proof of Theorem 3,one can easily design a freetype strategy sequence for player 1 as follows.

Corollary 2:Consider the NEG with the algebraic form(28),and assume that Assumption 1 holds.If there exist integersµ > 0 and 1≤ α ≤ kµ,such that(34)and(35)hold,then the free-type strategy sequence which the pseudoplayer adopts can be designed as

IV.AN ILLUSTRATIVE EXAMPLE

In this section,we give an illustrative example to show how to use our results to investigate networked evolutionary games with τ memories.

Example 1:Consider an NEG with the following basic factors:

1)A network topological structure,denoted by(N,E),as in Fig.1,where N={1,2,3},and E={(1,2),(1,3),(2,3)};

Fig.1.The network.

2)The FNG’s payoff bi-matrix shown in Table I;

TABLE IPAYOFF BI-MATRIX

3)The evolutionary rule is τ-version of FP process depicted in Section III-A,where τ=2.

With the help of Algorithm 1,the algebraic form of the game is given as z(t+1)=Lz(t),where

Then,from(42),by(24)and(25),one can see that 1)the fixed points areand,that is,all players adopt the same strategy M or F;2)two cycles with length 3 are{,,}and{,,},namely,the strategy profile sequences({M,F,M},{M,M,F},{F,M,M})and({M,M,F},{F,M,M},{M,F,M})are two cycles adopted by the three players;3)Ns=0,s=2 and 4≤s≤64.

The rest of this section studies the strategy sequence design of the given NEG.Using Algorithm 1 and considering player 1 as a pseudo-player,we have y(t+1)=Lvv(t)y(t),where

By virtue of(43),a simple calculation shows that

Then,from Theorem 3 and(44),the evolutionary process converges to the NE{F,F,F}by the strategy sequence{u(t)=}t≥-τ+1,that is,player 1 adopts the strategy sequence{x1(t)=F}t≥-τ+1.Fig.2 demonstrates the effectiveness of the free-type strategy sequence,when we pick up x2(-1)=F,x2(0)=M,x3(-1)=M,and x3(0)=F as initial states for the given NEG.

Fig.2.The responses strategy sequence of three players of the NEG under the given free-type strategy sequence.

V.CONCLUSION

In this paper,we have investigated the modeling and analysis for a class of networked evolutionary games with finite memories based on τ-memory version of fictitious play process.The dynamics of the given NEG has been converted into an algebraic form via the semi-tensor product.A proper algorithm has been established to construct its corresponding structural matrices.A free-type strategy sequence has been designed to guarantee the NE reachable globally.The study of an illustrating example has shown that the new results presented in this paper are very effective.It is worth noting that this paper considers NEGs with τ memories as discrete-time games.We can study NEGs with τ memories as differential games in the further research just like in[19].