RIS-Assisted Over-the-Air Federated Learning in Millimeter Wave MIMO Networks

2022-06-29LinHuZhibinWangHongbinZhuYongZhou

Journal of Communications and Information Networks 2022年2期

Lin Hu,Zhibin Wang,Hongbin Zhu,Yong Zhou

Abstract—In this paper,we propose a reconfigurable intelligent surface (RIS) assisted over-the-air federated learning (FL),where multiple antennas are deployed at each edge device to enable simultaneous multidimensional model transmission over a millimeter wave(mmWave) network.We conduct rigorous convergence analysis for the proposed FL system,taking into account dynamic channel fading and analog transmissions.Inspired by the convergence analysis,we propose to jointly optimize the receive digital and analog beamforming matrices at the access point,the RIS phase-shift matrix,as well as the transmit beamforming matrices at transmitting devices to minimize the transmission distortion.The optimization variable coupling and non-convex constraints make the formulated problem challenging to be solved.To this end,we develop a low-complexity Riemannian conjugate gradient (RCG)-based algorithm to solve the unit modulus constraints and decouple the optimization variables.Simulations show that the proposed RCG algorithm outperforms the successive convex approximation algorithm in terms of the learning performance.

Keywords—reconfigurable intelligent surface,federated learning,Riemannian conjugate gradient,over-the-air computation

I.INTRODUCTION

The rapid advancement of artificial intelligence (AI) is boosting the upsurging of many intelligent applications(e.g.,natural language processing,autonomous driving,and smart finance)[2,3].Distilling intelligence to support these applications heavily depends on the dispersed raw data,which usually contains personal information and should not be shared among different parties.Federated learning (FL) has the potential to distill intelligence from edge devices in a privacy-preserving manner[4].Different from the conventional distributed machine learning(ML)paradigm,FL enables the collaborative training among edge devices in an iterative manner without sharing any raw data.The server first disseminates its global model to all devices in each communication round.By taking advantage of the local computing power,edge devices perform local model updating and upload their local models for global model updating.Because of the high dimensionality of model parameters and a sheer number of edge devices,developing a fast and spectrum-efficient model aggregation strategy is critical for implementing FL over wireless networks.

Over-the-air computation (AirComp) is capable of reducing the communication latency and the bandwidth requirement for global model aggregation in the uplink[5-9].By enabling concurrent transmission over the same channel,Air-Comp achieves fast model aggregation[5].Specifically,the authors in Ref.[10]proposed to enhance the learning performance by optimizing receive beamforming as well as device selection.The authors in Ref.[11]leveraged AirComp to further reduce the communication latency,and proposed an appropriate transceiver design to diminish the mean-squared error(MSE).The authors in Ref.[12]adjusted the learning rate to reduce the computation distortion.The authors in Ref.[13]applied multiple-input multiple-output(MIMO)to reduce the transmission delay of FL.In addition,the authors in Ref.[14]optimized the device selection and transmit power to enhance the test accuracy of FL.Note that the AirComp-based FL is bottlenecked by the communication link with the worst channel condition[15,16].

In addition to AirComp,millimeter wave(mmWave)communications is another effective technique for fast model aggregation by providing abundant spectrum resources[17-20].The authors in Ref.[17]proposed a mmWave-based vehicular system that achieves high throughput.The authors in Ref.[21]optimized the computation offloading and transmit power allocation to support mobile edge computing over mmWave networks.The authors in Ref.[22] proposed a new paradigm named federated vehicular cloud to support FL for a vehicular network,and utilized mmWave communications to provide more stable performance.However,mmWave communications has the characteristics of short wavelength that lead to high propagation loss,high directionality,and sensitive to blockage.

Reconfigurable intelligent surface (RIS) is capable of addressing the limitations of AirComp and mmWave communications by reconfiguring the propagation environment.RIS is an artificial meta-surface without radio frequency (RF)chains and can adaptively reconfigure its passive reflecting elements[23,24].By optimizing the RIS phase-shifts and receive beamformer,the authors in Ref.[23] proposed to minimize the power consumption,where signal-to-interferenceplus-noise ratio(SINR)constraints were considered.In addition,the authors in Ref.[16]leveraged RIS to enhance downlink energy beamforming as well as uplink AirComp.The authors in Ref.[25]optimized the transceivers and RIS phaseshifts to minimize the AirComp distortion.Moreover,RIS has recently been adopted to mitigate the communication bottleneck of uplink model aggregation for wireless FL.The authors in Ref.[26] leveraged RIS and AirComp to enable spectralefficient model aggregation.To optimize the learning performance of FL systems,the authors in Ref.[27] proposed to minimize the sparse target function by optimizing the receive beamforming,the phase-shift,and the device selection.Besides,the authors in Ref.[1] proposed a Riemannian conjugate gradient (RCG) based algorithm to minimize the Air-Comp distortion.However,all the aforementioned studies on FL cannot be directly extended to mmWave MIMO networks,where both transmit and receive beamforming need to be optimized.

We in this paper propose an RIS-assisted over-the-air FL in a mmWave MIMO network,where multiple edge devices with multiple antennas simultaneously transmit local gradients to an edge server over mmWave channels with the assistance of an RIS.We develop a computation-efficient algorithm that jointly optimizes the transceiver design and RIS phaseshifts,with an objective to maximize the learning performance of FL.We summarize the main contributions of this paper as follows.

·We propose an RIS-assisted over-the-air FL system in an mmWave MIMO network,where multiple antennas are deployed at the edge devices to enable multi-modal transmission and in turn reduce the communication delay.Moreover,we conduct the convergence analysis for the RIS-assisted FL system in terms of the time-average of the global gradient.The analysis explicitly shows the influence of communication parameters on the learning performance.

·To minimize the upper bound of convergence,we formulate an optimization problem,while considering the constant modulus constraints of both RIS phase-shifts and analog beamformer,as well as the transmit power constraints.Due to the coupled optimization variables and the non-convex constraints,solving the formulated problem optimally is challenging.An efficient RCG algorithm and an alternating framework are developed to tackle the constant modulus constraints and decouple the optimization variables,respectively.In particular,we transform the constant modulus constraints to a kind of Riemannian manifold,which facilitates the design of the RCG algorithm to optimize the phase-shift matrix.

·We conduct extensive simulations in different communication environments to show the performance gain of the proposed RCG algorithm over several baseline algorithms in terms of the training loss and test accuracy.

The rest of the paper is organized as follows.Section II introduces the FL model,communication model,channel model,and convergence analysis.Section III presents the proposed alternating framework and the corresponding optimization algorithm.Section IV describes the simulation results.We conclude the paper in section V.

Notations:Bold capital,bold lowercase,and lowercase letters are adopted to represent matrices,vectors,and scalars,respectively.M*,MH,andMTdenote conjugate,conjugate transpose,and transpose of matrixM,respectively.‖·‖denotes thel2-norm operator.ℜ{·}is the real part of a scalar,vector,and complex matrix.E[·] denotes the expectation of a random variable.unit(w) is a function whose form is unit(w)(w1/|w1|,···,wn/|wn|)T.°represents the Hadamard product.Arepresents a set.|A|denotes the cardinality ofA.

II.SYSTEM MODEL AND PROBLEM FORMULATION

In this section,we first describe the FL model,communication model,and channel model under consideration,and then present the convergence analysis and problem formulation.

A.FL Model

According to the principle of FL,an edge server,co-located at an access point(AP),coordinatesKedge devices to cooperatively train a global model.We define the local loss function of edge devicek ∈K{1,2,···,K}as

wherew ∈RLdenotes the model parameter of dimensionL,f(w;x,y) is the sample-wise loss function given modelwfor feature vectorxand labely,andDkdenotes the local dataset of edge devicek.We assume that the global dataset,denoted asD,is the union of all local datasets and these local datasets are non-overlapping,i.e.,DandDj ∩Dk/0,∀j/k.The global loss function of the AP is

To minimizeF(w),we need to iteratively update model parameterwforT ＞0 communication rounds.Specifically,at thetth round,∀t ∈{1,2,···,T},the AP first broadcasts global modelwt.After that,edge devicekcalculates its local gradientaccording to local datasetDk,given by

Then,the AP aggregates the local gradients from edge devices to compute a global gradient as follows

Subsequently,the global model at the AP is updated as follows

whereλtdenotes the learning rate.

B.Gradient Aggregation via RIS-Assisted AirComp

We consider RIS-assisted MIMO AirComp to support FL over an mmWave communication network,as shown in Fig.1.Specifically,mmWave and AirComp,as two promising physical layer techniques,enable low-latency gradient aggregation by enabling concurrent transmission over a spectrum-rich channel.Besides,benefiting from the multiplexing gain of the MIMO technique,multiple dimensions of the local gradient can be simultaneously transmitted by each edge device within one time slot,which further reduces the communication latency.Futhermore,an RIS is deployed to bypass obstacles by reconfiguring the propagation environment.

Fig.1 RIS-assisted AirComp in an mmWave network

Suppose that the AP,as the edge server,hasNrantennas,each edge device hasNtantennas,and the RIS hasNpassive reflecting elements.Each edge device adopts fully-digital beamforming,while the AP adopts a fully-connected hybrid beamforming architecture withNrf(Nr＞Nrf)RF chains.This architecture reduces the hardware cost and improves the spectral efficiency[28].

and the variance as

After that,the AP collects the statistical data,i.e.,mean and variance,of all edge devices to average them as

which are then broadcast to all edge devices.Subsequently,the devices can normalize their transmit vectors by[29]

Since the signals at mmWave frequencies suffer from severe path loss,the direct link is assumed to be blocked by the obstacles.Hence,an RIS is deployed to construct reflective links from devices to the AP.We consider block fading and assume that the channel state information (CSI) is known at edge devices and the AP by utilizing the existing channel estimation methods[30].LetHk ∈(orG ∈)be the channel matrix between RIS and devicek(or the AP).As a result,we write the received signal at the AP as

whereis the transmit beamforming matrix at devicek,Θis the diagonal RIS phase-shift matrix,andn ∈is the additive white Gaussian noise(AWGN).Herein,θn ∈[0,2π]andβn ∈[0,1]denote the phase shift and the amplitude attenuation of reflecting elementn,respectively.For simplicity,we assume thatβn1,∀n ∈N{1,2,···,N}[15].Considering the maximum transmit powerP,we haveE≤P,∀k.

By employing the hybrid beamforming at the AP,the signal after processing is given by

whereUbb∈denotes the receive digital beamforming matrix andUrf∈denotes the receive analog beamforming matrix with|Urf(i,j)|1,∀i ∈Nr{1,2,···,Nr},∀j ∈Nrf{1,2,···,Nrf}[31].Subsequently,the AP can recover the weighted-sum gradient as

and update the global model as follows

C.Channel Model

In mmWave communication networks,we utilize the Saley-Valenzuela channel model[32,33].The uniform planar array(UPA)antenna structure is adopted for the AP,edge devices,and the RIS[33].Hence,the channel matrix is given by

whereDrandDtdenote the receive and transmit antennas,respectively.NcandNpdenote the numbers of scattering clusters and propagation paths of each scattering cluster,respectively.αclis the channel coefficient of thelth path in thecth scattering cluster.anddenote the transmit and receive array response vectors,respectively,whereandrepresent the elevation and azimuth angles of arrival(or angles of departure),respectively.

Hence,we have

whereκis the wavelength,ddenotes the spacing of the antenna,andw ∈{0,1,···,W-1}(h ∈{0,1,···,H-1}) denotes the index of the vertical(horizontal)antennas.

D.Convergence Analysis

To facilitate the convergence analysis of the proposed FL,we make the following assumptions.

Assumption 1The gradient ofF(·)is Lipschitz continuous with a non-negative parameterJ.Hence,for any model parametersw,v ∈RL,we have

Based on Assumption 1,we present an upper bound on the average norm of the global gradient as follows.

Theorem 1Suppose that the global loss function,F(·),satisfies Assumption 1.By setting 0＜λt ≡λ≤we have

wheredenotes the communication error at thetth round.

Proof.See Appendix A.

As can be observed from Theorem 1,whenT →∞,the upper bound depends on the communication error of each iteration,which can be obtained as follows.

Lemma 1The expectation of the norm of communication error∈tat thetth round is

whereMSE(ˆst[i],st[i])is given by

Proof.See Appendix B.

E.Problem Formulation

For ease of notations,we omit the round index.To reduce the impact of the transmission distortion on the convergence,we propose to minimize the MSE of AirComp at each round.This is inspired by the fact that the convergence upper bound in(17)is proportional to the MSE in(19).As a result,we need to optimize the receive digital and analog beamforming matrices at the AP,the transmit beamforming matrices at devices,and the RIS phase-shift matrix.The formulated optimization problem is

ProblemPis intractable because of the coupled optimization variablesVk,Θ,Urf,andUbb,and non-convex constantmodulus constraints (20a),(20b),and (20c).Note that the transceiver design and RIS phase-shifts collectively determine the accuracy of model aggregation and hence need to be jointly optimized.To solve problemP,we develop an alternating optimization algorithm in section III.

III.ALTERNATING OPTIMIZATION FRAMEWORK

In this section,we shall solve problemPby alternately optimizing transmit beamforming matrices{Vk},phase-shift matrixΘ,receive analog beamforming matrixUrf,and receive digital beamforming matrixUbb.

A.Transmit Beamforming Matrix Optimization

When given receive analog beamforming matrixUrf,receive digital beamforming matrixUbb,and phase-shift matrixΘ,problemPis decomposed intoKsubproblems,where the subproblem for devicekis

whereAk(Ubb)H(Urf)HGΘHk.

Problemis convex and can thus be tackled by using the Lagrange duality method[34].Specifically,the optimal transmit beamforming for devicekcan be written in a closed form,i.e.,

whereμk≥0 denotes the Lagrange multiplier.Based on the complementary slackness condition,μkcan be calculated according to the following two rules:1)if(Ak)HAkis invertible and

withμk0;2)otherwise,μk ＞0 is obtained through the binary search method such that the equality

holds.The detailed proof can be referred to Ref.[35].Intuitively,when the maximum transmit powerPis high enough,each device is able to compensate for the propagation loss and mitigate the misalignment error,i.e.,the objective value of problemPVkis zero by settingμk0.Otherwise,we should choose an appropriateμk ＞0 with the binary search method to minimize MSE as much as possible.

B.Phase-Shift Matrix Optimization

When transmit beamforming matrices(i.e.,{Vk}),and receive analog and digital beamforming matrices (i.e.,UrfandUbb)are fixed,we can rewrite problemPas

withD(Ubb)H(Urf)HGandHe,kHkVk.

Letr[Θ(1,1),Θ(2,2),...,Θ(N,N)]T.The first term of(26)can be rewritten as

whereD(i,:)refers to theith row of matrixDandHe,k(:,i)denotes theith column of matrixHe,k.Similarly,the second and third terms of(26)are respectively.

By combining(27),(28),and(29),the objective function of problemPΘbecomes

and the optimization problem is converted into

Constraints(31a)fall into the category of Riemannian manifold.As the conjugate gradient algorithm only needs to calculate the first-order derivative,its convergence speed is faster than that of the steepest descent method.As a result,we resort to using the RCG algorithm[36].

We need to compute a descent direction with a proper step size for each iteration in the RCG algorithm.To this end,we need to leverage the tangent space composed of vectors tangentially passing throughr[i],wherer[i]is the phase-shift vector obtained at theith iteration and each tangent vector represents one direction that can optimize the objective function fromr[i].Herein,the tangent space atr[i]in the complex circle manifold can be written as[37]

In order to get the steepest descent direction along the tangent space in the RCG algorithm,we get the Riemannian gradientby projecting the Euclidean gradient onto the tangent space.The Riemannian gradientis given by

In the conventional conjugate-gradient algorithm,the descent direction,can be a combination of gradient directionand pre-iteration descent directioni.e.,wheredenotes the conjugate gradient update parameter.However,since the tangent vectors on different tangent spaces cannot be added directly,descent direction ˆu[i-1]at the(i-1)th iteration in the RCG algorithm should be transferred to the tangent space of update variabler[i]before the operation,which can be written as

Then,the descent direction at theith iteration can be expressed as[36]

Subsequently,to ensure that the objective value monotonically decreases after each iteration,we leverage the Armijo line search method[36]to choose a proper step sizeAs simply combiningr[i-1]andis not guaranteed to induce a feasible solution on the manifold,the retraction is utilized to map the combined descent direction from the tangent space to the manifold,which can be achieved by

Finally,the solution to problemPrcan be obtained when the stopping criteria is satisfied and the phase-shift matrix is thus given byΘdiag(r).We summarize the proposed RCG algorithm in Algorithm 1.

C.Receive Analog Beamforming Matrix Optimization

When transmit beamforming matrices{Vk},phase-shift matrixΘ,and receive digital beamforming matrixUbbare given,the optimization problem is given by

Due to unit-modulus constraints (38a),problemPUrfis non-convex.Since the constraints,i.e.,M{|Urf(i,j)|1,∀i ∈Nr,∀j ∈Nrf},fall into the category of the Riemannian manifold[36],problemcan be solved via the RCG algorithm similar to Algorithm 1.Specifically,the Riemannian gradientis

with the Euclidean gradient

whereBkGΘHkVkandCUbb(Ubb)H.

D.Receive Digital Beamforming Matrix Optimization

After fixing transmit beamforming matrices{Vk},phaseshift matrixΘ,and receive analog beamforming matrixUrf,problemPreduces to an unconstrained convex optimization problem,i.e.,

We obtain the optimal solution of problemas follows

whereSkGΘHk.

To sum up,we present the proposed alternating optimization framework in Algorithm 2.Note that the proposed RCG algorithm can be extended to the scenario with direct links.

IV.SIMULATION RESULTS

In this section,we verify the effectiveness of the proposed optimization algorithm for RIS-assisted over-the-air FL in mmWave MIMO networks.We consider the UPA structure at the AP (or RIS) with 10 rows andXcolumns of antennas(or reflecting elements).The AP and the RIS are located at(0,0) m and (dRIS,10) m,respectively.Without loss of generality,we have dRIS0.The edge devices are randomly distributed in a circle whose center is(80,0)m and radius is 5 m.The channel gain obeys the complex Gaussian distributionC N[33],where PL(d) is the distance function.In the simulations,we set PL(d)10-3d-3.19.Unless specified otherwise,we setK10,Nc10,Np5,Nt2,Nrf10,N160,Nr50,Pt30 dBm,andσ2-90 dBm.Besides,the stopping criterionsεandζof the proposed algorithm are set to be 10-6.

A.MSE Performance

In this subsection,MSE is adopted as the metric to evaluate the performance of the proposed RCG algorithm.All simulations results in this subsection are obtained by averaging over 100 times.We consider the successive convex approximation(SCA) algorithm and the RCG algorithm with random RIS phase shifts as the baseline algorithms.The details of these algorithms are listed as follows.

· SCA:The alternating optimization framework is similar to the proposed RCG algorithm.In contrast to RCG,we leverage the SCA algorithm to compute the receive analog beamforming matrix and the RIS phase-shift matrix.The details of SCA algorithm can be found in Ref.[35].

· Random Phase Shifts:The framework of this algorithm is the same as the proposed RCG algorithm.The only difference is that the RIS phase-shifts are randomly generated.

We illustrate the MSE performance for different number of antennas at the AP in Fig.2.When the number of antennas becomes larger,the MSE of all algorithms decreases as deploying more antennas can enhance the signal strength.Besides,the proposed RCG algorithm realizes the best performance in minimizing the MSE which shows the superiority of the proposed RCG algorithm in tackling the unit-modulus constraints.In the simulations,it is thorny to find a proper set of optimization variables for the SCA algorithm to adapt to the different wireless environment,because its step size is fixed.This incurs the MSE performance degradation of the SCA algorithm.Compared to other two algorithms,the MSE performance of the random phase shifts scheme is the worst.This shows the importance of optimizing the RIS phase-shift matrix.

Fig.2 MSE versus the number of antennas at the AP

Fig.3 illustrates the influence of the number of RIS reflecting elements on the MSE performance.With a greater number of RIS reflecting elements,the MSE performance of these algorithms becomes better.This is because a greater number of RIS elements improves the channel quality.Compared to the baseline algorithms,the proposed RCG algorithm achieves the best MSE performance.According to Fig.2 and Fig.3,we can conclude that a greater number of antennas and/or RIS reflecting elements leads to a better MSE performance.

Fig.3 MSE versus the number of elements at the RIS

Fig.4 depicts the impact of the number of edge devices.Simulation results show that MSE increases as the number of edge devices becomes larger.This is because,with more edge devices,the RIS has to deal with more reflecting links.In AirComp systems,the MSE performance is determined by the link with the worst channel condition[15].The simulation results in Fig.4 justify the aforementioned conclusion.Compared to the baseline algorithms,the degraded MSE of the proposed RCG algorithm is much smaller when the number of devices is larger,which also demonstrates the superiority of the proposed algorithm.

Fig.4 MSE versus the number of devices

Finally,we study the influence of the location of RIS.According to Fig.5,we observe that all the algorithms achieve a better MSE performance when the RIS is deployed closer to the AP or the edge devices,because of the double fading effect.Considering the potential mobility of the edge devices and the massive edge devices,deploying the RIS close to the AP is a practical solution,which can improve the communication efficiency and significantly reduce the hardware cost.

Fig.5 MSE versus the distance between the AP and the RIS

B.FL Performance

We verify the effectiveness of the proposed RCG algorithm for FL performance.Specifically,we consider both MNIST and CIFAR-10 datasets.Non-independent and identically distributed (non-i.i.d.) datasets are available at edge devices.Specifically,the local dataset of each device consists of 1～2 classes of data.Besides,each device has the same number of data.For comparison,we consider a perfect aggregation algorithm,which does not suffer from any aggregation error.All results are obtained by averaging over 50 experiments in this subsection.

In the simulations,we setK30,Nt10,N150 andNr70.Fig.6 and Fig.7 illustrate the training performance(i.e.,the test accuracy and the training loss)of the FL systems over 200 communication rounds.According to Fig.6 and Fig.7,the proposed RCG algorithm’s training performance is close to the perfect aggregation algorithm and outperforms the SCA algorithm on both the MINIST and CIFAR-10 datasets.The detrimental effect of receiver noise reduces the performance of the proposed RCG algorithm.The results in Fig.6 and Fig.7 also justify the theoretical analysis in Theorem 1,i.e.,minimizing the MSE of AirComp is able to improve the training performance of FL.

Fig.6 Training loss versus the number of training rounds:(a) MNIST dataset;(b)CIFAR-10 dataset

Fig.7 Test accuracy versus the number of communication rounds:(a)MNIST dataset;(b)CIFAR-10 dataset

Fig.8 illustrates the training performance versus the number of antennas,whereNrranges from 10 to 70.Note that the training performance of the perfect aggregation algorithm is not influenced by the antenna number at the AP.Besides,Fig.8 indicates that deploying more antennas at the AP leads to better training performance.This is because a greater number of antennas can reduce the transmission distortion,which reduces the convergence upper bound and in turn improves the training performance of FL.

We illustrate the training performance of FL for different number of RIS reflecting elements in Fig.9,whereNranges from 60 to 150.With more antennas,the training performance of all the considered algorithms becomes better.This result is consistent with that in Fig.3.The improvement of MSE reduces the convergence upper bound,which in turn improves the learning performance of FL.According to Fig.8 and Fig.9,we conclude that deploying more antennas and RIS reflecting elements is able to improve the training performance of FL.

Fig.8 Training performance versus the number of antennas at the AP:(a)Training loss;(b)Test accuracy

Fig.9 Training performance versus the number of elements at the RIS:(a)Training loss;(b)Test accuracy

V.CONCLUSIONS

This paper proposed an RIS-assisted over-the-air FL in mmWave MIMO systems.To expedite the convergence and boost the training capability of the FL system,we proposed an alternating optimization framework based on the RCG algorithm to optimize the transmit beamforming,the RIS phase shifts,and the receive beamformer.Compared with the SCA algorithm,the proposed RCG algorithm achieved a better learning performance.

APPENDIX

A) Proof of Theorem 1

Based on Assumption 1,we have

Substituting (13) into (46) and setting 0＜λt ≡λ≤1/J,(46)can be rewritten as

where(a)is according to(gt)T∈t≤

Summing both sides of the inequality(47)forTrounds and taking the statistical expectation on the normalized transmit signals and receiver noise,we have