
Deep reinforcement learning-based optimization of lightweight task offloading for multi-user mobile edge computing

2021-12-21

ZHANG Wenxian, DU Yongwen

(School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Abstract: To improve the quality of computation experience for mobile devices, mobile edge computing (MEC) is a promising paradigm that provides computing capabilities in close proximity within a sliced radio access network supporting both traditional communication and MEC services. However, this kind of computation-intensive offloading problem is a high-dimensional NP-hard problem, and some machine learning methods do not perform well on it. In this paper, a Markov decision process model is established to find a task offloading scheme that maximizes the long-term utility performance, so that the best offloading decision is made according to the task queue state, the energy queue state and the channel quality between mobile users and the base station (BS). To cope with the curse of dimensionality in the state space, an edge computing optimized offloading (ECOO) algorithm with candidate networks is proposed on top of the deep deterministic policy gradient (DDPG) algorithm. Simulation experiments show that the ECOO algorithm is superior to some deep reinforcement learning algorithms in terms of energy consumption and time delay, and thus handles high-dimensional problems well.

Key words: multi-user mobile edge computing; task offloading; deep reinforcement learning

0 Introduction

Mobile devices nowadays are equipped with cameras, microphones and many other high-quality sensors. Using these high-data-rate sensory devices, it becomes possible for mobile devices to host perception-related applications, such as face/gesture recognition, visual text translation and video image processing. While the popularization of intelligent devices and the emergence of 5G technology bring people comfort and convenience, they also cause many problems. For example, mobile devices have very limited computing power when dealing with compute-intensive tasks. As a result, to meet the quality of experience (QoE) of these mobile applications, mobile edge computing (MEC) has been proposed as a promising solution to bridge the gap between the limited resources on mobile devices and the ever-increasing computation demand of mobile applications.

According to Cisco's report[1], mobile data traffic will grow sevenfold over the next five years and reach 49 exabytes per month by 2021. Edge computing comes in handy at this point: by reducing data transmission distances, it alleviates bandwidth and latency issues, improves application and service performance, power efficiency and reliability, and ultimately reduces operational costs.

As a key technology of MEC, task offloading addresses the shortcomings of mobile devices in resource storage, computing performance and energy efficiency. Task offloading not only relieves the pressure on the core network, but also reduces the delay caused by transmission. It first uses the offloading decision to determine which computations are to be processed locally and which are to be uploaded to the edge server. The resource allocation is then calculated to determine where each computing task will finally be offloaded.

In a wireless fading environment, the time-varying wireless channel condition largely impacts the optimal offloading decision of a wireless powered MEC system[2]. In a multi-user scenario, a major challenge is the joint optimization of the individual computing mode (i.e., offloading or local computing) and wireless resource allocation (e.g., the transmission air time divided between wireless power transfer (WPT) and offloading). Such problems are generally formulated as mixed integer programming (MIP) problems due to the existence of binary offloading variables. To tackle the MIP problems, branch-and-bound algorithms[3] and dynamic programming[4] have been adopted; however, they require prohibitively high computational complexity, especially for large-scale MEC networks. To reduce the computational complexity, heuristic local search[5-6] and convex relaxation[7-8] methods have been proposed. However, both of them require a considerable number of iterations to reach a satisfying local optimum. Hence, they are not suitable for making real-time offloading decisions in fast fading channels, as the optimization problem needs to be re-solved once the channel fading has varied significantly.

Conventional reinforcement learning (RL) algorithms cannot scale well as the number of agents increases, since the explosion of the state space makes traditional tabular methods infeasible[9]. Nevertheless, by exploiting deep neural networks (DNNs) for function approximation, deep reinforcement learning (DRL) has been demonstrated to efficiently approximate the Q-values of RL[10]. DRL combined with MEC enables mobile devices to learn optimal task offloading decisions and energy allocation schemes based on the task queue state, energy queue state and channel quality, maximizing the long-term utility.

This paper considers how to determine more effectively whether computing tasks need to be offloaded to edge nodes. A multi-user MEC network is established, in which each user follows a binary offloading strategy, with the goal of jointly optimizing user task offloading decisions according to time-varying wireless channels. A computation offloading framework based on DRL is therefore proposed to realize offloading with the lowest time delay and energy consumption. Compared with existing DRL-based algorithms, the following novel contributions are made.

1) An algorithm based on a continuous action space is adopted, in contrast to existing decision making based on discrete action spaces, to achieve finer power control of local execution and task offloading.

2) Each mobile user, with random task arrivals and a time-varying wireless channel, independently learns a dynamic offloading strategy in the multi-user MEC system to minimize power consumption and computing cost and to reduce delay.

3) Experimental simulations show that the edge computing optimized offloading (ECOO) algorithm achieves better learning performance than the traditional deep Q-network (DQN) and deep deterministic policy gradient (DDPG) algorithms under the decentralized strategy, and the power-delay trade-off of each user is analyzed.

1 Related work

Artificial intelligence has brought great benefits to society in the resource management of wireless networks. A DRL algorithm was proposed[11] to study optimal caching and interference alignment in a realistic time-varying wireless environment. Chen et al.[12] investigated which MEC server should perform task offloading on behalf of mobile users in an ultra-dense sliced radio access network with multiple base stations, and proposed a DQN-based computation offloading strategy to obtain the best offloading choice and maximize the long-term utility performance.

Many related works jointly model the computing mode decision problem and the resource allocation problem in MEC networks as MIP problems. For instance, a coordinate descent (CD) method was proposed[5] that searches along one variable dimension at a time. A similar heuristic search method for multi-server MEC networks was studied[6], which iteratively adjusts binary offloading decisions. Another widely adopted heuristic is convex relaxation, e.g., relaxing the integer variables to be continuous between 0 and 1[7] or approximating the binary constraints with quadratic constraints[8]. Nonetheless, on one hand, the solution quality of the reduced-complexity heuristics is not guaranteed. On the other hand, both search-based and convex relaxation methods require a considerable number of iterations to reach a satisfying local optimum and are inapplicable to fast fading channels.

In recent years, research on multi-user computation offloading in orthogonal frequency division multiple access (OFDMA) systems in the MEC environment has attracted attention[13]. Building on studies of multi-user computation offloading in orthogonal frequency division systems, Li et al.[14] proposed a partial data offloading algorithm that jointly optimizes subcarrier and power allocation in the OFDMA MEC environment and reduces latency on most mobile devices. Zhang et al.[15] studied the power and bandwidth allocation of OFDMA heterogeneous small cell networks to improve energy efficiency.

To deal with the problems brought by random task arrivals and time-varying wireless channels, the design of dynamic joint control strategies for radio and computing resources in MEC systems becomes more challenging[16-19]. Partial computation offloading of multiple users is taken into account, and resource allocation based on TDMA and OFDM is studied so as to minimize the weighted sum of user energy consumption[16]. A multi-input/multi-output system is designed, and the energy consumption problem of task offloading is solved through the joint optimization of the formed beams and the allocation of computational resources[17]. A green MEC system with energy harvesting is studied, in which a delay cost is used to address execution delay and task failure[18]. The power-delay trade-off in a multi-user scenario is also discussed[19]. Existing works focus only on centralized DRL-based algorithms for optimal computation offloading in MEC systems, and the design of decentralized DRL-based algorithms for dynamic task offloading control of a multi-user MEC system still remains open.

2 Problem model

As shown in Fig.1, a multi-user MEC system is established, consisting of a MEC server, a base station (BS) and a group of mobile users, where $N$ is the number of users and the user set is $\mathcal{N}=\{1,2,\dots,N\}$. Each user in the system needs to complete computation-intensive tasks.

Fig.1 Multi-user MEC system

Considering the limited computing power of each mobile device, the MEC server is deployed near the BS to improve users' QoS under different user requirements. In addition, as the number of mobile users increases, the multi-user MEC system makes it more convenient to solve each user's decentralized task offloading problem, reduces the system overhead between users and the MEC server, and improves the scalability of the traditional MEC system.

2.1 Network model

A 5G macro cell with a BS is considered as the MEC system, and a zero-forcing (ZF) linear detection algorithm is adopted at the BS to manage the uplink transmissions of multiple mobile users. For each time slot $t\in\mathcal{T}$, the received signal at the BS can be written as

$$\mathbf{y}(t)=\sum_{n\in\mathcal{N}}\sqrt{p_{0,n}(t)}\,\mathbf{h}_n(t)\,s_n(t)+\mathbf{n}(t),\qquad(1)$$

where $p_{0,n}(t)$ is the transmission power used by user $n$ to offload task data bits; $s_n(t)$ is the complex data symbol with unit variance, and $\mathbf{n}(t)$ is an additive white Gaussian noise (AWGN) vector with zero mean and variance $\sigma^2$. To capture the slot-to-slot correlation of each mobile user's channel, the Gauss-Markov block fading autoregressive model[20] is adopted:

$$\mathbf{h}_n(t)=\rho_n\,\mathbf{h}_n(t-1)+\mathbf{e}(t),\qquad(2)$$

where $\rho_n$ is the normalized channel correlation coefficient between time slots $t$ and $t-1$, and $\mathbf{e}(t)$ is the error vector. The correlation coefficient follows Jakes' model,

$$\rho_n=J_0\!\left(2\pi f_{d,n}\tau_0\right),\qquad(3)$$

where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_{d,n}$ is the Doppler frequency of user $n$.

Therefore, with ZF detection, the corresponding signal to interference plus noise ratio (SINR) of user $n$ is

$$\gamma_n(t)=\frac{p_{0,n}(t)}{\sigma^2\left[\left(\mathbf{H}(t)^{\mathrm H}\mathbf{H}(t)\right)^{-1}\right]_{n,n}},\qquad(4)$$

where $\mathbf{H}(t)=[\mathbf{h}_1(t),\dots,\mathbf{h}_N(t)]$.

It can be verified from Eq.(4) that the SINR of each user decreases as the number of users increases, so more offloading power needs to be allocated to each user to compensate.
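To make the channel dynamics concrete, the following minimal Python sketch (not from the paper; antenna count, distances and power values are illustrative assumptions) simulates one Gauss-Markov channel step of Eq.(2) and evaluates the ZF SINR of Eq.(4).

```python
import numpy as np

def evolve_channel(h_prev, rho, h0=1.0, d=50.0, d0=1.0, alpha=3.0):
    """One Gauss-Markov step, Eq.(2): h(t) = rho * h(t-1) + e(t)."""
    n_ant = h_prev.shape[0]
    var = (1.0 - rho**2) * h0 * (d0 / d) ** alpha      # error-vector variance
    e = np.sqrt(var / 2) * (np.random.randn(n_ant) + 1j * np.random.randn(n_ant))
    return rho * h_prev + e

def zf_sinr(H, p_tx, sigma2=1e-9):
    """Per-user SINR under zero-forcing detection, cf. Eq.(4).
    H: (n_ant, n_users) channel matrix; p_tx: (n_users,) offload powers."""
    G = np.linalg.inv(H.conj().T @ H)                  # (H^H H)^{-1}
    noise_amp = sigma2 * np.real(np.diag(G))           # ZF noise amplification
    return p_tx / noise_amp
```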

2.2 Task model

There are two types of task relationships: sequential tasks and concurrent tasks. For sequential tasks, the completion time can be reduced by offloading tasks to the edge cloud which has a larger processing capacity than the mobile device. For concurrent tasks, in addition to taking advantage of the greater processing power of the edge cloud, the parallelism of the mobile device and the edge cloud can also be exploited.

1) Offloading sequential tasks

To reduce the completion time of a sequential task-graph, as many tasks as possible need to be offloaded to the edge cloud to exploit its greater processing capacity. However, offloading a task incurs a network data transmission delay, which needs to be smaller than the time saved by offloading in order to reduce the completion time. In some situations, the network bandwidth can be so unfavorable that offloading tasks to the edge cloud takes even longer than running them on the mobile device. It was proven that the optimal set of tasks to offload in a sequential task-graph is always a sequence of consecutive tasks in the task-graph[21]. Taking the task-graph in Fig.2 as an example, the theorem in Ref.[21] states that if there are tasks that can be offloaded to the cloud to reduce the completion time, those tasks must be consecutive, that is, the tasks from $j$ to $k$ in Fig.2.

Fig.2 Offloading of sequential tasks

2) Offloading concurrent tasks

Optimizing the completion time of concurrent tasks is much more challenging. For a task-graph $G(V,E)$, assume it begins with a fork task 0 and terminates with a merging task $n$. The first fork task and the last merging task are usually local tasks, because the first task (called the root task) needs to take inputs from the local device and the last task (called the termination task) needs to produce output back to the local device. The discussion of general task-graphs, where there are some sequential tasks before the root task and after the termination task, is left to the next subsection.

To offload concurrent tasks, the parallelism between the mobile device and the edge cloud is maximized, which is equivalent to minimizing the completion time. Take Fig.3 as an example, where there are $n$ concurrent tasks between the root task 0 and the terminating task $n+1$. If too many tasks are offloaded to the cloud, the mobile device must wait for the edge cloud to finish its share of the tasks. If too few tasks are offloaded, the application waits for the mobile device to complete its tasks, which prolongs the completion time. The best offloading decision is the one for which the waiting time at the termination task, between tasks executed on the edge cloud and tasks executed on the mobile device, is as small as possible. To achieve this, the completion time of the local tasks must be as close as possible to the completion time of the offloaded tasks plus the data transmission delays; a greedy sketch of this balancing rule is given after Fig.3.

Fig.3 Concurrent tasks
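The balancing rule above can be illustrated with a short greedy sketch (a simplification, not the paper's algorithm; task execution times and transmission delays are hypothetical inputs): each concurrent task goes to whichever side keeps the two finish times closer.

```python
def split_concurrent(local_times, cloud_times, tx_delays):
    """Greedily assign each concurrent task to the mobile device or the edge
    cloud so that both sides finish as close together as possible."""
    t_local, t_cloud = 0.0, 0.0
    offload = []
    for i, (tl, tc, tx) in enumerate(zip(local_times, cloud_times, tx_delays)):
        # offloading task i finishes at t_cloud + tc + tx on the cloud side
        if t_cloud + tc + tx < t_local + tl:
            t_cloud += tc + tx
            offload.append(i)
        else:
            t_local += tl
    # the merge task waits for the slower side
    return offload, max(t_local, t_cloud)
```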

2.3 Computation model

This section discusses how each mobile user can meet the requirements of running its applications by performing computing tasks locally or by offloading them. Assuming all applications are fine-grained[22], $d_{1,n}(t)$ denotes the amount of task data computed on the local mobile device and $d_{0,n}(t)$ the amount offloaded to the edge server for execution. At the beginning of slot $t$, the queue length of user $n$'s task buffer evolves as

$$B_n(t+1)=\left[B_n(t)-\left(d_{1,n}(t)+d_{0,n}(t)\right)\right]^{+}+a_n(t),\quad\forall t\in\mathcal{T},\qquad(5)$$

where $a_n(t)$ represents the number of task arrivals during time slot $t$ and $[x]^{+}=\max(x,0)$.
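A one-line simulation of the buffer recursion in Eq.(5) may help; the Poisson arrival process below is an assumption for illustration, not the paper's traffic model.

```python
import numpy as np

def update_buffer(B, d_local, d_offload, arrival_rate_bps, tau0=1e-3):
    """Eq.(5): B(t+1) = [B(t) - (d1(t) + d0(t))]^+ + a(t), in bits."""
    a_t = np.random.poisson(arrival_rate_bps * tau0)   # bits arriving in slot t
    return max(B - (d_local + d_offload), 0) + a_t
```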

1) Local computation

$p_{1,n}(t)\in[0,P_{1,n}]$ is the power allocated to local execution. First, assume that the number of CPU cycles required by user $n$ to process one bit is $L_n$; this cycle count can be estimated by offline measurement[23]. The chip voltage is then adjusted using DVFS technology[24], where $f_n(t)$ is the CPU frequency determined by the effective switched capacitance $\kappa$ in time slot $t$. Therefore, the CPU frequency and the locally processed bits in slot $t$ can be derived as

$$f_n(t)=\sqrt[3]{p_{1,n}(t)/\kappa},\qquad(6)$$

$$d_{1,n}(t)=\tau_0 f_n(t)/L_n.\qquad(7)$$

2) Edge computation

The MEC server is equipped with sufficient computing resources for edge computing, and all tasks offloaded to the MEC server through the BS will be processed. Therefore, the number of data bits user $n$ can offload is obtained from Eq.(4) and expressed as

$$d_{0,n}(t)=\tau_0 W\log_2\left(1+\gamma_n(t)\right),\qquad(8)$$

where $W$ is the system bandwidth.
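The two service rates can be sketched directly from Eqs.(6)-(8) (the equation forms follow the reconstruction above); with the Section 4.1 values $\kappa=10^{-27}$ and $p=2$ W, the local branch indeed yields $f\approx1.26$ GHz, matching the stated maximum CPU frequency.

```python
import math

KAPPA = 1e-27   # effective switched capacitance (Section 4.1)
L_N = 500       # CPU cycles required per bit (Section 4.1)
TAU0 = 1e-3     # slot length: 1 ms
W = 1e6         # system bandwidth: 1 MHz

def local_bits(p_local):
    """Eqs.(6)-(7): f = (p / kappa)^(1/3), d1 = tau0 * f / L."""
    f = (p_local / KAPPA) ** (1.0 / 3.0)   # DVFS-controlled CPU frequency (Hz)
    return TAU0 * f / L_N                  # bits processed locally in one slot

def offload_bits(sinr):
    """Eq.(8): d0 = tau0 * W * log2(1 + SINR)."""
    return TAU0 * W * math.log2(1.0 + sinr)
```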

2.4 Energy consumption model

The total energy consumption of all computing devices during a certain execution time, including smartphones, sensors and remote servers, mainly consists of two parts: the computing energy consumption $E_{\mathrm{comp}}$ and the offloading energy consumption $E_{\mathrm{off}}$ of mobile devices. The energy consumption model can be calculated by

(9)

(10)

where $\mathrm{alltask}_i$ is the number of computing devices with $i$ tasks; $M_j$ is the required CPU resources, and $DR_i$ is the total CPU resources.

2.5 Cost model

Since users must pay for the computing resources provided by remote servers, a new cost model is proposed based on a dynamic price model of the amount of remaining resources: the lower the amount of remaining resources, the higher the price. Under such a premise, users are more willing to choose a service node with a lower unit price as the offloading target, which helps reduce the usage cost and improve the utilization of resources. The dynamic price model of the remaining resources in unit time $t$ is

(11)

where $C_1$ is the cost of the current device; $U_1$ is the billing interval; $R_1$ is the unit price of computing resources; $T_M$ is the total computing resources of the current device, and $L(t)$ is the fraction of the current device's computing resources consumed per unit time. Since the computing resources of local devices belong to the users themselves, no cost needs to be calculated for them. Therefore, the total cost of all remote devices is

(12)

3 Dynamic task offloading based on DRL

RL makes the best decision in a particular scenario through constant interaction with the environment. Based on continuous learning, RL can adapt well to the environment. While RL has many advantages, it lacks scalability and is limited to low-dimensional problems.

To overcome the decision-making difficulties encountered in reinforcement learning, DRL combines the perception ability of deep learning with the decision-making ability of reinforcement learning, and copes with high-dimensional state and action spaces by relying on the powerful function approximation and representation learning capabilities of deep neural networks[25]. The overall optimization design of this paper is shown in Fig.4.

Fig.4 Dynamic offload system

In this work, an improved DDPG algorithm[26] is adopted, as shown in Fig.4, where each user independently learns a decentralized dynamic task offloading strategy after receiving the SINR and channel state information (CSI) sent by the base station, and allocates power to local execution and task offloading. In the DRL module, the neural network is improved: a deep neural network and an LSTM layer are added at the last layer of the actor network. Finally, a candidate network set is introduced to ensure the optimal learning strategy. In particular, each user has no prior knowledge of the MEC system and therefore does not know the total number of users, that is, the value of $N$. The ECOO algorithm is explained in detail below.

3.1 DRL framework

State space: for the sake of fully considering the characteristics of subtasks and server resources, the MEC system needs to be observed comprehensively, including the channel vectors of all users and the queue lengths of the task buffers. However, the real-world overhead of collecting such information at the BS and distributing it to users is enormous. To reduce this overhead and make the MEC system more scalable, each user operates on its own independently observed state.

At the beginning of slot $t$, the queue length of each user's data buffer $B_n(t)$ is updated according to Eq.(5), the SINR finally received at the BS is passed back to the user as feedback, and then the channel vector $\mathbf{h}_n(t)$ of the upcoming uplink transmission is estimated through channel reciprocity. Therefore, the state is defined as

$$s_{n,t}=\left[B_n(t),\ \varphi_n(t-1),\ \mathbf{h}_n(t)\right],\qquad(13)$$

(14)

To ensure that the offloading decision can be executed on the local mobile devices or remote servers, the offloading decision of a subtask only needs to consider $N+M+1$ computing devices, including one cloud data center, $N$ local mobile devices and $M$ edge servers.

Action space: according to the current system state $s_{n,t}$ observed by each agent, the power allocation action $a_{n,t}$, which selects the power for local execution and task offloading in each time slot $t$, is

$$a_{n,t}=\left[p_{1,n}(t),\ p_{0,n}(t)\right].\qquad(15)$$

Different from traditional DRL algorithms, which select from several predefined discrete power levels, the improved DDPG algorithm can optimize the power allocation over a continuous action space, which avoids the dimensionality problems of discretized action spaces.

Reward function: the behavior of each agent is driven by the reward. To learn an energy-aware dynamic computation offloading strategy, the goal is to minimize energy consumption while keeping the buffering delay required to complete the tasks acceptable. According to Little's theorem[27], the average queue length of the task buffer is proportional to the buffering delay, so the reward that each agent receives after slot $t$ is defined as

$$r_{n,t}=-\omega_{n,1}\,p_{1,n}(t)-\omega_{n,2}\,p_{0,n}(t)-\omega_{n,3}\,B_n(t).\qquad(16)$$

Here $\omega_{n,1}$, $\omega_{n,2}$ and $\omega_{n,3}$ are nonnegative weighting factors. By assigning them different values, the trade-off between energy consumption and buffering delay during task offloading can be adjusted dynamically. The value function that user $n$ maximizes under policy $\pi_n$, starting from the initial state, is

$$V_{\pi_n}(s_{n,t})=\mathbb{E}\!\left[\sum_{k=t}^{\infty}\gamma^{\,k-t}\,r_{n,k}\,\middle|\,s_{n,t}\right],\qquad(17)$$

where $\gamma\in(0,1)$ is the discount factor.

When $\gamma\to1$, the value function can be used to estimate the infinite-horizon undiscounted return of each agent[28], and the average computation cost is

$$C_n(s_{n,t})=\lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}\!\left[\sum_{k=t}^{t+T-1}r_{n,k}\,\middle|\,s_{n,t}\right].$$
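Putting Eqs.(13), (15) and (16) together, one slot of the agent's interaction can be sketched as follows (a minimal illustration; the weight values are placeholders, not the paper's settings).

```python
import numpy as np

def make_state(B_t, sinr_prev, h_t):
    """State of Eq.(13): buffer length, last SINR feedback, channel vector."""
    return np.concatenate([[B_t, sinr_prev], np.abs(h_t)])

def reward(p_local, p_offload, B_t, w=(1.0, 1.0, 0.5)):
    """Eq.(16): negative weighted sum of both powers and the queue length.
    The weights (w1, w2, w3) here are illustrative placeholders."""
    w1, w2, w3 = w
    return -w1 * p_local - w2 * p_offload - w3 * B_t
```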

3.2 Optimization of candidate network

The pseudo code of the ECOO algorithm proposed in this article is as follows.

ECOO algorithm

The ECOO algorithm maintains a parameter difference between the current network and the target network through delayed updates, so as to improve the stability of the training process. When an action value is overestimated due to noise or error during training, the corresponding action value will inevitably be overestimated again in later parameter updates. By comprehensively considering the results of multiple candidate networks, action selection is separated from action value evaluation to ensure the optimal learning strategy[29].

Fig.5 summarizes the components and features of DRL-based mobile edge offloading, namely high-dimensional state space representation, neural network structure design and long-term reward maximization. As shown in Fig.5, the offloading agent observes the environment and obtains several raw signals, such as user requests, context information and network conditions. These signals are assembled into a high-dimensional state input and then fed to the deep neural network. The deep neural network needs to be designed with a specific structure, such as a convolutional neural network or a recurrent neural network, which is able to mine useful information and output the value function or the policy. According to the output, an action, which represents the computation offloaded in the next slot, can be selected. The resulting offloading performance is then observed and passed back to the offloading agent as a reward. The offloading agent uses the reward to train and improve its deep neural network model with the aim of maximizing the expected accumulated discounted reward.

Fig.5 Deep reinforcement learning computation offloading process

All computing tasks are assumed to have the same size, and the computing capacity of the edge node is assumed to be sufficient for $C$ tasks. The edge node can serve a request directly if the requested task has already been computed locally. Otherwise, the edge node requests the task from the original server and updates its local computation according to the computing policy. The aim is to find the optimal computing policy that maximizes the offloaded traffic, that is, the number of computations answered by the edge node. The detailed functionalities of these networks are shown in Fig.6.

Fig.6 Neural network architecture

Upon receiving $s_t$, the computing agent needs to take an action $a_t$ that corresponds to whether or not to compute the currently requested task, and if so, determines which local server will compute the task.

In view of the gradual change of resources over time in MEC and the LSTM network's ability to memorize long-term state, this paper combines LSTM with DDPG to deal with the time-dependent task offloading problem. By replacing the last fully connected layer of the DDPG network with an LSTM layer, the recurrent structure integrates arbitrarily long-term historical data to estimate the current state more accurately.
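A possible realization of this modified actor is sketched below in PyTorch (an interpretation of the description, not the authors' code; the LSTM width and output squashing are assumptions). The hidden sizes 400 and 300 follow Section 4.1.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """DDPG actor whose last fully connected layer is replaced by an LSTM,
    so the policy can integrate the history of observed states."""
    def __init__(self, state_dim, action_dim=2, p_max=2.0):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 400)
        self.fc2 = nn.Linear(400, 300)
        self.lstm = nn.LSTM(300, 300, batch_first=True)
        self.out = nn.Linear(300, action_dim)
        self.p_max = p_max                    # maximum power (2 W in Section 4.1)

    def forward(self, state_seq, hc=None):
        # state_seq: (batch, seq_len, state_dim) sequence of past states
        x = torch.relu(self.fc1(state_seq))
        x = torch.relu(self.fc2(x))
        x, hc = self.lstm(x, hc)              # recurrent integration of history
        # squash outputs into [0, p_max]: continuous local/offload powers
        return self.p_max * torch.sigmoid(self.out(x[:, -1])), hc
```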

As shown in Fig.7, it is assumed that the candidate network set $Net=(net_1,net_2,\dots,net_i,\dots,net_n)$ can store a total of $n$ networks, $m$ of which form the subset $Net_1$ and are updated after a fixed number of iterations $C$. The subset $Net_2$ holds the $(n-m)$ networks that are selected for updating by comparing reward values. When the number of networks in $Net_2$ is less than $(n-m)$, the current network generated by each iteration is added to $Net_2$ as a candidate network. When the number of networks in $Net_2$ equals $(n-m)$, the current network and all networks in $Net_2$ train on the currently selected state-action pair; if the number in $Net_2$ would exceed $(n-m)$, the candidate network with the minimum reward value is replaced by the current network; otherwise, training continues.

Fig.7 ECOO algorithm training flowchart
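The candidate-set maintenance at the heart of the ECOO training flow (Fig.7) can be sketched in plain Python (an interpretation of the description above; the data layout is an assumption).

```python
import copy

def update_candidates(net2, current_net, current_reward, capacity):
    """Maintain the candidate set Net2 with at most `capacity` = (n - m)
    networks, as described for the ECOO training flow (Fig.7)."""
    if len(net2) < capacity:
        # set not yet full: store the current network as a new candidate
        net2.append((current_reward, copy.deepcopy(current_net)))
        return net2
    # set full: replace the worst candidate if the current network is better,
    # otherwise leave the set unchanged and continue training
    worst = min(range(len(net2)), key=lambda i: net2[i][0])
    if current_reward > net2[worst][0]:
        net2[worst] = (current_reward, copy.deepcopy(current_net))
    return net2
```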

4 Simulation

In this section, numerical results of decentralized dynamic task offloading in the MEC system are given, and the advantages and disadvantages of the offloading decisions are identified by comparing cost, energy consumption and service delay. The algorithms implemented in large-scale heterogeneous clusters include greedy local-execution-first (GD-local), greedy offloading-first (GD-offload), DQN-based dynamic offloading (DQN), DDPG-based dynamic offloading (DDPG) and ECOO.

4.1 Parameter settings

In the MEC system, the slot length is $\tau_0=1$ ms. At the beginning of each run, each user's channel vector is initialized as $\mathbf{h}_n(0)\sim\mathcal{CN}(0,h_0(d_0/d_n)^{\alpha}\mathbf{I}_N)$, where the path loss reference distance is $d_0=1$ m, the path loss exponent is $\alpha=3$, the channel correlation coefficient is $\rho_n=0.95$, the error vector is $\mathbf{e}(t)\sim\mathcal{CN}(0,h_0(d_0/d)^{\alpha}\mathbf{I}_N)$, and $f_{d,n}=70$ Hz. The system bandwidth is set to 1 MHz, the maximum transmission power is $P_{0,n}=2$ W, and the noise power is $\sigma^2=10^{-9}$ W. For local execution, assume $\kappa=10^{-27}$, the number of CPU cycles required per bit is $L_n=500$, and the maximum allowable CPU frequency is $F_m=1.26$ GHz. The maximum power required for local execution is $P_{1,n}=2$ W.

In the DDPG algorithm, for each agent, both the actor network and the critic network are four-layer fully connected neural networks with two hidden layers using the ReLU activation function, and the numbers of neurons in the two hidden layers are 400 and 300, respectively. To implement the ECOO algorithm, an experience replay buffer with a size of 10 000 is set up, which returns a randomly selected mini-batch of experiences when queried; the mini-batch size is set to 64 so as to realize the optimization of the candidate networks. Meanwhile, the adaptive moment estimation (Adam) method[30] is adopted with a learning rate of 0.001, and the target network update interval is set to $t=100$. To better explore the optimal offloading decision, $\theta=0.15$[31] and $\sigma=0.12$[32] are set in the Ornstein-Uhlenbeck process to provide temporally correlated noise, and the buffer size of experience replay is set to $|B_n|=2.5\times10^5$.
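The exploration noise can be reproduced with a standard Ornstein-Uhlenbeck process using the stated θ=0.15 and σ=0.12 (a common DDPG recipe; the unit time step is an assumption).

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck noise for continuous-action exploration."""
    def __init__(self, dim, theta=0.15, sigma=0.12):
        self.theta, self.sigma = theta, sigma
        self.x = np.zeros(dim)

    def sample(self, dt=1.0):
        # dx = -theta * x * dt + sigma * sqrt(dt) * dW  (mean-reverting to 0)
        self.x = self.x - self.theta * self.x * dt \
                 + self.sigma * np.sqrt(dt) * np.random.randn(*self.x.shape)
        return self.x
```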

4.2 Multi-user simulation results

In the MEC system, there are 5 mobile users, each randomly located within 100 m of the BS, with task arrival rate $\lambda_n=n\times1.0$ Mbps. In the training stage, for task arrival rates ranging from 1 Mbps to 5 Mbps, the actor and critic networks are trained with the same network architecture and hyper-parameters. To compare the performance of different policies, testing results are averaged over 2 500 episodes.

Fig.8 shows the training process of user dynamic computing offloading.

Fig.8 Training process

The reward value is the average of the numerical results obtained in 10 training sessions, where the task arrival rate is set to $\lambda=3.0$ Mbps. It can be observed from the experimental results that the average reward per iteration increases with the number of interactions between the user agent and the MEC environment, which indicates that the ECOO algorithm can successfully learn efficient computing strategies without any prior knowledge. In addition, the performance of the strategies learned with the ECOO algorithm is always better than that of the DDPG algorithm in the same scenario, showing that the strategies obtained with ECOO can explore the action space more effectively than those obtained with DDPG for continuous control problems.

It can be observed from Fig.9 that the average cost increases as the task arrival rate grows, which corresponds to higher power consumption and a longer buffering delay.

Fig.9 Testing results: (a) average reward

The GD-local algorithm achieves good results in latency, but performs poorly in terms of cost and power consumption. This is mainly because the GD-local algorithm prefers to execute subtasks on the local device; when the resources of the local device are insufficient, subtasks are gradually offloaded to upper-layer devices. Since some subtasks can be executed locally without network transmission, the GD-local algorithm has lower network latency and network usage. In addition, the GD-offload algorithm, in which all computing tasks are performed at the edge server, is similar to GD-local in that it consumes a lot of energy. The main reason is that the GD-offload algorithm tends to offload subtasks to edge server clusters, which consumes a lot of energy during transfer. At the same time, the performance of the edge server can meet the processing requirements of more subtasks and improve the network usage of the entire cluster.

The DQN, DDPG and ECOO algorithms all use DRL to automatically generate the corresponding offloading strategy from value iteration. As can be seen from the results in Fig.9, as the task arrival rate increases, the improved algorithm outperforms the former two in terms of cost, power consumption and latency. This is because the ECOO algorithm comprehensively considers the historical parameters of the target network and continuously updates the network parameters in real time, replacing the network with the minimum reward value so as to keep the results optimal, while maintaining an acceptable compromise on buffering delay at the lowest energy consumption.

The power-delay trade-off is investigated in Fig.10 by setting different values of $\omega_1$.

Fig.10 Power-delay trade-off

It can be inferred from the curves that there is a trade-off between the average power consumption and the average buffering delay. Specifically, with a larger $\omega_1$, the power consumption decreases at the expense of delay performance, which indicates that in practice $\omega_1$ can be tuned to achieve the minimum power consumption under a given delay constraint. It is also worth noting that for each value of $\omega_1$, the policy learned by ECOO always performs better in terms of both power consumption and buffering delay, which demonstrates the superiority of the candidate-network-based strategy for continuous power control.

To evaluate fairness among users, Jain's fairness index (JFI) is adopted:

$$\mathrm{JFI}=\frac{\left(\sum_{n=1}^{N}x_n\right)^{2}}{N\sum_{n=1}^{N}x_n^{2}},\qquad(19)$$

where $x_n$ denotes the achieved rate of user $n$.

The range of JFI values is $[1/N,1]$, and JFI $=1$ when all users have exactly the same rate. Therefore, the closer the JFI is to 1, the better the fairness among users. As shown in Fig.11, the JFI of the proposed ECOO is about 0.9, which is much higher than that of the pre-optimization scheme. Moreover, for a given number of cellular users, this advantage becomes even more evident as the number of users increases.

Fig.11 Jain’s fairness index
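Eq.(19) is straightforward to evaluate; the following helper (illustrative, with hypothetical rates) returns 1.0 for perfectly equal rates and approaches 1/N as one user dominates.

```python
import numpy as np

def jain_fairness(rates):
    """Jain's fairness index, Eq.(19): (sum x)^2 / (N * sum x^2)."""
    x = np.asarray(rates, dtype=float)
    return x.sum() ** 2 / (len(x) * np.square(x).sum())

print(jain_fairness([1.0, 1.0, 1.0, 1.0, 1.0]))  # 1.0  (perfect fairness)
print(jain_fairness([5.0, 0.1, 0.1, 0.1, 0.1]))  # close to 1/5
```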

5 Conclusions

A multi-user MEC system is designed, in which tasks arrive randomly at each user and the wireless channels change over time. To minimize power consumption and buffering delay, a decentralized dynamic task offloading algorithm based on DRL is designed, and the ECOO algorithm is successfully applied so that every mobile user autonomously learns its own offloading strategy. Based on local observations of the MEC system, the learned strategy adaptively allocates power between local computing and task offloading. Experimental simulations show that under the decentralized strategy, ECOO achieves better results in terms of latency and energy consumption than the traditional DQN and DDPG algorithms. By analyzing the power-delay trade-off of each user, it is found that ECOO also performs better there. However, this paper mainly considers the slower channel changes caused by mobility; the system capacity computation does not take fast fading effects into account and thus does not reflect the real capacity performance of the network. In future work, it is expected that resource management will be based on slow fading parameters and statistical channel information instead of instantaneous CSI, to address the challenges caused by the inability to track fast-changing wireless channels, and to further optimize the ECOO algorithm.