Two-Phase Rate Adaptation Strategy for Improving Real-Time Video QoE in Mobile Networks

2018-10-13 Ailing Xiao, Jie Liu, Yizhe Li, Qiwei Song, Ning Ge

China Communications, 2018, Issue 10

Ailing Xiao*, Jie Liu, Yizhe Li, Qiwei Song, Ning Ge

1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

2 Beijing National Research Center for Information Science and Technology, Beijing 100084, China

3 Network Technology Research Institute, China Unicom, Beijing 100052, China

Abstract: With the popularity of smart handheld devices, mobile streaming video has multiplied global network traffic in recent years. Growing concern for users’ quality of experience (QoE) has made rate adaptation methods very attractive. In this paper, we propose a two-phase rate adaptation strategy to improve users’ real-time video QoE. First, to measure and assess video QoE, we provide a continuous QoE prediction engine modeled by a recurrent neural network (RNN). Unlike traditional QoE models, which consider the QoE-aware factors separately or incompletely, our RNN-QoE model accounts for three descriptive factors (video quality, rebuffering, and rate change) and reflects the impact of cognitive memory and recency. Second, video playing is separated into the initial startup phase and the steady playback phase, and we adopt a different optimization goal for each phase: the former aims at shortening the startup delay, while the latter improves the video quality and reduces rebufferings. Simulation results show that RNN-QoE follows the subjective QoE quite well, and that the proposed strategy effectively reduces the rebufferings caused by the mismatch between the requested video rates and the fluctuating throughput and attains better real-time QoE than classical rate adaptation methods.

Keywords: continuous quality of experience (QoE) model; recurrent neural network (RNN); real-time video QoE improvement; dynamic adaptive streaming over HTTP (DASH)

I. INTRODUCTION

The popularity of smart handheld devices has led more and more people to use online video services as their most common way of getting information. Driven by the demand for high-quality video services such as video calls, video on demand (VOD), and online TV, over 78% of the world’s mobile traffic will be video by 2021 [1]. Due to limited wireless bandwidth, varied video characteristics, and cognitive factors, users may have different quality of experience (QoE) even under the same quality of service (QoS) conditions, which has prompted mobile operators to regard QoE as a more practical indicator of users’ service satisfaction [2-7].

To raise QoE awareness and improve real-time QoE during playback, most video content is currently streamed using the dynamic adaptive streaming over HTTP (DASH) protocol [8-10]. In DASH, each video is encoded at multiple quality levels (video rates typically range from 235 kb/s for standard definition to 30 Mb/s for high definition). The encoding at each quality level is divided into small chunks; each chunk contains data for the same playback interval, though a chunk at a higher quality level is larger in size. During playback, the client (i.e., the video player) requests the bitrate of the next video chunk according to some rate adaptation method, which can effectively reduce the risk of rebufferings caused by the mismatch between the requested video rate and the fluctuating channel throughput. Besides the early goal of avoiding rebufferings, recent DASH methods [11-17] pursue other optimization goals, such as maximizing the playback bitrates of segments and minimizing the variability of the selected video bitrates, so as to improve QoE in time. However, they put the selected goals on an equal footing without considering how users’ QoE focus changes over the course of playback.

While DASH-streamed videos are scalable and adapt to the dynamically available network bandwidth, they present a time-varying video quality to users. Therefore, instead of retrospective QoE prediction, which generates a single objective score summarizing the overall QoE of an entire viewed video, users’ QoE should be evaluated on a continuous time scale. However, the existing continuous QoE models [18-21] have not considered the influencing factors (including video quality, dynamic rate changes, and rebuffering events) jointly or completely. In particular, the QoE degradation caused by time-varying video quality is non-negligible for streaming videos and constitutes a key part of the cumulative effect of an unpleasant video experience.

In this paper, we propose a two-phase rate adaptation strategy at the client side that exploits users’ time-varying QoE focus to implement a targeted rate adaptation for improving real-time video QoE. Our main contributions include the following:

First, we build a video QoE database (v-QoE) that stores the QoE data of 20 different video contents. Since a realistic viewing process may contain a combination of uncertain startup delay, several rebufferings, and fluctuating video rates, we jointly introduce these three kinds of distortion into each of the 20 original videos at different levels to simulate the situations encountered in real playback. A subjective evaluation app is developed and installed on smartphones to measure and record users’ QoE on a continuous time scale. The raw data is cleaned with simple rules to exclude illogical scores, and the cleaned data is used to train the QoE prediction model described below.


Second, we provide a continuous QoE prediction engine modeled by a recurrent neural network (RNN). Three external inputs are considered: the quality of the video chunk being viewed by the user, the rebuffering traces, and the perceptual annoyance caused by unsmooth rate adaptations. Since an RNN propagates previous results, each QoE prediction embodies not only the real-time influence of the external inputs but also the cumulative effects of previous external inputs. The training of the RNN-based QoE model (RNN-QoE) is supported by the data stored in v-QoE. The continuous QoE prediction acts as an essential performance indicator in the comparison of rate adaptation methods.

Third, we propose a two-phase rate adaptation (2-PRA) strategy to improve real-time video QoE based on the buffer occupancy. We separate video playing into two phases: the initial startup phase, when the buffer is growing from empty, and the steady playback phase, when the buffer has been built up. Considering the different characteristics of the two phases, we take shortening the initial delay as the main optimization goal for the former phase, and improving the video rate and reducing rebuffering as the goals for the latter. Simulation results show that, compared with the baseline DASH schedule, the classical buffer based adaptation (BBA) approach, and the buffer-driven resource management (BDRM) method, our 2-PRA strategy improves the QoE influencing factors appropriately during the different phases of video playing and effectively facilitates QoE optimization in terms of startup delay, rebuffering rate, and video rate changes.

The rest of this paper is organized as follows. Section II reviews the related work. Section III describes the v-QoE database and provides our continuous RNN-QoE prediction model. Section IV shows the intuition and design of our 2-PRA strategy. Simulation results are discussed in Section V, and the paper is concluded in Section VI.

II. RELATED WORK

2.1 Continuous QoE prediction models

Although it has not received much attention, continuous QoE prediction is a challenging problem that requires accounting for the temporal effects of subjective QoE. In [18], continuous-time QoE is predicted using a Hammerstein-Wiener dynamic model that takes dynamic rate changes as the only factor influencing QoE. In [19], a modified version of [18] is provided that accounts for both time-varying quality and rebuffering events. In [20], the proposed QoE evaluation framework divides video playing into alternating playback periods and rebuffering periods, and models the QoE of the two parts by support vector regression and exponential depreciation, respectively. In [21], perceptual video quality assessment (VQA) responses and rebuffering measurements are combined within a nonlinear autoregressive model with exogenous variables (NARX) for continuous QoE prediction. In [22], a novel objective factor called image damage accumulation (IDA) is proposed, together with an objective QoE prediction model for high-definition video streaming services that evaluates users’ experience according to IDA. However, the continuous QoE models proposed thus far have not considered the influencing factors (e.g., video quality, dynamic rate changes, and rebufferings, including the startup delay and the rebufferings that occur in the subsequent playback) jointly or completely. In particular, the QoE degradation caused by time-varying video quality is non-negligible for streaming videos and constitutes a key part of the cumulative effect of an unpleasant video experience.

In this paper, we propose a continuous QoE prediction engine for streaming videos modeled by a recurrent neural network (RNN), which is driven by video quality, dynamic rate changes, and stalling events and captures the hysteresis effects of the human visual response.

2.2 Buffer based rate adaption methods

Video rate adaptation (RA) aims at selecting the “proper” rate for each chunk to maximize users’ QoE. Intuitively, the best viewing experience calls for the highest video quality (quantified by video rate), the fewest playback interruptions (rebufferings), and the gentlest rate changes.

Existing RA methods can be grouped into network capacity based ones and buffer based ones. The former adapt video rates based on network capacity estimation, which is hard to make accurate. Overestimation and underestimation of the available network capacity cause unnecessary rebufferings and poor video quality, respectively. The buffer based strategies leverage the buffer occupancy as much as possible to guide the rate adaptation process. If the buffer approaches its full size, a more aggressive policy is taken to fetch higher quality chunks; otherwise, the video rate of the next chunks is selected more conservatively. In [11], though video rate adaptation is based on network capacity estimation, the client’s buffer level is taken as feedback. In [12], two separate phases of operation are designed: when the buffer has just started growing from empty and encodes inadequate information about the available network capacity, rate adaptation relies on network estimation, and when the buffer has been built up, rate adaptation is based only on the playback buffer (BBA). In [13], BBA is improved by BOLA, which uses Lyapunov optimization techniques and considers both rebuffering and video quality. In [14], rate adaptation is formulated as a Markov Decision Process to maximize the video quality, which involves the network condition, the buffer state, and especially the case of users quitting early. In [15], rate adaptation is carried out as a joint effort of the client side and the base station side. In addition to the client’s regular judgment on rate change, the base station schedules the wireless resources and decides the final rate pulled from the server. In [16], rate adaptation is achieved by strategically applying a PID controller in an explicit and adaptive manner. In [17], the problem of joint user association and rate allocation to maximize the system utility while satisfying users’ QoE requirements is studied. Existing DASH methods put the selected goals on an equal footing without considering how users’ QoE focus changes over the course of playback.

In this paper, we emphasize that the key QoE-aware factors do not stay the same during video playing. Apart from maximizing the video quality, the key QoE-aware factor in the startup phase is the startup delay, while rebufferings and rate changes determine the QoE in the following playback phase. If we make use of these characteristics, the degradation in users’ real-time experience can be addressed in a targeted way and the QoE optimization can be simplified.

III. V-QoE DATABASE AND CONTINUOUS RNN-QoE PREDICTION MODEL

In this section, we first introduce the v-QoE database, which contains 80 test videos and their subjective evaluations, and then provide the RNN-QoE model for continuous QoE prediction.

To keep viewers interested in watching the test videos and grading the QoE score carefully, we use five different kinds of videos: entertainment programs, animations, sports events, romance movies, and nature documentaries, as shown in figure 1. Each kind consists of four 120-second videos covering slow- and fast-moving scenes. Each video, originally taken from open sources on video portals, is encoded at 5 Mb/s in 1080P high definition. Given that a realistic viewing process may contain a combination of uncertain startup delay, several rebufferings, and fluctuating video rate, we artificially introduce these three distortions into each of the 20 original videos to simulate the situations encountered in real video playing.

3.1 Simulation of different distortions

To simulate the situation in which viewers encounter startup delay and rebuffering, we insert black screens of a certain length at the corresponding positions in a test video. For the startup delay, we use black screens of 1 s, 2 s, and 4 s at the beginning of a video; we randomly select two different startup black-screen lengths for each video and obtain 40 test videos. For the rebuffering, we consider a distortion schema consisting of different rebuffering frequencies and durations. As shown in table 1, the distortion pair (1, 0.5) means that the 120-second test video contains 1 rebuffering that lasts for 1 second. For each pair of rebuffering frequency and duration (CRD), the rebuffering events are introduced at random locations in the playback using a uniform distribution. For each of the 40 videos, we randomly select 2 CRDs to embed into the video: one from the CRDs whose rebuffering frequency is 0.5 or 1, and the other from those whose rebuffering frequency is higher than 1. This yields 80 test videos with different startup delay and rebuffering distortions.
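The uniform placement of rebuffering events can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_rebufferings`, the `seed` parameter, and the choice to draw start times independently (so that two events may fall close together) are not taken from the paper.

```python
import random

def place_rebufferings(count, stall_s, video_s=120.0, seed=None):
    """Draw `count` rebuffering start times uniformly over the video.

    Returns a list of (start_time_s, stall_duration_s) pairs sorted by
    start time. Start times are drawn independently (an assumption), so
    events are not forced to be spaced apart.
    """
    rng = random.Random(seed)
    starts = sorted(rng.uniform(0.0, video_s) for _ in range(count))
    return [(start, stall_s) for start in starts]

# Example: three 2-second stalls in a 120-second test video.
events = place_rebufferings(3, 2.0, seed=1)
```

Each returned pair marks where a black screen of the given duration would be spliced into the test video.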

Table I. Distortion schema of rebufferings.

Fig. 1. Five different kinds of test video contents.

To simulate dynamic rate changes, we assume that each virtual chunk of a video has a duration of 4 s and apply the following rate adaptation rules: a downward rate change is applied right after a rebuffering, and an upward rate change is applied whenever no rebuffering happens for 12 consecutive seconds. For the downward rate adaptation, we first extract the corresponding part from the original 1080P video and downsample it to 720P; the 720P part is then upsampled back to 1080P so that a uniform resolution is maintained throughout the whole video.
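The two rules above can be sketched as a per-chunk quality schedule. The sketch assumes only two quality levels (the original 1080P and the degraded 720P mentioned in the text) and takes a hypothetical list `stall_before` that flags the chunks immediately preceded by a rebuffering; both are illustrative assumptions.

```python
def chunk_qualities(stall_before, chunk_s=4, recover_s=12):
    """Assign a quality label to each 4 s chunk of a test video.

    stall_before[i] is True when a rebuffering occurs right before chunk i.
    A stall drops the quality to 720p; after `recover_s` seconds without
    another stall the quality returns to 1080p. Two levels only (assumed).
    """
    levels = []
    level = "1080p"
    clean_s = 0  # seconds played since the last rebuffering
    for stalled in stall_before:
        if stalled:
            level, clean_s = "720p", 0
        elif level == "720p" and clean_s >= recover_s:
            level = "1080p"
        levels.append(level)
        clean_s += chunk_s
    return levels
```

With a single stall before the second chunk, the next three chunks (12 s) play at 720P before the schedule switches back up.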

3.2 Subjective evaluation on smartphones

Fig. 2. Evaluation Interface in Chinese of our RTS application.

Fig. 3. Structure of the RNN-QoE model.

Since smartphones represent over 80% of total mobile traffic [1], a real-time QoE scoring application (RTS) was developed to collect the subjective evaluation of the above test videos. The evaluation interface of RTS is shown in figure 2: the left part of the screen is the video playing region, and the right part is for QoE scoring, which ranges from 1 to 5, representing bad to excellent video experience. The maximum sequence number shown above the video is 80. A subjective evaluation starts when a viewer presses the play button in the middle of the video; RTS then records the continuous opinion scores from a real subject for the test video with the help of a moving slider, whose initial position is set to 3.

All the videos were evaluated by over 5000 volunteers. The volunteers, aged from 20 to 45, were mostly staff from China Unicom and students from Tsinghua University. Since evaluating all 80 videos took about 3 hours, the evaluation was conducted in multiple sessions with sufficient breaks in between to ensure the scoring quality of each volunteer. We cleaned the raw evaluation data in v-QoE with simple rules to exclude illogical scores. The cleaned data is used to train the RNN-QoE model described below.

3.3 RNN-QoE prediction model

The goal of our continuous RNN-QoE prediction engine is to efficiently process the real-time, nonlinear, objective QoE-related measurements, including the quality of the video chunk being viewed, the perceptual annoyance caused by unsmooth rate adaptations, and the rebuffering traces, while embedding the hysteresis effects of the human visual response. Our QoE prediction model is based on an RNN because it propagates previous results, as expressed in figure 3, and takes a triple $X_t = [x_{1t}, x_{2t}, x_{3t}]$ containing three highly descriptive QoE-aware external variables at time $t$:

1) $x_{1t}$: the square of the objective video rate $R_c$;

2) $x_{2t}$: the number of rate changes that have occurred in the past one minute (15 chunks);

3) $x_{3t}$: the duration from the last rebuffering up to now; if there is no rebuffering, $x_{3t}$ is set to 0; for the prediction of the first chunk of a video, $x_{3t}$ equals the startup delay.
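As an illustration of how the three inputs could be assembled per chunk, the following Python sketch builds one triple for every chunk of a session. It rests on assumptions that are not stated in the paper: the third input is read as the time elapsed since the most recent rebuffering, time is counted in whole chunks of 4 s, and the names `qoe_features` and `stall_before` are hypothetical.

```python
def qoe_features(rates, stall_before, startup_delay, chunk_s=4.0, window=15):
    """Return one [x1, x2, x3] triple per chunk.

    rates[i]        : video rate of chunk i
    stall_before[i] : True if a rebuffering occurred right before chunk i
    x1 : square of the chunk's video rate
    x2 : number of rate changes over the last `window` chunk transitions
    x3 : seconds since the most recent rebuffering (assumed reading),
         0 if none has occurred, startup delay for the first chunk
    """
    features = []
    last_stall = None  # index of the latest chunk preceded by a rebuffering
    for i, rate in enumerate(rates):
        x1 = rate ** 2
        first = max(1, i - window + 1)
        x2 = sum(1 for j in range(first, i + 1) if rates[j] != rates[j - 1])
        if stall_before[i]:
            last_stall = i
        if i == 0:
            x3 = startup_delay
        elif last_stall is None:
            x3 = 0.0
        else:
            x3 = (i - last_stall) * chunk_s
        features.append([x1, x2, x3])
    return features
```

A sequence of 15 consecutive triples would then form one input window for a model with a time step of 15.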

The training of RNN-QoE is based on the real-time measured data stored in v-QoE. Corresponding to figure 3, the subjective QoE at time $t$ ($t$ = 1, 2, 3, …) is $QoE_t$, the external input is $X_t$, $h_t$ ($t$ = 0, 1, 2, …) represents the internal state variable, which is a transformation of $h_{t-1}$ and the external input $X_t$, and $f$ denotes the network architecture of the RNN, whose parameters are determined by training. Due to the propagation of internal state variables over time, the QoE prediction at every moment not only embodies the real-time influence of the external inputs but also reflects the cumulative effects of previous external inputs.

The RNN model uses basic LSTM (Long Short-Term Memory) cells and contains three hidden layers, accounting for the 3 external inputs, each with 10 neurons. The time step of the RNN model is 15, which assumes that the network state one minute earlier has a negligible influence on the current QoE score. To determine the best $f$ for RNN-QoE in terms of the dynamic time warping (DTW) distance, the training set is divided into two subsets: one for training and one for validation. We used the Adam algorithm [23] to optimize RNN-QoE. Since different hyperparameters and training sets may yield different results, we also applied cross-validation during training and found the set of parameters that performs well on both the training subset and the validation subset. The other hyperparameters, set later when testing the performance of the proposed strategy, are chosen both theoretically and empirically (e.g., the maximum buffer size and the reservoir size are selected to suit the classic video streaming scenario). The prediction results for the same video content with different distortion schemes based on our RNN-QoE are shown in figure 4. The DTW distance [24] between the actual QoE and the predicted QoE, which accounts for the temporal structure of each time series, is 3.3212, meaning that RNN-QoE can follow the subjective QoE trend quite well.
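For readers unfamiliar with the DTW distance used as the evaluation metric, a textbook dynamic-programming version is sketched below. The absolute difference as local cost and the absence of any warping-window constraint or normalization are assumptions; the paper does not specify which DTW variant produced the reported value.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] is the minimal accumulated |a - b| cost of aligning the
    first i samples of `a` with the first j samples of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# A prediction that only lags the subjective curve is not penalized:
subjective = [3.0, 3.0, 2.0, 4.0]
predicted = [3.0, 2.0, 2.0, 4.0]
distance = dtw_distance(subjective, predicted)
```

Because DTW aligns samples elastically in time, a predicted curve that tracks the subjective trend with a small delay still yields a small distance.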

IV. TWO-PHASE RATE ADAPTATION STRATEGY

In this section, we first define the startup phase and the playback phase during video playing, and then analyze the key factors of users’ QoE and propose a video rate adaptation method for each phase.

Fig. 4. Prediction performance of the RNN-QoE model.

We divide the video viewing process into two phases according to the occupancy of the client buffer: the startup phase and the playback phase. The client buffer size is measured by the duration of the downloaded video chunk(s) in seconds. The player has a total buffer size of 240 s. A video chunk with $T_s$ = 4 s is added to the buffer after it has been fully downloaded; every second, 1 s of video content is removed from the buffer and played back to the user. If the buffer is probed when a 4 s video chunk has been played out of the buffer, the buffer occupancy must be a multiple of 4 s. We refer to the first 80 s of buffer occupancy as the reservoir, which corresponds to the startup phase of video playing. We enter the playback phase when the buffer occupancy exceeds 80 s. By studying the sensitivity of RNN-QoE to its input parameters, we find that the key factors influencing users’ experience differ between the two phases: in the startup phase, users are more sensitive to the startup delay, while in the playback phase, bad video quality and frequent rebufferings are difficult to accept. Therefore, we use a different adaptation method in each phase to select the proper rate of video chunks, and thus to target the cause of real-time QoE degradation accurately.

4.1 Startup phase

In the startup phase, startup delay has a greater impact on QoE than video rate and rate change. Users are eager for the video to start playing. Therefore, our QoE-oriented video rate adaptation should keep the startup delay as small as possible, and the video rate adaptation is based on network bandwidth estimation. To minimize the startup delay, the minimum video rate $R_1$ is chosen for the first three consecutive video chunks once a user presses the play button, which guarantees the shortest startup delay under a given network bandwidth. In fact, the beginning of a video usually consists of advertisements and intro information, whose video quality has little impact on QoE, so it is feasible to select a lower video rate for this content.

Then, before the reservoir is filled, the video rate adaptation has to consider the trade-off between video rate and rebuffering while minimizing the frequency of rate changes. The buffer occupancy at this stage is still low, so it encodes inadequate information about the available capacity and is not suitable as the basis of rate adaptation. The core rate adaptation strategy for the startup phase is therefore based on capacity estimation. When the last video chunk has been downloaded and pushed into the buffer, we take its average download rate $R_d$ as an estimate of the network capacity and record the video rate of the chunk as $R_c$. The video rate of the next chunk, $R_{next}$, is determined by the relationship between $R_d$ and $R_c$ of the last chunk.

(1) $R_d > R_c$: In this case, the buffer occupancy will increase, so we have the opportunity to choose a higher video rate. The video rates are a series of discrete values that constitute a set $R = \{R_1, R_2, ..., R_n\}$, and rate adaptation aims at selecting a suitable video rate for the next chunk from this set. Suppose that we fetch the next chunk at the next higher video rate $R_{c+1}$; then the download time required under the same network condition as the last segment is:

$$T_{thr} = \frac{R_{c+1} \cdot T_s}{R_d} \qquad (1)$$

While a new video chunk is being downloaded, the content in the buffer is continuously consumed in real time. To prevent the buffer from running out, we need to ensure that the time for downloading a new video chunk is shorter than the buffer occupancy. We denote by BO(t) the buffer occupancy before downloading the new video chunk. If $T_{thr} \geq$ BO(t), we keep $R_c$ as the video rate of the next chunk; if $T_{thr} <$ BO(t), we make a rate transition similar to a finite state machine [25].

In BDRM [20], the minimum video rate $R_1$ is used to estimate the network capacity in the worst case, and the video rate is immediately increased when the calculated download time of the next chunk is smaller than the buffer occupancy. In contrast, we adopt a more aggressive predictor $R_d$ to estimate the network capacity, and when the calculated download time of the next chunk is smaller than the buffer occupancy, the increase in video rate is more prudent. We check the video rates of the last three downloaded chunks. If they share the same video rate $R_c$, the video rate of the next chunk will be $R_{c+1}$ when $T_{thr} <$ BO(t) again; in other cases, we still choose the same video rate $R_c$ as the previous chunk. That is to say, when the rate indicator $R_d$ prompts an increase in the video rate, the increase takes place only if the last three chunks share the same video rate. Table 2 provides the rate adaptation comparison between 2-PRA and BDRM. Compared with BDRM, 2-PRA increases the video rate more gently, with two main features: 1) since the first three chunks use the minimum rate $R_1$, the conditions $R_d > R_c$ and $T_{thr} <$ BO(t) can be easily satisfied at the very beginning, and the video rate can soon be raised; 2) since the video rate is increased to $R_{c+1}$ only after three consecutive chunks at rate $R_c$, the frequency of rate changes is reduced, which compensates for the aggressive estimate of the network capacity.

Table II. Rate adaptation comparison between 2-PRA and BDRM.

(2) $R_d \leq R_c$: In this case, we will reduce or maintain the video rate. To avoid rebuffering events, the following formula should be satisfied by the candidate rate $R'$:

$$\frac{R' \cdot T_s}{R_d} < BO(t) \qquad (2)$$

For the maximum video rate $R_i$ that satisfies formula (2), if $R_i < R_c$, we select $R' = R_i$; otherwise, we select $R' = R_c$. Unlike the increase described above, the reduction of the video rate is applied immediately to the next chunk, and the video rate may be lowered by several levels at a time. As shown in table 2, 2-PRA immediately reduces the video rate from R27 to R8 at t355. Since the buffer occupancy in the startup phase is not large enough to offset the mismatch between a low download rate and a high video rate, the immediate video rate reduction helps to prevent rebufferings.

The 2-PRA algorithm for the startup phase is shown in Algorithm 1, which is executed once the first three chunks at $R_1$ have been downloaded. Based on the estimate of network capacity, we raise the video rate step by step from the minimum value $R_1$ when the network conditions are good and reduce the video rate immediately when the network conditions are poor. The main goal in this phase is to keep the startup delay to a minimum and reduce the probability of rebuffering while filling the reservoir. Compared with BDRM, the variability of the selected video rates is kept under control by the finite state machine, which favors a smooth viewing experience.

When the reservoir is filled up, i.e., the buffer occupancy reaches 80 s, the video rate of the last chunk added to the buffer is not fixed in advance, and its distribution is closely related to the network conditions during the startup phase. This video rate provides prior information about the network capacity for the buffer-based rate adaptation strategy used in the playback phase.

Algorithm 1. Rate adaptation for the startup phase.
if BO(t - 1) < 0 then
    Rnext(t) = Rmin
end if
if BO(t - 1) >= 0 && BO(t - 1) < 80 then
    Rd = C(t - 1)
    Rc = Rnext(t - 1)
    if Rd > Rc then
        R' = min(R_{c+1}, Rmax)
        Tthr = (R' * Ts) / Rd
        if Tthr < BO(t) && Rnext(t - 3) == Rnext(t - 2) == Rc then
            Rnext(t) = R'
        else
            Rnext(t) = Rc
        end if
    else
        find max R' in R that satisfies (R' * Ts) / Rd < BO(t - 1)
        if R' < Rc then
            Rnext(t) = R'
        else
            Rnext(t) = Rc
        end if
    end if
end if
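The startup-phase rule can be sketched in Python as follows. This is one possible reading of Algorithm 1 rather than the authors' implementation: rates are addressed by index into a sorted list, the function name `startup_rate` and the `history` argument are hypothetical, and falling back to the minimum rate when no rate satisfies the download-time condition is an assumption.

```python
def startup_rate(history, download_rate, buffer_s, rates, chunk_s=4.0):
    """Pick the index of the next chunk's rate during the startup phase.

    history       : rate indices of chunks already requested, oldest first
    download_rate : average download rate of the last chunk (R_d)
    buffer_s      : current buffer occupancy in seconds
    rates         : available video rates in ascending order
    """
    if len(history) < 3:
        return 0  # the first three chunks use the minimum rate R_1
    current = history[-1]
    if download_rate > rates[current]:
        candidate = min(current + 1, len(rates) - 1)
        t_thr = rates[candidate] * chunk_s / download_rate
        # Step up only if the last three chunks shared the current rate.
        stable = history[-3] == history[-2] == current
        return candidate if (t_thr < buffer_s and stable) else current
    # R_d <= R_c: take the highest rate whose download time fits in the buffer.
    feasible = [i for i, r in enumerate(rates)
                if r * chunk_s / download_rate < buffer_s]
    best = max(feasible) if feasible else 0  # assumed fallback to R_1
    return min(best, current)  # never increase on this branch

rates = [1.0, 1.5, 2.0, 2.5]
next_idx = startup_rate([0, 0, 0], download_rate=3.0, buffer_s=12.0, rates=rates)
```

The upward branch moves one level at a time, while the downward branch may skip several levels, which mirrors the asymmetry described in the text.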

4.2 Playback phase

During the playback phase, the reservoir is filled up, i.e., the buffer occupancy is greater than 80 s. We suppose that the buffer occupancy is sufficient to absorb fluctuations in network capacity, so a more aggressive strategy can be taken to increase the video rate based on the buffer. We denote the video rate of the last chunk when the buffer reaches 80 s as $R_{base}$; then $R_{base} \in R$ is not fixed in advance but depends on the network conditions in the startup phase.

Algorithm 2. Rate adaptation for the playback phase.
if BO(t - 1) >= 80 && BO(t - 1) < 216 then
    Rnext(t) = map(BO(t - 1))
end if
if BO(t - 1) >= 216 && BO(t - 1) < 236 then
    Rnext(t) = Rn
end if
if BO(t - 1) >= 236 then
    Rnext(t) = 0
end if

Fig. 5. The rate map used in 2-PRA.

Our video rate adaptation scheme gives the mapping relationship between the selected rate R and the buffer occupancy BO(t). As shown in figure 5, there is no definite relationship between R and BO(t) in the startup phase, since the rate adaptation is based on network estimation. In the playback phase, where BO(t) ranges from 80 s to 240 s, R is linear in BO(t) until BO(t) = 216 s, after which the maximum video rate $R_n$ is used to fetch the next chunk. Note that when 80 s ≤ BO(t) ≤ 216 s, the line segment with endpoints (80 s, $R_{base}$) and (216 s, $R_n$) defines the mapping between R and BO(t). Since R takes discrete values, R stays at $R_c$ as long as the rate suggested by the rate map does not cross the next higher ($R_{c+1}$) or lower ($R_{c-1}$) discrete video rate; otherwise, R is switched up or down to the new discrete value suggested by the rate map. When the buffer is full, i.e., BO(t) approaches 240 s, we pause the download and wait for three video chunks to be consumed from the buffer, and then continue with new 2-PRA instructions.
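A minimal Python sketch of the rate map and its discretization is given below. The thresholds 80 s, 216 s, and 236 s come from Algorithm 2; the function names, the use of rate indices, returning `None` to signal a paused download, and the exact rounding rule when a switch occurs are assumptions made for illustration.

```python
def map_rate(buffer_s, r_base, r_max, lo=80.0, hi=216.0):
    """Continuous rate suggested by the linear map between (lo, r_base)
    and (hi, r_max); r_max is returned at or above `hi`."""
    if buffer_s >= hi:
        return r_max
    frac = max(0.0, (buffer_s - lo) / (hi - lo))
    return r_base + frac * (r_max - r_base)

def playback_rate(buffer_s, current, rates, r_base, pause_at=236.0):
    """Return the next rate index, or None when the download is paused.

    The current rate is kept until the suggestion reaches the next higher
    or next lower discrete rate (hysteresis as described in the text).
    """
    if buffer_s >= pause_at:
        return None  # buffer nearly full: wait for chunks to drain
    suggested = map_rate(buffer_s, r_base, rates[-1])
    if current + 1 < len(rates) and suggested >= rates[current + 1]:
        # Switch up to the highest rate not above the suggestion.
        return max(i for i, r in enumerate(rates) if r <= suggested)
    if current > 0 and suggested <= rates[current - 1]:
        # Switch down to the lowest rate not below the suggestion.
        return min(i for i, r in enumerate(rates) if r >= suggested)
    return current

rates = [1.0, 2.0, 3.0, 4.0, 5.0]
idx = playback_rate(148.0, current=0, rates=rates, r_base=1.0)
```

Because the left endpoint of the map is $R_{base}$ rather than the minimum rate, a session that finished the startup phase at a high rate gets a flatter map than one that finished at a low rate.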

We define a new indicator $T_x$ to demarcate the increase in BO(t) at which the video rate could be adapted from $R_c$ to $R_{c+1}$:

(1) $(R_d \times T_s) / R_c > (T_s + 0.5T_x)$, which indicates that during the playback of a video chunk of length $T_s$ from the buffer, the length of newly downloaded video at rate $R_c$ exceeds $(T_s + 0.5T_x)$, i.e., the buffer occupancy will increase by more than $0.5T_x$. According to the R-BO(t) mapping relationship shown in figure 5, we should increase the video rate from $R_c$ to $R_{c+1}$.

(2) $(T_s - 0.5T_x) < (R_d \times T_s) / R_c < (T_s + 0.5T_x)$, which indicates that the length of newly downloaded video at rate $R_c$ exceeds $(T_s - 0.5T_x)$ and is less than $(T_s + 0.5T_x)$ while the buffer drains a video chunk of $T_s$, i.e., the change in the buffer occupancy is within $\pm 0.5T_x$. In this case, the buffer occupancy does not move to an adjacent level, so the video rate remains unchanged at $R_c$.

(3) $(R_d \times T_s) / R_c < (T_s - 0.5T_x)$, which indicates that the length of newly downloaded video at rate $R_c$ is less than $(T_s - 0.5T_x)$ during the playback of a video chunk of $T_s$, i.e., the buffer occupancy will decrease by more than $0.5T_x$. In this case, the buffer occupancy moves to the next lower level and we reduce the video rate to $R_{c-1}$.
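The three cases can be condensed into a single decision function, sketched below in Python. The value of $T_x$ is passed in as a parameter because its defining formula is not reproduced here; the function name `playback_step` and the +1/0/-1 return convention are illustrative assumptions.

```python
def playback_step(download_rate, current_rate, chunk_s, t_x):
    """Return +1, 0, or -1: move up one rate level, stay, or move down.

    `gained` is the amount of video (in seconds) downloaded at the current
    rate while one chunk of `chunk_s` seconds is played out of the buffer.
    """
    gained = download_rate * chunk_s / current_rate
    if gained > chunk_s + 0.5 * t_x:
        return 1   # case (1): buffer grows by more than 0.5 * T_x
    if gained < chunk_s - 0.5 * t_x:
        return -1  # case (3): buffer shrinks by more than 0.5 * T_x
    return 0       # case (2): change stays within the dead band

# With t_x equal to the chunk length, the dead band is 0.5*R_c < R_d < 1.5*R_c.
decision = playback_step(download_rate=2.9, current_rate=2.0, chunk_s=4.0, t_x=4.0)
```

The dead band around $T_s$ is what suppresses rate changes under small throughput fluctuations.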

The rate adaptation strategy for the playback phase selects an appropriate video rate based on the buffer occupancy, which avoids frequent rate changes to some extent. In fact, as long as $0.5R_c < R_d < 1.5R_c$, i.e., the estimated average network download rate $R_d$ fluctuates within a given range, we do not need to adjust the video rate. In practical applications, the estimated average network download rate does not deviate much from the actual value, because the estimate is calculated from the download of the last video chunk. The overall framework of the 2-PRA strategy is provided in Algorithm 3. Compared with BBA-0, our R-BO(t) mapping relationship varies with the endpoint value $R_{base}$, which is determined in the previous phase. During the startup phase, a proper rate increase based on the network conditions is also adopted instead of fixing the video rate at $R_1$. The proposed 2-PRA strategy is therefore more conducive to matching the video rates to the network capacity in the playback phase and to improving network efficiency.

V. SIMULATION RESULTS

In this section, our 2-PRA strategy is compared with three rate adaptation algorithms in terms of the average video rate, the number of rate changes, the number of rebufferings, and the average QoE:

(1) Baseline, which increases the video rate to the next higher level if no rebuffering occurs within 12 s (3 chunks), and which applies the same rate adaptation as the startup phase to lower the video rate to a proper level immediately after a rebuffering event.

(2) BBA-0, a state-of-the-art buffer-based scheme, which keeps the minimum video rate in the startup phase and then enters a period of fast rate growth according to a rate map. The video rate is decreased to the minimum value each time a rate degradation is needed.

(3) BDRM, which estimates the worst-case network capacity in a conservative way but increases the video rate immediately according to the rate indicator. Its rate adaptation method remains unchanged throughout video playback.

Algorithm 3. Framework of two-phase rate adaptation strategy.

Input: C: simulated network download rates; R: video rate set; K: total schedule times; map: mapping relationship of playback phase; f: the proposed RNN-based QoE prediction model.

Output: Rnext(t): video rate of the next chunk; BO(t): buffer size; QoE: real-time QoE scores.

initialize BO(t)
initialize t = 1
while t <= K do
    if BO(t) <= 80 then
        do Algorithm 1
    else
        do Algorithm 2
    end if
    update BO(t):
    if BO(t - 1) < 236 then
        BO(t) = BO(t - 1) + Ts - Rnext(t) * Ts / C(t)
    else
        BO(t) = BO(t - 1) - Ts
    end if
    count the number of rebufferings rb_t and rate changes rw_t
    generate tuples T = { Rnext(t), rb_t, rw_t }
    calculate real-time QoE(t) = f(T)
    t++
end while
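The control flow of Algorithm 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: `startup_policy` and `playback_policy` are hypothetical callables standing in for Algorithm 1 and Algorithm 2, `qoe_model` stands in for the RNN-based model f, the initial buffer of 0 s and the clamping of the buffer at 0 are assumptions, and the forced 1 Mb/s download after a rebuffering event is omitted for brevity. The thresholds 80 and 236 are taken from Algorithm 3.

```python
def run_two_phase(C, rates, ts, startup_policy, playback_policy, qoe_model,
                  startup_limit=80.0, buffer_cap=236.0):
    """Run the two-phase loop over the download-rate trace C.

    Each policy is called as policy(bo, idx, c_t) and returns a rate index.
    Returns a list of (rate, buffer occupancy, QoE) tuples, one per chunk.
    """
    bo = 0.0        # buffer occupancy in seconds (initial value is an assumption)
    idx = 0         # index of the current video rate in `rates`
    rebuffers = 0   # rb_t: number of rebufferings so far
    changes = 0     # rw_t: number of rate changes so far
    history = []

    for c_t in C:
        # Phase selection by buffer occupancy, as in Algorithm 3.
        policy = startup_policy if bo <= startup_limit else playback_policy
        new_idx = policy(bo, idx, c_t)
        if new_idx != idx:
            changes += 1
        idx = new_idx
        r_next = rates[idx]

        # Buffer update rule of Algorithm 3.
        if bo < buffer_cap:
            bo = bo + ts - r_next * ts / c_t
        else:
            bo = bo - ts

        if bo <= 0:
            rebuffers += 1
            bo = 0.0  # assumption: occupancy cannot go negative

        # Tuple T = {Rnext(t), rb_t, rw_t} fed to the QoE model.
        qoe = qoe_model((r_next, rebuffers, changes))
        history.append((r_next, bo, qoe))

    return history
```

Passing the two phase policies as callables keeps the loop independent of how each phase chooses its rate, so the same loop can be reused to compare alternative policies on one trace.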

5.1 Setup

As shown in figure 6, the fluctuating network bandwidth during the download of 1000 video chunks is simulated by 1000 random download rates ranging from 2 Mb/s to 28 Mb/s. The video rate set is R = {1, 1.5, 2, 2.5, …, 13, 13.5} in Mb/s, |R| = 25. A rebuffering event occurs when the buffer occupancy BO(t) ≤ 0. In this case, the next chunk is downloaded at 1 Mb/s until BO(t) ≥ Ts. For each rate adaptation method, we obtain 1000 video rates corresponding to the 1000 chunks, together with the dynamic change of BO(t). The results generate a series of triples consisting of video rate R, rebuffering times B, and rate change times C, i.e., (R, B, C), and QoE can be predicted in real time by importing these triples into the established RNN-QoE model.
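A sketch of this setup is given below, covering the random download-rate trace and the recovery rule after a rebuffering event. The uniform distribution, the fixed seed, and the names `make_trace` and `RebufferGuard` are assumptions made for illustration; the text only states that the rates are random and lie between 2 Mb/s and 28 Mb/s.

```python
import random


def make_trace(n=1000, low=2.0, high=28.0, seed=0):
    """Return n random download rates in Mb/s (uniform draw is an assumption)."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


class RebufferGuard:
    """Force the lowest rate after a rebuffering event until BO(t) >= Ts."""

    def __init__(self, ts, floor_rate=1.0):
        self.ts = ts                  # chunk duration Ts in seconds
        self.floor_rate = floor_rate  # 1 Mb/s, as stated in the setup
        self.recovering = False

    def apply(self, bo, requested_rate):
        """Return the rate to download next, given buffer occupancy bo."""
        if bo <= 0:
            self.recovering = True    # rebuffering event: BO(t) <= 0
        elif bo >= self.ts:
            self.recovering = False   # buffer holds at least one chunk again
        return self.floor_rate if self.recovering else requested_rate
```

With `ts = 4`, a call sequence with buffer occupancies 10, 0, 2, 4 and a requested rate of 5 Mb/s yields 5, 1, 1, 5 Mb/s: the guard holds the rate at 1 Mb/s from the stall until the buffer again holds one full chunk.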

Fig. 6. Highly variable network download rate experienced by clients.

Fig. 7. Dynamic changes of video rate with 1000 chunks.

Fig. 8. Variation in buffer occupancy with 1000 chunks.

Table III. Comparison in R, B, C, and QoE values with 1000 chunks.

5.2 Results and analyses

5.2.1 Video rate

All four video rate adaptation algorithms are run under the simulated network condition in figure 6. The dynamic rate change of the different video rate adaptation methods is shown in figure 7. It can be seen that the video rate of BDRM fluctuates violently between high and low values, the video rate of BBA-0 rises moderately but falls to the lowest value each time a degradation occurs, and the video rate of Baseline follows a conventional pattern, i.e., it rises linearly with sudden drops. In contrast, the video rate of 2-PRA increases smoothly to a considerable quality level while incurring fewer rate degradations.

5.2.2 Buffer occupancy

Fig. 8 shows the variation in the buffer occupancy of the different video rate adaptation schemes. We can see that the buffer occupancy of 2-PRA fluctuates slowly between the startup phase and the playback phase, and rebuffering events (i.e., when the buffer occupancy reaches 0) seldom happen even under bad network conditions (schedule time 300 to 530). The buffer occupancy of BDRM remains at a lower level and fluctuates wildly with more rebuffering events, whereas the buffer occupancy of BBA-0 fluctuates at a high level and no rebuffering occurs. The range of Baseline's buffer occupancy is similar to that of BDRM, with gentler fluctuations, but its rebuffering events occur most often.

5.2.3 QoE score

With the simulation results obtained so far, as shown in the first three rows of table III, we can form a series of (R, B, C) triples. We can see that 2-PRA has the highest average bit rate and the lowest number of rate changes, while its number of rebufferings remains at a very low level. To quantify the integrated performance of the four rate adaptation algorithms, we import these (R, B, C) triples into the established RNN-QoE model to predict users' real-time QoE, which can act as an essential performance indicator for rate adaptation methods.
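The three summary quantities compared in the table can be computed from a per-chunk rate trace and buffer trace roughly as follows. This is a sketch: the function name `summarize` is hypothetical, and counting a run of consecutive chunks with BO(t) ≤ 0 as a single rebuffering event is an assumption about how events are tallied.

```python
def summarize(rates_used, buffer_levels):
    """Return (average rate, rebuffering count, rate-change count).

    rates_used    : video rate chosen for each chunk (Mb/s)
    buffer_levels : buffer occupancy BO(t) after each chunk (s)
    """
    avg_rate = sum(rates_used) / len(rates_used)

    # A rate change is any pair of consecutive chunks with different rates.
    rate_changes = sum(1 for a, b in zip(rates_used, rates_used[1:]) if a != b)

    # Count each entry into the stalled state (BO <= 0) once (assumption).
    rebuffers = 0
    stalled = False
    for bo in buffer_levels:
        if bo <= 0 and not stalled:
            rebuffers += 1
        stalled = bo <= 0

    return avg_rate, rebuffers, rate_changes
```

These aggregate values summarize a whole run, whereas the real-time QoE prediction consumes the per-chunk triples.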

Fig. 9 presents the comparison of QoE scores, where 2-PRA maintains a moderate QoE level in most cases. The average QoE of 2-PRA is 4.1854, which is higher than that of all the other algorithms. We believe that existing rate adaptation methods overlook an important point: the key QoE-aware factors do not stay the same during video playback. Ignoring this causes delayed or untargeted adaptation when users' real-time QoE degrades. The 2-PRA strategy achieves a trade-off among video rate improvement and reductions in the number of rate changes and rebufferings at different phases. It avoids both the conservativeness of video rate promotion in BBA-0 and the neglect of buffer occupancy in BDRM, and hence it performs well in a QoE-oriented rate adaptation environment.

VI. CONCLUSION

In this paper, we have proposed a two-phase rate adaptation (2-PRA) strategy, which fully utilizes the variation in the factors influencing users' experience and implements a targeted rate adaptation to improve real-time video QoE. First, to measure and assess the real-time QoE, we provided a continuous RNN-based QoE prediction engine accounting for the cognitive memory and recency of the human visual response. For rate adaptation, the video playing process was separated into the initial startup phase and the steady playback phase. We conservatively shortened the startup delay in the former phase, while in the latter we aggressively balanced video rate against rebufferings. Simulation results have shown that 2-PRA can improve the three key factors affecting users' QoE and effectively improve the average QoE score of a video playing service.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (NSFC 60622110, 61471220, 91538107, 91638205), the National Basic Research Project of China (973, 2013CB329006), and GY22016058.