Priority probability deceleration deadline-aware TCP

2015-04-11JinYeJingLinandJiaweiHuang

Journal of Systems Engineering and Electronics 2015年3期

关键词：环模模孔所需

Jin Ye,Jing Lin,and Jiawei Huang

1.School of Computer and Electronic Information,Guangxi University,Nanning 530004,China;

2.School of Information and Communication,Guilin University of Electronic Technology,Guilin 541004,China;

3.School of Information Science and Engineering,Central South University,Changsha 410083,China;

4.Guangxi Experiment Center of Information,Guilin 541004,China

Priority probability deceleration deadline-aware TCP

Jin Ye1,4,Jing Lin2,and Jiawei Huang3,*

1.School of Computer and Electronic Information,Guangxi University,Nanning 530004,China;

2.School of Information and Communication,Guilin University of Electronic Technology,Guilin 541004,China;

3.School of Information Science and Engineering,Central South University,Changsha 410083,China;

4.Guangxi Experiment Center of Information,Guilin 541004,China

In modern data centers,because of the deadlineagnostic congestion control in transmission control protocol(TCP), many deadline-sensitiveows can notnish before their deadlines. Therefore,providing a higher deadline meeting ratio becomes a critical challenge in the typical online data intensive(OLDI)applications of data center networks(DCNs).However,a problem named as priority synchronization is found in this paper,which decreases the deadline meeting ratio badly.To solve this problem,we propose a priority probability deceleration(P2D)deadline-aware TCP.By using the novel probabilistic deceleration,P2D prevents the priority synchronization problem.Simulation results show that P2D increases the deadline meeting ratio by 20%compared with D2TCP.

data center networks(DCNs);deadline;priority synchronization;transport control protocol(TCP).

1.Introduction

With the rapid developmentof online applications,such as web services,cloud computing and data storage,data centers have become critical platforms for modern online informationservices[1].To ensure high performanceof data center networks(DCNs),recently researchers have proposed novel network structures and architectures[2–4], scheduling algorithms[5,6]and the software dene network(SDN)concept[7]to reduce transport overhead and improve the quality of services.

In this paper,we focus on the online data intensive (OLDI)services,which are the typical data center applications,supporting web search,and online retail and advertising[8].To achieve high performance and service reliability,the OLDI applications usually adopt the barriersynchronized partition-aggregate communication pattern, in which the client initiates requests to all servers in.E parallel.When receiving requests,the servers response nearly at the same time.The responses are then collected by aggregator nodes and sent back to the client.

Recent work reveals that transmission control protocol (TCP)trafc takes up almost 99.91%of the trafc in data centers.Specically,the TCP trafc is a mix of short and longows,including query trafc,delay sensitive short messages and throughput sensitive longows.For most of OLDI applications,the delay sensitiveows are allocated with communication deadlines.The time-sensitive short messageows are large in quantity while the backgroundows make more contributions to the byte[9].Under this kind of trafc pattern,the conventional TCP is unable to satisfytheperformancerequirementsforthefollowingreasons.(i)The default queue scheduling mechanisms,such asrst inrst out(FIFO),increase the tail-end network latencyforshortows.(ii)Thequerytrafcsynchronizesthe responses which may easily overow the queue of switch port connected to the aggregatorand brings about the TCP incastproblem[10].(iii)TheconventionalTCPignorestheow deadlines,while the application requests need be responded in time.Thus,DCNs should provide low latency for delay sensitiveows,high throughput for longows and high tolerance for burst concurrencyows.To achieve such goals,numbers of enhanced transport schemes of DCNs have been proposed in recent years.

To provide the tolerance for burst concurrencyows, DCTCP utilizes the explicit congestion notication(ECN) feedback mechanism to adjust the sending rate.By keepingasmallqueuelengthofswitchbuffer,DCTCP helpsthe latency-sensitiveows avoid large queuing delay and consequently achieve small latency.However,as a deadlineunaware protocol,DCTCP ignores the different deadline requirementsofshort messageows andbackgroundows [11].ICTCP uses adaptive congestion control at the receiving end to settle the TCP incast problem,but also doesnot consider the deadline requirements[12,13].

Compared with DCTCP,D3[14]is therst deadlineaware transport control protocol.In order to meet theow deadlines,D3designs a centralized and proactive method to schedule the network resources.In this way,D3significantly enhances the meeting deadline ratio.However,D3allocates the rate toows by their arrived order,meaning that the scheduling discipline of D3is FIFO.As a consequence,some urgentows arriving later may miss their deadlines.

Based on DCTCP,D2TCP elegantly adjusts the extent of window decreasing according to theow urgent degree and networkcongestionstate.In this way,it helps the shortows with urgent deadlines to meet their deadlines[15].

On the other hand,the deadline-aware scheduling algorithms on the switches assign high priority to short messageows and try to reduce their transport latencies[14,16,17].As a typical scheduling algorithm,PDQ optimally minimizes the averageow completion time (AFCT)byusingearliest deadlinerst(EDF)fordeadlinesensitiveows and shortest jobrst(SJF)for shortows. L2DCT[18]approximates the least attained service(LAS) scheduling principle at the sender in a distributed way.According to the data size aow has sent,L2DCT distinguishes the long and shortows.Then L2DCT adjusts the sending rates of shortows and longows.Therefore,the value of AFCT is reduced.

Although the above deadline-aware schemes reduce the deadline missing ratio,they all ignore the problem that, when the bottleneck link is congested,theows with similar priority may miss the deadline for competing.We call it the priority synchronization problem.Our core design ideal is to ensure moreows meeting their deadlines by our priority probability deceleration.

The remainder of the paper is organizedas follows.The priority synchronization problem is analyzed in detail in Section 2.The protocoldesign of P2D is given in Section 3 and its performanceis evaluated on NS2 in Section 4.Section 5 draws the conclusion and outlines the future work.

2.Priority synchronization

Since the OLDI applications of data centers are delay sensitive,aow can make contributions only if itnishes within its deadline.If someows are not able to execute timely,the responses transferred by suchows may be unavailable or useless,wasting both computing and network resource.Under the partition-aggregate pattern,all the responses of an OLDI application have the sameow deadline and size.This means that theows are likely to respond nearly at the same time.

Here,we take a typical web search application as an example.As shown in Fig.1,a queryow with the search keywordsfrom the client is handedoverto the work nodes, which will respond at the same time.Then the responses are collected by the aggregator to produce a result.Althoughtheremayexisthugenumbersofworknodes,thenite OLDI applications categories limit the numberof trafcclasses.Hence,largenumbersofowswithsimilarow size and deadline coexist.To evaluate the performance of existing datacenter TCP in this scenario,we conduct the following simulation.

Fig.1 Web search application

In the evaluation,11 work nodes are connected to a switch with 4 MB buffer via ten 1 Gbps links with 75µs link latency.In the test,10 work nodes transmit DCTCP and D2TCPows to the 11th host,respectively.We setveows sized 50 kB and fourows sized 1 MB,with respective deadlines of 30 ms and 40 ms,and the remaining one sized 10 MB is set as the background without a deadline. Allows burst synchronously at 0.2 s.

From the observation,we believe that it is a global synchronizationproblemin congestioncontrolwhenows are assigned with similar priority.When congestion occurs, congestion window reduction of the responseows in the same application is similar,and congestion window increment is also identical when there is no congestion.As a result,the completion time of allows is almost the same.That means that multipleows with similar priority may miss their deadlines,which results in bad performances of DCTCP and D2TCP.We call this problem as priority synchronization.

Fig.2 DCTCP and D2TCP performance

Actually,whenthedemandforresourceexceedsthenetwork capacity,the deadline requirements for all theows cannot be satised,especially for thoseows assigned with higher priority.Under this consideration,we propose a probability deceleration method based onow priority, which can weaken the impact of the priority synchronizationproblem,andhelps moreows to meettheir deadlines.

3.P2D protocol

In this section,we give the protocol design of P2D.As described in Section 2,the design challenge is to solve priority synchronization when congestion happens.Thus, we focus on the strategy of resource allocation and adjustment of theow sending rates when priority synchronization happens.Our design goal includes the following.

(i)Achieve more bandwidths for delay sensitiveows when mixed with longows without deadlines;

(ii)Meet the deadlines of most concurrentows with the same priority.

The basic idea of P2D protocol and its potential benets can be presented by an example.Consider the scenario wheretherearefourowsA,B,C andD throughthesame bottleneck link.For simplicity,we assume that one unit size needs one unit time to transmit.As shown in Table 1, theow sizes of fourows are 1,1,2 and 2,respectively. The corresponding relative deadlines are 2.5,2.5,5.5 and 5.5,respectively.

Table 1 Concurrent fows

Fig.3(a)shows the result of fair sharing.Theows fAandfBfromApplication1nish their transmissionat time 4,while fCand fDfrom Application 2 at time 6.We call thenish time as Ts.Obviously,all fourows miss their deadlines.We dene the exceeding time Etas Ts−D.fAand fBexceed 1.5 in time while fCand fDexceed 0.5. Thus we deduce that the bandwidth demand of fAand fBis greater than that of fCand fD.

When it comes to the deadline-aware protocol,such as D2TCP,the result is shown in Fig.3(b).Although the urgentows fAand fBachieve more bandwidths than fCand fD,fAand fBstill cannotnish before their deadlines.Of course we can assign a higher priority to fAand fBto help them meet their deadlines.For example,we could use the EDF discipline to allocate all bandwidths to fAand fB.However,from beginning to end,the aggressive manner of synchronizedows fAand fBresult in the tardy of fCand fD.In fact,the demand of fCand fDshould not be ignored.

However,in Fig.3(d),while the deceleratedow comes to fB,there is only one tardyow.In fact,only choosing theows from Application 1 can satisfy requirements of threeows.This gives us the motivation of P2D,that is to decelerate thoseows whose bandwidth demands are much greater than allocation,just like fAor fBin this scene.

Fig.3 Comparison of resource allocation

Here,we explainhow to select the deceleratedow.The required window size W′iof the ithow is

where Bi,Di,RTTiand MSS are the remaining size,ow deadline,round trip time and maximum segment size, respectively.Then,we dene urgent degree prias the required windowover the average window size.We denote the maximum window size as W to achieve the full bandwidth.Then,theaveragewindowsizeis 3W/4.Therefore,prican be approximated as

Since we can adjust the sending rate by modifying the value of pri,we analyze how the urgent degree prican affect the resource allocation.Let S(W1,W2)denote the numberof packets whichhave been sent out while the congestion windowsize increases from W1to W2in W2−W1RTT.We have

We assume that W∗is the critical window size whenthe switch queue size reaches K,a certain threshold to predict congestion.It takes 1 RTT for the sender to respond to congestion,during which the congestion window size will reach W∗+1.Like DCTCP,we maintain α∈[0,1]to measure the extent of congestion.We further assume that the window sizes of allows are synchronized,and their congestion window varying is shown in Fig.4.

Fig.4 Congestion window varying of P2D

In P2D,the ECN-based feedback mechanism and priority employed in DCTCP and D2TCP are used with a few modications.The congestion level α can be calculated by

With(4)and(3),we get:

Assuming α is small,(5)can be simplied as

Now we can compute the oscillation amplitude Aiin Fig.4.

When reaching a steady state,

where C∈[0,1]is constant,and Wsumis the capacity of the bottleneck.On this basis,we can get the predicted value of windowfor all theows when it reaches the steady state.

The detail protocol design of P2D is given as follows.

3.1P2D sender

The P2D sender maintains the state variables,and flagi.If flagiis set to 1,the ithow will be decelerated by modifying prito an innitesimal value.Moreover,the sender needs to pre-process the trafc according to those state variables whenever any of the following cases happens.

Case 1Whenand the required resource of the ithow is larger than the capacity of the bottleneck, P2D will choose to decelerate it.

When sending packets,the sender attaches a packet header,containing the newest state variables above.Whennishing sending,the P2D sender will send an ICMP notice to switch to announce that the transmission of the ithow is completed.

3.2P2D switch

TheobjectiveofaP2Dswitch is toadjustthe priorityoftheow when priority synchronization happens.The switch maintains state variables from received packet headers ofow i and calculates W∗according to formula(8).The P2D switch will update those variables when receiving a newow.

The probability piof the ithow can be expressed as

Fig.5 Deceleration probability of concurrent fows

The expected value can be calculated as

From(16),we know that the bandwidth resources saved by deceleratingows can just make the remainingows to complete their transmission within the deadline.

3.3P2D receiver

The P2D receiver copies the header fromthe data packet to its corresponding ACK packet.Whenever an ACK packet arrives,the sender updates flagifor the ithow.The whole process can be described as Process 1 which includes Algorithms 1-3.

Algorithm 1Sender

图6中可以看出，当泊松比和长径比较小时，相对压强保持相对平稳，仅在较小范围内增大；当泊松比和长径比增大到一定程度后，随着泊松比和长径比的增大，相对压强急剧增大。表明模孔结构能显著影响挤压制粒成型所需的挤出力，进而影响制粒能耗和设备的使用寿命。随着模孔长径比的增加所需的挤出压强呈指数形式增长，长径比过大将显著提高制粒能耗，反之长径比过小又可能导致制粒条件不满足，颗粒无法成型。因此，根据物料特性选择合适的环模结构尺寸是确保满足制粒成型条件的同时有效降低制粒能耗的重要途径。

InitializationCalculateand flagi

else if flagi=1,pri=infinitesimal;

Attach the header to the packet which containselds W′

i,priand flagi;

ifno congestioncwnd=cwnd+1;

Algorithm 2Switch

Receive a packet

Update flagi,W′i,pri;

N=N+1;

Deceleration function(N);

}

copy flagifrom the switch to the header of packet;

}

If receiving ICMP notice from i then

remove flagi,W∗i,W′i,priin the switch;

N=N−1;

Deceleration function(N)

}

Algorithm 3Deceleration function(X)

calculate pi;

set flagi=1 with pi;

}

For i=0;i＜X;i++

Update and maintain flagi,W∗i,W′i,pri

4.Simulation evaluation

In this section,we evaluate the performance of P2D by using the NS2 simulator.Using the trafc patterns in[3], theow trafc includes query trafc(2 kB to 20 kB in size),delay sensitive short message(50 kB to 1 MB),and throughputsensitivelongows(1MBto50MB).Thesimulations parameters are described in Table 2.priis initialized as 0.1 for backgroundows and deceleratingows.

Table 2 Simulation parameters

4.1Priority synchronization scenario

We set up the simulation environment as in Section 2,and all the simulation parameters are the same.Fig.6 shows the performanceof P2D.We cannd that there is only oneow which is obviously decelerated and the otherows meet their deadlines.

Fig.6 P2D performance

4.2Mixing traffc

To evaluate the friendly performance of P2D,we mix the trafc with the deadline and non-deadline trafc inthe data-intensive environment with network topology in Fig.7.

Fig.7 Network topology of mixing traffc

The work nodes are set to 20,30,and 40,respectively. The root seeds the queryow cyclical to all the work nodes.Thenafteraxedcomputationtime,theworknodes respond to the query synchronously.The size of replies ranges from 100 kB to 500 kB with a deadline of 5–25ms.Inaddition,we havetwo worknodesthatsendlonglivedows of 10 MB to the root,once every 80 ms.The experiment lasts for the duration of about sending 1 000 times queryows.For D2TCP,the value of d is capped within the range(0.5,2).

In Fig.8,the percentage of missed deadlines of TCP is the highest in all the cases,so does the deadline-unaware DCTCP.By contract,D2TCP andP2D performwell.Fig.9 gives the throughputs of the backgroundow.TCPows obtain the least throughput.Compared with DCTCP and D2TCP,the throughput of P2D is 10%and 7%lower.This is because the scheme ensures the higher priorityows’transmission by sacricing a small mount of throughput.

Fig.8 Missed deadlines

Fig.9 Throughput of long fows

4.3OLDI performance

The DCNs typically consist of two tiers of switches in the multi-hop topology.As shown in Fig.10,there are 25 racks and each rack can have up to 40 work nodes.Each worknodeis connectedto the top-of-rack(ToR)switch via 1 Gbps links,and each ToR is connected to a fabric switch which has large buffers via a single link with a line rate equal to 1*number-of-work-nodes-in-a-rack Gbps with 20µs link latency.There are a set ofve synthetic OLDI applications,and each application is allocated withve OLDI trees.Each tree has one parent and N child nodes. N in the OLDI tree varies from 10 to 40.We setve OLDI application’sow data sizes to 100 kB,200 kB,400 kB, 600 kB,and 800 kB with deadlines to 5 ms,10 ms,15 ms, 20 ms,and 25 ms,respectively.The network utilization is 10%–20%and the deadlines have 10%and 50%uniformrandom variance which is close to the realistic scene.

Fig.10 Simulated network of OLDI

Fig.11 shows the missed deadline fraction for four kinds of protocols.Wend that P2D performs better than other three.Compared with high variance(50%),the priority synchronization problem is much outstanding with low variance(10%),when P2D’s improvement for missed deadline fraction is much obvious.The missed deadline fraction of P2D is 25%lower than D2TCP at fan-in degree of40with highvariance.Whenandonlywhenthe rangeof d falls into 0.5–2,D2TCP can prioritizes theows.However,in this experiment,the urgent degree d reaches its upper limit.Thus,D2TCP has little difference in performances compared with DCTCP.

Fig.11 OLDI performance

5.Conclusions

We design P2D,a deadline-aware protocol,which targets reducing the missed deadline fraction when the priority synchronization problem happens in DCNs.P2D is able to estimate the maximum delay sensitiveows that bottleneck capacity can satisfy.And it takes into account the strategy of resource allocation and adjusts the sending rate of work nodes with higher priority.By using the simulation,we show this method can ensure moreows to meet their deadlines.Future work includes the improvement of the rate control protocol for maximizing the number ofows that meet their deadlines.

[1]Y.Zhang,N.Ansari,Y.Zhang,et al.On Architecture design, congestion notication,TCP incast and power consumption in data centers.IEEE Communications Surveys&Tutorials, 2013,15(1):1–26.

[2]Y.J.Wang,W.D.Sun,S.Zhou,et al.Key technologies of distributed storage for cloud computing.Journal of Software, 2012,23(4):962–986.

[3]X.Q.Liu,S.B.Yang,L.M.Guo,et al.Snowake:a new-type network structure of data center.Chinese Journal of Computers,2011,34(1):76–86.(in Chinese)

[4]X.L.Wei,M.Chen,J.H.Fan,et al.Architecture of the data center network.Journal of Software,2013,24(2):295–316. (in Chinese)

[5]Y.Y.Li,H.B.Wang,P.Zhang,et al.Multi-attribute aware scheduling for inter-datacenter bulktransfers.Journal onCommunications,2012,33(9):121–131.

[6]C.L.Cheng,Y.Pan,D.Y.Zhang.Energy saving resource scheduling algorithmincloudenvironment.Journal ofSystems Engineering and Electronics,2013,35(11):2416–2423.

[7]D.Crisan,R.Birke,G.Cressier,et al.Got loss?get zOVN! Proc.of the ACM SIGCOMM,2013:423–434.

[8]P.Zheng,L.Z.Cui,H.Y.Wang,et al.A data placement strategy for data-intensive applications in cloud.Chinese Journal of Computers,2010,33(8):1472–1480.(in Chinese)

[9]M.Alizadeh,A.Greenberg,D.A.Maltz,et al.Data center TCP.Proc.of the ACM SIGCOMM.2010:63–74.

[10]Y.M.Ren,Y.Zhao,P.Liu,et al.A survey on TCP Incast in data center networks.International Journal of Communication Systems,2012,27(8):1160–1172.

[11]M.Alizadeh,A.Javanmard,B.Prabhakar.Analysis of DCTCP:stability,convergence,and fairness.Proc.of the ACM SIGMETRICS,2011:73–84.

[12]H.T.Wu,Z.Q.Feng,C.X.Guo,et al.ICTCP:incast congestion control for TCP in data center networks.Proc.of the 6th ACM CoNEXT,2010.

[13]V.Vasudevan,A.Phanishayee,H.Shah,et al.Safe and effectivene-grained TCP retransmissions for datacenter communication.SIGCOMM Computer Communication Review,2009, 39(4):303–314.

[14]C.Wilson,H.Ballani,T.Karagiannis,et al.Better never than late:meeting deadlines in datacenter networks.Proc.of the ACM SIGCOMM,2011:50–61.

[15]B.Vamanan,J.Hasan,T.N.Vijaykumar.Deadline-aware datacenter TCP.Proc.of the ACM SIGCOMM,2012:115–126.

[16]C.Y.Hong,M.Caesar,P.Godfrey.Finishingows quickly with preemptive scheduling.Proc.of the ACM SIGCOMM, 2012:127–138.

[17]M.Alizadeh,S.Yang,M.Sharif,et al.pFabric:minimal nearoptimal datacenter transport.Proc.of the ACM SIGCOMM, 2013:435–446.

[18]A.Munir,I.A.Qazi,Z.A.Uzmi,et al.Minimizingow completion times in data centers.Proc.of the IEEE INFOCOM, 2013.

Biographies

Jin Ye received her Ph.D.degree from the School of Information Science and Engineering at Central South University in2008 andher B.S.and M.S.degrees in communication engineering from Guilin University of Electronic Technology.She is now a professor in Guangxi University,as well as a member of the Key Laboratory of Cognitive Radio and Information Processing.Her research interests include performance analysis,congestion control,and cross-layer design in wireless networks.

E-mail:Yejin@gxu.edu.cn

Jing Lin received her B.S.degree in communication engineering from Nanchang University in 2009.She is now a M.S.candidate in electrical and computer engineering in Guilin University of Electronic Technology.Herresearch interests include the mathematical analysis and protocol design of emerging wireless networks.

E-mail:linjin@guet.edu.cn

Jiawei Huang obtained his Ph.D.and M.S.degrees from the School of Information Science and Engineering at Central South University in 2008 and 2004,respectively.Healso received hisBachelor degree from the School of Computer Science at Hunan University in 1999.He is now an associate professor in the School of Information Science and Engineering at Central South University,China.Hisresearch interests include performance modeling,analysis,and optimization for wireless networks and data center networks.

E-mail:jiaweihuang@csu.edu.cn

10.1109/JSEE.2015.00067

Manuscript received January 07,2014.

*Corresponding author.

This work was supported by the National Natural Science Foundation of China(61163060;61103204;61462007).