
Short-term traffic flow online forecasting based on kernel adaptive filter

2018-12-20

LI Jun, WANG Qiu-li

(School of Automation & Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Abstract: Considering that the prediction accuracy of traditional traffic flow forecasting models is low, two online adaptive prediction methods based on the kernel adaptive filter (KAF), namely the kernel least mean square (KLMS) algorithm and the fixed-budget kernel recursive least-squares (FB-KRLS) algorithm, are presented. The KLMS algorithm has low computational complexity and does not require an additional solution-norm constraint, and its self-regularizing property solves the problem of regularization performance degradation in high-dimensional data processing. To reduce the computational complexity, a sparsification criterion is introduced into the KLMS algorithm. To further improve the forecasting accuracy, the FB-KRLS algorithm is proposed. It is an online learning method with a fixed memory budget, capable of recursively learning a nonlinear mapping and tracking changes over time. In contrast to a previous approximate linear dependence (ALD) based technique, the presented algorithm does not prune the oldest data point at every time instant but prunes the least significant data point, thus suppressing the growth of the kernel matrix. To verify the validity of the proposed methods, they are applied to one-step and multi-step predictions of traffic flow in Beijing. Under the same conditions, they are compared with the online adaptive ALD-KRLS method and other kernel learning methods. Experimental results show that the proposed KAF algorithms can improve the prediction accuracy, and their online learning ability meets the actual requirements of traffic flow, contributing to real-time online forecasting of traffic flow.

Key words: traffic flow forecasting; kernel adaptive filtering (KAF); kernel least mean square (KLMS); kernel recursive least square (KRLS); online forecasting

0 Introduction

As an important research area in intelligent transportation systems (ITS) applications, traffic flow forecasting can provide real-time traffic information for travelers and support corresponding traffic management strategies for transportation system management (TSM), traffic control systems and traffic information service systems, thereby improving the operation efficiency of urban traffic networks[1]. Urban traffic flow systems are highly nonlinear, complex and time-varying. According to different principles, prediction methods can be divided into two classes: one based on mathematical models and the other based on intelligent models[2-3]. In this paper, the latter is discussed.

The methods based on neural networks and support vector machines (SVM)[4-8] have been successfully applied to the modeling and forecasting of short-term traffic flow in recent years. Messer, et al. applied a particle swarm optimized neural network with adaptive weights (PSOA-NN) to short-term traffic flow forecasting[5]. Zhang, et al. applied SVM to one-step and two-step forecasting of traffic volumes on urban freeways[6-7]. Chen, et al. combined least squares support vector machine (LSSVM) with genetic algorithm (GA) and singular spectrum analysis, respectively, for one-step forecasting of short-term traffic flow and obtained good predictive results[8-9]. Huang, et al. put forward the extreme learning machine (ELM) in 2006, which achieves good network generalization performance with an extremely fast learning speed[10]. The regularized ELM method was successfully applied to one-step traffic flow forecasting[11]. On the basis of the ELM algorithm, when the hidden-layer feature mapping is unknown, it can be replaced by a kernel function to develop the kernel extreme learning machine (KELM) method, whose advantage is that the number of hidden-layer nodes need not be specified. Li, et al. applied it to time series modeling[12].

Taking feature extraction into consideration, Rosipal, et al. applied kernel partial least squares (KPLS) to time series modeling[13]. Sun, et al. used a principal component analysis (PCA)-SVM method for short-term traffic flow forecasting[14]. Chen, et al. combined kernel principal component analysis (KPCA) with SVM for chaotic time series forecasting, improving the prediction accuracy to a certain degree.

The above-mentioned methods can predict traffic flow well, but they are offline methods and cannot meet the real-time requirement of road traffic flow prediction.

The kernel adaptive filter (KAF)[16] has the ability to adaptively update filter parameters based on online samples and is suitable for online time series forecasting. Engel, et al. proposed the kernel recursive least-squares (KRLS) method, which limits the size of the kernel matrix with a sparsification technique to suit the training of larger datasets[17]. Liu, et al. applied the kernel least mean square (KLMS) method to time series prediction[18-19]. Vaerenbergh, et al. put forward a fixed-budget (FB)-KRLS algorithm[20], which prunes the least significant data point to suppress the growth of the kernel matrix. Different adaptive filter algorithms have been successfully applied to the prediction and identification of nonlinear systems and have achieved good results.

In the applications of kernel learning methods, SVM and KELM have been used for traffic flow and time series forecasting. In this paper, the KLMS and FB-KRLS algorithms based on the kernel adaptive filter are proposed for short-term traffic flow forecasting.

Different from general time series[18-20], traffic flow is time-varying, and the KLMS and FB-KRLS algorithms can meet the needs of online traffic flow forecasting because their adaptive characteristics continuously update the predictive models as new samples arrive. Different from the existing algorithms[18-20], during the online forecasting of traffic flow, these two algorithms predict the first data point with the trained adaptive prediction model, then update the forecasting model with the obtained data so as to predict the next data point, and so on until all the data points are predicted.

1 KAF

Inspired by online adaptive learning algorithms, KAF is an online sequence estimation algorithm based on kernel learning. When the i-th pattern (x_i, y_i) is obtained, the first (i-1) patterns have already been learned (denoted as f_{i-1}) by updating the prediction model, and the current nonlinear mapping relationship f_i can be obtained. Given training data {x_i, y_i} ∈ R^m × R^1 (i = 1, 2, …, N), we can construct the input matrix and output matrix as X ∈ R^(N×m) and Y ∈ R^(N×1), respectively. To apply the kernel trick, we map the data into a high-dimensional feature space F as φ: x ∈ R^m → φ(x) ∈ F ⊆ R^M. To avoid the difficulty of computation in the feature space, we define the kernel function k(x_i, x_j) = φ(x_i)^T φ(x_j), which satisfies Mercer's theorem. A commonly used kernel function is the Gaussian kernel, namely

$$k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right), \tag{1}$$

where σ is the kernel parameter.
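For concreteness, a minimal Python sketch of the Gaussian kernel in Eq.(1) is given below; the function name and example vectors are illustrative only, and the inputs are assumed to be normalized traffic-flow feature vectors.

```python
import numpy as np

def gaussian_kernel(x_i, x_j, sigma=0.5):
    """Gaussian kernel k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2)), Eq.(1)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Example: similarity between two (normalized) 3-dimensional input vectors.
print(gaussian_kernel([0.50, 0.60, 0.70], [0.40, 0.65, 0.70], sigma=0.5))  # ~0.975
```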

1.1 KLMS

The least mean square (LMS) regression model can be expressed as

$$\hat{y}_i = \omega(i-1)^{\mathrm{T}} x_i, \tag{2}$$

$$e(i) = y_i - \omega(i-1)^{\mathrm{T}} x_i. \tag{3}$$

The principle of the LMS method is to minimize the cost function, that is

$$J(i) = \frac{1}{2} e^2(i), \tag{4}$$

where J(i) is the cost function.

According to the gradient descent method, from Eq.(4) it can be obtained as

$$\omega(i) = \omega(i-1) + \eta\, e(i)\, x_i, \tag{5}$$

where η is the learning rate.
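The following minimal Python sketch illustrates the linear LMS recursion of Eqs.(2)-(5), assuming the instantaneous squared error as the cost; the data and variable names are purely illustrative.

```python
import numpy as np

def lms_train(X, y, eta=0.1):
    """Linear LMS: for each sample, compute e(i) = y_i - w^T x_i and update w by eta*e(i)*x_i."""
    w = np.zeros(X.shape[1])
    for x_i, y_i in zip(X, y):
        e_i = y_i - w @ x_i        # a priori error, Eq.(3)
        w = w + eta * e_i * x_i    # gradient-descent update, Eq.(5)
    return w

# Toy example: recover the weights of y = 2*x1 - x2 from noisy samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.01 * rng.normal(size=500)
print(lms_train(X, y, eta=0.1))    # close to [2, -1]
```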

The LMS algorithm can be readily extended to a high-dimensional feature space by the kernel trick, and the resulting KLMS algorithm yields nonlinear, stable solutions whose performance matches that of batch, regularized solutions. Mapping x_i into the high-dimensional feature space according to Mercer's theorem, we can obtain

$$e(i) = y_i - \Omega(i-1)^{\mathrm{T}} \varphi(x_i), \tag{6}$$

$$\Omega(i) = \Omega(i-1) + \eta\, e(i)\, \varphi(x_i). \tag{7}$$

According to Eqs.(5) and (6), Eq.(7) can be expressed as

$$\Omega(i) = \Omega(i-1) + \eta \left[y_i - \Omega(i-1)^{\mathrm{T}} \varphi(x_i)\right] \varphi(x_i). \tag{8}$$

After N training steps, assuming Ω(0) = 0, Eq.(7) can be expressed as

$$\Omega(N) = \eta \sum_{j=1}^{N} e(j)\, \varphi(x_j). \tag{9}$$

The steps for realizing short-term traffic flow prediction based on KLMS are as follows:

Step 1: Give sequentially the input-output patterns {x_i ∈ U, y_i}, i = 1, 2, …

Step 2: Online training. Initialize C_1 = [x_1], α_1 = [ηy_1], η > 0.

Step 3: Let i = i + 1 and the output of the adaptive filter can be calculated as

$$f_{i-1}(x_i) = \sum_{j=1}^{i-1} \alpha_j\, k(C_j, x_i). \tag{10}$$

Step 4: Obtain error

$$e_i = y_i - f_{i-1}(x_i). \tag{11}$$

Step 5: Store new center and update coefficient vector

$$C_i = [C_{i-1}, x_i], \qquad \alpha_i = [\alpha_{i-1}, \eta e_i]. \tag{12}$$

Step 6: Repeat steps 3-5 until all the training data have been learned.

Step 7: Online adaptive testing. Predict x_j using the adaptive prediction model based on the trained model: predict the first data point with the trained KLMS model, update the forecasting model with this value, and then predict the next data point with the updated model until all the data are predicted.
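A minimal Python sketch of the KLMS training and online adaptive testing procedure of steps 1-7 is given below. It assumes the Gaussian kernel of Eq.(1); the helper names (klms_train, klms_online_test) are illustrative rather than part of the original algorithm description.

```python
import numpy as np

def gauss(a, b, sigma=0.5):
    return np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2.0 * sigma ** 2))

def klms_train(X, y, eta=0.3, sigma=0.5):
    """Steps 1-6: grow the center list C and coefficient vector alpha online."""
    C, alpha = [X[0]], [eta * y[0]]                                     # step 2 initialization
    for i in range(1, len(y)):
        f_i = sum(a * gauss(c, X[i], sigma) for a, c in zip(alpha, C))  # Eq.(10)
        e_i = y[i] - f_i                                                # Eq.(11)
        C.append(X[i])                                                  # Eq.(12)
        alpha.append(eta * e_i)
    return C, alpha

def klms_online_test(C, alpha, X_test, y_test, eta=0.3, sigma=0.5):
    """Step 7: forecast each test point, then adapt the model with the true value."""
    preds = []
    for x, y_true in zip(X_test, y_test):
        y_hat = sum(a * gauss(c, x, sigma) for a, c in zip(alpha, C))
        preds.append(y_hat)
        C.append(x)                                  # adapt with the observed sample
        alpha.append(eta * (y_true - y_hat))
    return np.array(preds)
```

Without the sparsification criterion of Section 1.1.1, the number of stored centers grows with every sample, which is exactly the issue the next subsection addresses.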

1.1.1 Sparsification criterion

(13)

where υ ∈ [0, 1] is the threshold parameter.

According to Eq.(12), Eq.(13) can be further expressed as

(14)

Defining the Gram matrix K = Φ^T Φ, whose elements are k(x_i, x_j), and its inverse K^{-1} = Q, and expanding Eq.(14), we can obtain

(15)

where e_t is the a priori error,

[K_{w_{t-1}}]_{ij} = ⟨k(x_i, ·), k(x_j, ·)⟩ = k(x_i, x_j), i, j = 1, …, w_{t-1},

(k_t)_i = ⟨k(x_t, ·), k(x_i, ·)⟩ = k(x_i, x_t), i = 1, …, w_{t-1},

k_{tt} = ⟨k(x_t, ·), k(x_t, ·)⟩ = k(x_t, x_t).

When a new sample x_t arrives, it is judged whether k(x_t, ·) satisfies the sparsification criterion of Eq.(13). If it satisfies the condition, it is added to the dictionary, D_{t+1} = {D_t, k(x_t, ·)}, and the updates are given by

(16)

(17)

where the column vector 0_{w_{t-1}} has dimension w_{t-1} × 1.

Therefore, the weights and predicted output of the KLMS regression model with multi-dimensional input and single-dimensional output on the training data set are obtained as

(18)

where K_t = [k(x_1, ·), …, k(x_{w_t}, ·)], and α = [α_1, …, α_{m_t}]^T is the filter weight vector.
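Since the construction above follows the ALD idea (a Gram matrix K, its inverse Q, and an a priori approximation error compared against the threshold υ), the sketch below shows how such a test could gate dictionary growth. The exact form of Eq.(13) is not reproduced here, so the feature-space approximation-error test used in the code is an assumption, and solving the full Gram system at every step is only for readability; the actual algorithm updates Q = K^{-1} recursively via Eqs.(16)-(17).

```python
import numpy as np

def rbf(a, b, sigma=0.5):
    return np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2.0 * sigma ** 2))

def ald_accept(dictionary, x_t, nu=1e-5, sigma=0.5):
    """Accept x_t only if phi(x_t) is not approximately linearly dependent
    on the feature-space images of the current dictionary members."""
    if not dictionary:
        return True
    K = np.array([[rbf(a, b, sigma) for b in dictionary] for a in dictionary])
    k_t = np.array([rbf(a, x_t, sigma) for a in dictionary])
    coeff = np.linalg.solve(K, k_t)             # best expansion coefficients
    delta = rbf(x_t, x_t, sigma) - k_t @ coeff  # residual approximation error
    return delta > nu                           # keep x_t only if it adds new information
```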

1.2 FB-KRLS

The FB-KRLS algorithm is capable of recursively learning a nonlinear mapping and tracking changes over time. In an offline scenario, where l input-output patterns are acquired sequentially, KRLS seeks the coefficient vector that minimizes the regularized cost of Eq.(19):

$$\min_{\alpha}\; \|y - K\alpha\|^2 + \lambda\, \alpha^{\mathrm{T}} K \alpha, \tag{19}$$

where y ∈ R^(l×1) is the vector containing the training outputs y_i; K ∈ R^(l×l) is the kernel matrix, whose elements are K_{i,j} = k(x_i, x_j); and λ is the regularization coefficient. The solution of Eq.(19) is

$$\alpha = (K + \lambda I)^{-1} y, \tag{20}$$

where I ∈ R^(l×l) represents the identity matrix.

The goal of kernel recursive least-squares is to update the solution of Eq.(20) recursively as new data become available. KRLS is based on K, whose dimensions depend on the number of input patterns. As a consequence, the inclusion of new data into the solution of Eq.(20) causes K to grow without bound. In order to curb this growth, the approximate linear dependence (ALD) criterion and other methods have been proposed. These methods assemble a limited dictionary and have achieved good results.
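As a reference point, the batch solution of Eq.(20) can be computed directly, as in the sketch below (Gaussian kernel and function names assumed for illustration); online KRLS variants update this solution recursively instead of re-solving it from scratch.

```python
import numpy as np

def kernel_matrix(X, sigma=0.5):
    """Gram matrix K with K[i, j] = k(x_i, x_j) for the Gaussian kernel."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def krls_batch_fit(X, y, lam=0.1, sigma=0.5):
    """Batch solution of Eq.(20): alpha = (K + lam * I)^(-1) y."""
    K = kernel_matrix(X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krls_predict(X_train, alpha, x_new, sigma=0.5):
    """Predicted output as a kernel expansion over the training inputs."""
    k = np.exp(-np.sum((X_train - np.asarray(x_new)) ** 2, axis=1) / (2.0 * sigma ** 2))
    return float(k @ alpha)
```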

The FB-KRLS algorithm uses a "fixed memory budget" strategy, which employs an active learning scheme to build a "dictionary": it first adds the new point to the memory and then discards the least relevant stored point, so that the amount of data kept in memory remains unchanged.

Online FB-KRLS achieves a complexity no higher than O(M^2) for time-varying data, where M is the number of patterns stored in memory. (K + λI) denotes the regularized kernel matrix obtained at the i-th iteration when a new pattern {x_i, y_i} is first added to the memory.

The steps for realizing short-term traffic flow forecasting algorithm based on FB-KRLS are as follows:

Step 1: Give sequentially the input-output patterns {x_i ∈ U, y_i}, i = 1, 2, …

Step 3: Let i = i + 1 and obtain the new data (x_i, y_i); the outputs of all the existing data stored in memory can be updated as

$$y_t = y_t - \eta\, k(x_t, x_i)(y_t - y_i), \qquad t = 1, \ldots, M, \tag{21}$$

where η ∈ [0, 1] is the step size.

Step 4: Add the new data (x_i, y_i) to the memory and correspondingly expand the rows and columns of the current kernel matrix K_{i-1}:

$$K_i = \begin{bmatrix} K_{i-1} & k_{i-1}(x_i) \\ k_{i-1}(x_i)^{\mathrm{T}} & k(x_i, x_i) \end{bmatrix}, \tag{22}$$

where k_{i-1}(x_i) = [k(x_1, x_i), …, k(x_{i-1}, x_i)]^T.

Its inverse matrix can be expressed as

(23)

Step 5: When the memory size exceeds the budget (i > M), the pruning criterion[20] is applied to all the input-output patterns in memory, and the error can be obtained as

(24)

To accomplish this step, the permutation matrix P_L as well as H_L is defined as

(25)

(26)

(27)

Step 6: Calculate α according to Eq.(20) based on the updated memory data.

Step 7: Repeat steps 3-6 until all the training data have been processed.

Step 8: Online adaptive testing. Predict x_j using the adaptive prediction model based on the trained model: predict the first test point with the trained FB-KRLS model, update the forecasting model with the obtained value, and then predict the next point with the updated model until all the data are predicted, i.e.

$$\hat{y}_j = \sum_{t=1}^{M} \alpha_t\, k(x_t, x_j). \tag{28}$$
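A simplified Python sketch of the fixed-budget procedure in steps 1-8 is given below. It re-solves Eq.(20) on the stored memory at each step instead of performing the recursive inverse updates of Eqs.(22)-(27), and it identifies the least significant point by the smallest leave-one-out quantity |α_i| / [(K + λI)^{-1}]_{ii}; both simplifications are assumptions of this sketch, not a verbatim transcription of Ref.[20].

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    return np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2.0 * sigma ** 2))

class FixedBudgetKRLSSketch:
    """Fixed-memory-budget kernel regression (simplified illustration)."""

    def __init__(self, budget=90, lam=0.1, eta=0.1, sigma=1.0):
        self.M, self.lam, self.eta, self.sigma = budget, lam, eta, sigma
        self.X, self.y = [], []            # patterns kept in memory
        self.alpha = np.array([])

    def _refit(self):
        K = np.array([[rbf(a, b, self.sigma) for b in self.X] for a in self.X])
        Q = np.linalg.inv(K + self.lam * np.eye(len(self.X)))
        self.alpha = Q @ np.array(self.y)                  # Eq.(20)
        return Q

    def predict(self, x):
        if not self.X:
            return 0.0
        k = np.array([rbf(xi, x, self.sigma) for xi in self.X])
        return float(k @ self.alpha)                       # Eq.(28)

    def update(self, x, y_new):
        # Step 3, Eq.(21): relax the stored labels towards the new observation.
        for t in range(len(self.y)):
            self.y[t] -= self.eta * rbf(self.X[t], x, self.sigma) * (self.y[t] - y_new)
        # Step 4: add the new pattern to the memory.
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(y_new))
        Q = self._refit()
        # Step 5: if the budget is exceeded, prune the least significant point.
        if len(self.X) > self.M:
            d = int(np.argmin(np.abs(self.alpha) / np.diag(Q)))
            del self.X[d], self.y[d]
            self._refit()                                   # Step 6
```

The per-step cost of this sketch is O(M^3) because of the explicit inversion; the recursive updates of Eqs.(22)-(27) are what bring FB-KRLS down to O(M^2) per step.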

2 Traffic flow forecasting experiment

The kernel learning methods are applied to short-term traffic flow forecasting. The prediction model is established as

$$\hat{y}_{t+h-1} = f(x_t) = f(y_{t-1}, y_{t-2}, \ldots, y_{t-m}), \tag{29}$$

where h is the forecasting step size; x_t is the multi-dimensional input vector composed of historical traffic flow, x_t = (y_{t-1}, …, y_{t-m}), with m the embedding dimension of the forecasting model; and f(·) is the mapping relationship to be established, i.e. f: R^m → R. Selecting the mean absolute percentage error (MAPE), root mean square error (RMSE) and normalized RMSE (NRMSE) as the performance indicators of the experiment, there are

$$\delta_{\mathrm{MAPE}} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \tag{30}$$

$$\delta_{\mathrm{RMSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \tag{31}$$

where n is the number of test samples and ŷ_i is the predicted value of y_i.
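The input construction of Eq.(29) and the indicators of Eqs.(30)-(31) can be sketched as follows; the helper names are illustrative, and the embedding assumes x_t collects the m most recent flow values as stated above.

```python
import numpy as np

def embed(series, m=4, h=1):
    """Build inputs x_t = (y_{t-1}, ..., y_{t-m}) and h-step-ahead targets, Eq.(29)."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    for t in range(m, len(series) - h + 1):
        X.append(series[t - m:t][::-1])      # most recent value first
        Y.append(series[t + h - 1])
    return np.array(X), np.array(Y)

def mape(y_true, y_pred):
    """Mean absolute percentage error, Eq.(30), in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean square error, Eq.(31)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```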

2.1 Experiment A

The experiment used traffic volume data recorded every 15 min for nine road links by the UTC/SCOOT system of the Traffic Management Bureau of Beijing. The road links are denoted as follows: ① Bb; ② Ch; ③ Dd; ④ Fe; ⑤ Gd; ⑥ Hi; ⑦ Ia; ⑧ Jf; ⑨ Ka. The raw data were collected from March 1 to March 31, 2002, 31 days in total. Considering malfunctions of detectors or transmitters, the days with empty data were excluded. The remaining data contained 25 days and 2400 sample points in total. The traffic data collected at road link Ch were used; 2112 samples were selected as training data and the rest as test data, for consistency with Ref.[21].

The parameter settings are as follows. KRLS, KLMS, KELM, KPLS, KPCA, LSSVM and SVM all use Gaussian kernel functions with a kernel width of σ = 0.5; the regularization parameter of KELM is γ = 200; the number of latent variables of KPLS is p = 20; the learning rate of KLMS is η = 0.3; the parameters of SVM and LSSVM are C = 0.5 and ε = 0.3. In the KPCA-SVM, KPCA-LSSVM and KPCA-KELM methods, the number of nonlinear principal components of KPCA is 15 and SVM uses a linear kernel function. The ALD-KRLS algorithm has a maximum dictionary capacity M_max = 500 and threshold υ = 0.00001. FB-KRLS has a fixed memory size M = 300, learning rate η = 0.1 and regularization parameter λ = 0.1. Tables 1 and 2 give the forecasting results 15 min and 30 min in advance using the different forecasting methods.

Table 1 Performance comparison of forecasting results with 15 min ahead

Forecasting method    δMAPE      δRMSE      δNRMSE
SVM                   11.7546    68.6481    0.2527
LSSVM                 11.6590    67.5591    0.2492
KELM                  11.7655    67.8020    0.2501
KPCA-SVM              11.6220    67.4984    0.2489
KPCA-LSSVM            10.8916    67.5078    0.2497
KPCA-KELM             10.8708    67.4618    0.2481
KPLS                  11.7167    68.0983    0.2508
ALD-KRLS              10.6976    65.9196    0.2431
KLMS                  10.6846    65.6935    0.2427
FB-KRLS               10.3226    65.4572    0.2414

Table 2 Performance comparison of forecasting results with 30 min ahead

Forecasting method    δMAPE      δRMSE      δNRMSE
SVM                   16.5209    73.5844    0.2760
LSSVM                 12.3465    69.5493    0.2535
KELM                  13.9845    71.5947    0.2694
KPCA-SVM              13.2651    70.9778    0.2669
KPCA-LSSVM            12.4646    68.6125    0.2565
KPCA-KELM             12.5061    70.1579    0.2559
KPLS                  12.6478    69.5064    0.2517
ALD-KRLS              12.4487    68.1352    0.2511
KLMS                  12.3914    67.2123    0.2509
FB-KRLS               12.0725    65.8903    0.2486

It is observed from Table 1 that, compared with the single kernel learning methods such as SVM, LSSVM, KELM and KPLS, the forecasting accuracy of the kernel learning methods based on feature extraction (KPCA-SVM, KPCA-LSSVM and KPCA-KELM) is improved. Besides, among the adaptive online learning algorithms (ALD-KRLS, KLMS and FB-KRLS), it can be clearly seen from Tables 1 and 2 that the RMSE values decrease in turn, namely, FB-KRLS has the highest prediction accuracy among the compared kernel learning methods. The prediction accuracies of the KLMS and FB-KRLS methods in this experiment are also slightly better than that of the variational inference Gaussian process method[21].

Fig.1 gives the forecasting curves of the different forecasting methods 15 min in advance, and Fig.2 gives the forecasting curves 30 min in advance.

Fig.1 Forecasting results with 15 min ahead using kernel adaptive filtering methods

It can be seen from Figs.1 and 2 that the flow forecasting curves given by the KLMS, ALD-KRLS and FB-KRLS methods are closer to the actual values than those of the other forecasting methods, which means they can better forecast the actual traffic flow.

Fig.2 Forecasting results with 30 min ahead using kernel adaptive filtering methods

2.2 Experiment B

The experiment used the traffic flow data from a highway observation station in Beijing. The total observation period was 96 h over 4 days, giving 384 data sets in total with a time interval of 15 min[22]. The first 285 groups of data were taken as training data and the rest as test data, and the embedding dimension m = 4 was selected. The cross-validation method was used to determine the predictive model parameters of the different kernel learning methods.

The KRLS, KLMS, KELM, KPLS, KPCA, LSSVM and SVM methods use Gaussian kernel functions with a kernel width of σ = 1; the regularization parameter of KELM is γ = 50; the number of latent variables of KPLS is p = 10; the parameters of SVM and LSSVM are C = 0.5 and ε = 0.01. In the KPCA-SVM, KPCA-LSSVM and KPCA-KELM methods, the number of nonlinear principal components of KPCA is 10 and SVM uses a linear kernel function. The ALD-KRLS algorithm has a maximum dictionary capacity M_max = 90 and threshold υ = 0.01.

The learning rate of KLMS is η = 0.2; the fixed memory size of FB-KRLS is M = 90 and its regularization parameter is λ = 0.0001.

It can be seen from Tables 3 and 4 that KLMS and KRLS achieve better prediction results in traffic flow forecasting. The prediction accuracies of KLMS and KRLS in this experiment are also compared with that in Ref.[22], which uses a T-S fuzzy neural network for traffic flow forecasting.

Table 3 Performance comparison of forecasting results with 15 min ahead

Forecasting method    δMAPE      δRMSE      δNRMSE
SVM                   33.8086    23.4548    0.2693
LSSVM                 33.8131    23.0154    0.2629
KELM                  33.8629    23.2607    0.2671
KPCA-SVM              33.6102    23.1808    0.2662
KPCA-LSSVM            32.9011    22.8478    0.2624
KPCA-KELM             32.8899    22.5872    0.2568
KPLS                  32.8937    22.5449    0.2589
ALD-KRLS              32.6975    22.4819    0.2585
KLMS                  32.5801    22.4708    0.2580
FB-KRLS               32.4291    22.4561    0.2559

Table 4 Performance comparison of forecasting results with 30 min ahead

Forecasting method    δMAPE      δRMSE      δNRMSE
SVM                   43.4466    30.4298    0.3476
LSSVM                 42.2889    30.2212    0.3449
KELM                  43.2888    29.8679    0.3412
KPCA-SVM              43.9819    30.0215    0.3430
KPCA-LSSVM            43.6903    29.2169    0.3355
KPCA-KELM             43.2679    28.9697    0.3350
KPLS                  43.1310    28.8602    0.3325
ALD-KRLS              43.0891    29.3986    0.3376
KLMS                  42.9265    29.2029    0.3353
FB-KRLS               41.9961    28.8095    0.3320

Figs.3 and 4 show the comparison curves of the different prediction methods 15 min and 30 min in advance, respectively.

Fig.3 Forecasting results with 15 min ahead using different kernel-based learning methods

It can be seen that the proposed methods can accurately predict the actual traffic flow. The proposed adaptive filtering methods complete the test phase with an adaptively updated prediction model whose initial parameters are generated by the training process. The first test point is predicted with the trained model parameters; when the true data are obtained, the model parameters are updated with them, and the updated model is then used to predict the next point, and so on until all the test data are processed.
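The adaptive testing procedure described above can be written as the generic loop below; the model object is assumed to expose predict and update methods, as in the FixedBudgetKRLSSketch example earlier, and the names are illustrative.

```python
import numpy as np

def online_adaptive_test(model, X_test, y_test):
    """Forecast each test point with the current model, then update it with the true value."""
    preds = []
    for x, y_true in zip(X_test, y_test):
        preds.append(model.predict(x))   # forecast with the parameters learned so far
        model.update(x, y_true)          # refresh the model once the true flow is observed
    return np.array(preds)
```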

Fig.4 Forecasting results with 30 min ahead using different kernel-based learning methods

Kernel learning algorithms can improve the nonlinear approximation ability of regression modeling. The experimental results verify the effectiveness of the proposed KAF algorithms. The KLMS and FB-KRLS algorithms can improve the prediction accuracy to a certain extent and increase the prediction speed. The prediction accuracy of the FB-KRLS method fluctuates little, and the method has strong adaptability and shows a good prediction effect.

3 Conclusion

This study applies the KLMS and FB-KRLS methods based on KAF to short-term traffic flow forecasting. Compared with other models, kernel learning algorithms can improve the nonlinear approximation capability of regression modeling. Real freeway traffic volume data sets are used for the experiments. The KLMS and FB-KRLS models are trained and their one-step and two-step forecasting performances are tested. Three criteria, MAPE, NRMSE and RMSE, are employed for performance evaluation, and the evaluation results are presented for each data set.

The one-step and two-step forecasting results show that the KLMS method has adaptive learning characteristics compared with the traditional LMS, and the sparsification rule can control the size of the kernel matrix to improve the generalization ability and prediction accuracy of the network, which fully reflects the advantages of adaptive filtering. FB-KRLS not only provides an efficient update method for pruning an arbitrary point from the dictionary, but also a label update procedure that provides tracking capability, using a discarding criterion to maintain its memory size. The results of experiments A and B show that the KAF methods are suitable for large data sets and that the prediction accuracy can be improved greatly, e.g. the δRMSE of FB-KRLS is 3.1909 lower than that of SVM.

KAF methods are a feasible and effective choice for time-varying and nonlinear traffic flow systems due to their adaptability to online learning. In view of the complexity and variety of problems in practical applications, applying KAF to more fields is a direction for further research.