用Velocity Verlet积分器 改进HMC抽样方法
2024-04-29李婉荧唐亚勇
李婉荧 唐亚勇
Hamilton Monte Carlo (HMC)方法是一种常用的快速抽样方法. 在对哈密顿方程进行抽样时,HMC方法使用Leapfrog积分器,这可能造成方程的位置及动量的迭代值在时间上不同步,其产生的误差会降低抽样效率及抽样结果的稳定性. 为此,本文提出了IHMC(Improved HMC)方法,该方法用Velocity Verlet积分器替代Leapfrog积分器,每次迭代时都计算两变量在同一时刻的值. 为验证方法的效果,本文进行了两个实验,一个是将该方法应用于非对称随机波动率模型(RASV模型)的参数估计,另一个是将方法应用于方差伽马分布的抽样,结果显示:IHMC方法比HMC方法的效率更高、结果更稳定.
HMC方法; Velocity Verlet积分器; RASV模型; 方差伽马分布
O29 A 2024.011004
Improving the HMC sampling method by Velocity Verlet integrator
LI Wan-Ying, TANG Ya-Yong
(School of Mathematics, Sichuan University, Chengdu 610064, China)
Hamilton Monte Carlo (HMC) method is a fast sampling method. To sampling a Hamiltonian equation, the HMC method uses the Leapfrog integrator. As a result, unsynchronized values of position and momentum variables of the equation are possibly generated during the iterative process and resulting estimation error can seriously damage the sampling efficiency and stability of sampling result. To solve this problem, here we propose an improved HMC (IHMC) method by replacing the Leapfrog integrator in HMC method with Velocity Verlet integrator. The IHMC method calculates the variables simultaneously in each iteration. Two numerical examples are implemented to check the performance of the IHMC method, one is the problem of estimating the parameters of realized and stochastic volatility (RASV) model with asymmetric effects, the other is the problem of sampling the variance Gamma distribution. It is shown that the IHMC method has higher sampling efficiency and more stable sampling result compared with the HMC method.
Hamiltonian Monte Carlo method; Velocity Verlet integrator; RASV model; Variance Gamma distribution
1 Introduction
The Hamilton Monte Carlo (HMC) method is a fast sampling method combining the Markov chain Monte Carlo (MCMC) method and Hamiltonian dynamics. To sampling a Hamiltonian equation, the HMC method uses Leapfrog integrator. This method is firstly introduced in the late 1980s by Duane et al . [1] , known as the hybrid Monte Carlo method. The HMC method is mainly applied in physics until Neal [2] realized that it can also be applied to Bayesian neural network. In 2011, Neal [3] further provided a detailed introduction to this method and made it widely noticed and applied. Nowadays the HMC method has more and more applications in simulations of various chemical systems, physical systems and statistical inferences.
李婉荧, 等: 用Velocity Verlet积分器改进HMC抽样方法
Compared with the MCMC method, the HMC method has higher efficiency and stability. But there is still some room for improvement. Recently, combined with some important sampling methods, new modified Hamiltonian Monte Carlo (MHMC) methods are proposed [4-6] . Notably, Radivojevi et al . [4] used the splitting integrator instead of the Leapfrog integrator to get better conservation properties. Remarkably, unsynchronized values of the position and momentum variables of the Hamiltonian equation are possibly generated during the iterative process and the resulting estimation error can seriously damage the sampling efficiency and stability of sampling result. In this paper, we try to use the Velocity Verlet integrator to improve the HMC method, since the Velocity Verlet integrator calculates the position and momentum variables simultaneously with accuracy similar to the Leapfrog integrator.
The rest of the paper is organized as follows. Section 2 and Section 3 describe the HMC method and the IHMC method. In Section 4, two numerical examples are used to show the performance of the IHMC method. Finally, we summarize this paper in Section 5.
2 The HMC method
The HMC method combines the MCMC method with Hamiltonian dynamics. For computer simulation, the Hamiltonian equation is firstly discretized by using a small time-step ε . Then the Leapfrog integrator is used to carry out the iterative sampling.
Hamiltonian dynamics operates on a d -dimensional position vector q and a d -dimensional momentum vector p , thus the full state space is 2 d -dimensional. The system can be described by a function of q and p known as the Hamiltonian, H(q,p) , written as
H(q,p)=U(q)+K(p),
here U(q) is called potential energy, K(p) is called kinetic energy, and the functions are defined as,
U(q)=- ln f(q),K(p)= 1 2 |p | 2 ,
where f(q) is the probability density of q . The position variable q is obtained by solving the Hamiltons equation given by the following equations
d q t d t = H p t = p t ,
d p t d t =- H q t =-
U( q t ) (1)
respectively,where t represents time, q t and p t can also be written as q(t) and p(t) . The Leapfrog integrator has local error of order ε 3 and global error of order ε 2 . The Leapfrog integrator guarantees the symplectic and volume preservation properties by performing the following steps in each iteration:
p t+ ε 2 =p(t)- ε 2
U q(t) ,
q(t+ε)=q(t)+εp t+ ε 2 ,
p(t+ε)=p t+ ε 2 - ε 2
U q(t+ε) (2)
Step 1 Input the step size SymboleA@
, the number of iterations L and the initial value q 0 for q .
Step 2 Draw p from N 0,I .
Step 3 Simulate discretization of the Hamiltonian dynamics by (2). Then we do
(i) half-step update p : p←p- ε 2
U(q) ;
(ii) alternating update p and q for L times, say, q←q+εp, p←p-ε
U(q);
(iii) half-step update p : p←p- ε 2
U(q) .
Step 4 Let (q,p) be the value taken before the update, q ︿ , p ︿ be the value taken after the update. Calculate the acceptance probability
P= exp {-H q ︿ , p ︿ +H(q,p)} .
Step 5 Draw u ~Uniform[0,1], then
q= q, if u≥P,
q ︿ , if u
When the HMC method runs, the momentum vector is first updated once with half a step, then the position and momentum vectors are updated a specified number of times with one step, and finally the momentum vector is updated once again with half a step. Specifically, the obtained position and momentum vectors are p ε 2 ,q(ε),p 3ε 2 ,q(2ε),p 5ε 2 ,q(3ε),… in order. A noteworthy drawback of the HMC method is that the position and momentum vectors are not synchronized in time, which may result in errors during iteration and affect the sampling efficiency and stability of sampling result.
3 The IHMC Method
3.1 The Velocity Verlet integrator
To avoid the errors caused by the HMC method, here we introduce the Velocity Verlet integrator [5] . Similar to the Leapfrog integrator, the Velocity Verlet integrator also has order ε 3 local error and order ε 2 global error and can preserve the symplectic and volume [6] besides that the momentum and position vectors can be calculated simultaneously.
The target variable ( q t , p t ) discretized by the Velocity Verlet integrator with step size ε is the solution of the following differential system:
d q t d t = p t - ε 2
U q t ,
d p t d t =- 1 2 (
U( q t )+
U( q t )) (3)
where t = max {s∈ε Z :s≤t}, t = min {s∈ε Z : s≥t } . Compared with (1), the gradient term in equation (3) is used to reduce the error of the numerical integration. The Velocity Verlet integrator updates the position and momentum variables as follows.
q(t+2ε)=q(t)+2εp(t)-2 ε 2
U(q(t)),
p(t+ε)=p(t)- ε 2
U(q(t))+
U(q(t+2ε)) ,
q(t+ε)=q(t)+εp(t+ε)- ε 2
U(q(t)) (4)
Then, by using (4) to replace (2), Step 3 in the HMC method should be replaced by
Step 3 * Simulate discretization of Hamiltonian dynamics by (4). Then we do alternating update p and q for L times:
q v ←q+2εp-2 ε 2
U(q),
p←p- ε 2
U q +
U q v ,
q←q+εp- ε 2
U(q).
Compared with (2), (4) contains more computations on the premise of obtaining p(t+ε) and q(t+ε) with the same ε . However, (4) can reduce the estimate errors and solve the time synchronization problem with higher stability. That is to say, good result still can be expected even for larger step sizes. We will confirm this by some simulations in the next section.
3.2 Convergence of the IHMC method
To investigate the convergence of the IHMC method, suppose that U is a quadratic differentiable function satisfying that
U(x) ≤L x , which can be easily satisfied by most functions. Consider that T∈ 0,∞ ,ε∈[0,1] satisfying T/ε∈ Z (ε≠0) and assume that L( T 2 +εT)≤1 which can be satisfied by choosing a suitable T for a given L . We have the following two theorems.
Theorem 3.1 For given initial values q 0 , p 0 ∈ R d , the following inequalities hold:
max m≤T q m ≤ 1 1- L 2 T 2 +εT ·
max q 0 , q 0 + p 0 T ,
max m≤T p m ≤ p 0 + 2 T+ε ·
max q 0 , q 0 + p 0 T .
Proof From the equations (3), for all s∈[0,T] we have
∫ s 0 d d t q t = q s - q 0 =
∫ s 0 p r d r- ε 2 ∫ s 0
U q r d r,
where
p r = p 0 - 1 2 ∫ r 0
U q u +
U q u d u,
r = max t∈ε Z :t≤r , r = min {t∈ε Z :t≥r} . Combining the above two equations gives
q s = q 0 + p 0 s-
1 2 ∫ s 0 ∫ r 0
U q u +
U q u d u d r-
ε 2 ∫ s 0
U q r d r (5)
Then we have
‖ q s - q 0 - p 0 s‖=‖ 1 2 ∫ s 0 ∫ r 0 (
U( q u )+
U( q u )) d u d r- ε 2 ∫ s 0
U( q r ) d r‖≤
1 2 ∫ s 0 ∫ r 0 (
U( q u ) +
U( q u ) ) d u d r+
ε 2 ∫ s 0
U( q r ) d r≤ L 2 ∫ s 0 ∫ r 0 ( q u +
q u ) d u d r+ εL 2 ∫ s 0 q r d r≤
L∫ s 0 ∫ r 0 max m∈ 0,T q m d u d r+
εL 2 ∫ s 0 max m∈ 0,T q m d r≤
L 2 ( s 2 +εs) max m∈ 0,T q m ≤
L 2 ( T 2 +εT) max m∈[0,T] q m .
In the above equation max m∈[0,T] q m can be rewritten as
max m∈[0,T] q m ≤ max m∈[0,T] q m - q 0 - p 0 m +
max { q 0 , q 0 + p 0 T } (6)
It is further obtained that
max m∈[0,T] q m - q 0 - p 0 m ≤
L 2 T 2 +εT 1- L 2 T 2 +εT max { q 0 , q 0 + p 0 T } (7)
Substituting (7) into (6) again gives
max m∈ 0,T q m ≤ 1 1- L 2 T 2 +εT ·
max { q 0 , q 0 + p 0 T } (8)
From the equations (3), we have
p s = p 0 - 1 2 ∫ s 0
U q r +
U q r d r (9)
From (8) and (9), we get
max m∈[0,T] p m ≤ p 0 + LT 1- L 2 T 2 +εT ·
max { q 0 , q 0 + p 0 T }.
From L( T 2 +εT)≤1 , the above equation can be reduced to
max m∈[0,T] p m ≤ p 0 + 2 T+ε · max { q 0 ,
q 0 + p 0 T } (10)
The proof is completed.
Theorem 3.2 For given initial values q 0 , p 0 ∈ R d , the following inequalities hold:
max s,t≤T q t - q s ≤ p 0 T+
max { q 0 , q 0 + p 0 T },
max s,t≤T p t - p s ≤
2 T+ε max { q 0 , q 0 + p 0 T }.
Proof Without loss of generality, assume that 0≤s q t - q s ≤ p 0 t-s + 1 2 ∫ t s ∫ r s U q u + U q u d u d r+ ε 2 ∫ t s U q r d r≤ p 0 T+ L 2 ( (t-s) 2 + ε(t-s)) max m∈[0,T] q m ≤ p 0 T+ L 2 ( T 2 + εT) 1 1- L 2 T 2 +εT · max { q 0 , q 0 + p 0 T }≤ p 0 T+ max { q 0 , q 0 + p 0 T }. i.e., max s,t≤T q t - q s ≤ p 0 T+ max { q 0 , q 0 + p 0 T } (11) From (9) we have p t - p s ≤ 1 2 ∫ t s U q r + U q r d r≤ L(t-s) max m∈[0,T] q m ≤ LT 1- L 2 T 2 +εT · max { q 0 , q 0 + p 0 T }≤ 2 T+ε max { q 0 , q 0 + p 0 T }, i.e., max s,t≤T p t - p s ≤ 2 T+ε max { q 0 , q 0 + p 0 T } (12) asrequired. The proof is end. Remark 1 Theorem 3.1 tells us that in a time interval T , the values of position variable p and the velocity variable q generated by the IHMC method can be controlled by the corresponding constant velocity motion (initial position q 0 , initial velocity p 0 ). Furthermore, Theorem 3.2 tells us that in a chain generated by IHMC method, the deviations of the variable values at all moments are restricted to a certain range (given by the initial conditions of q 0 , p 0 , and the upper time limit T ). Thus the convergence of the IHMC method is proved. 4 Numerical examples Example 4.1 Parameter estimation for the RASV model. Stochastic volatility (SV) model is an essential tool for financial time series volatility. It identifies the time-varying return volatility in financial market by describing volatility as a random function. Nevertheless, the application of SV model is limited by the problem of evaluating their likelihood functions. To handle this problem, some Bayesian-based approach with the HMC method are used to accurately infer the parameters of SV model [7,8] . To compare the performance of the HMC and IHMC methods, we establish an SV model with realized volatility (RV) and asymmetric effect, abbreviated as RASV model [9] , by utilizing daily rate of return data and RV containing intraday high-frequency rate of return data information. Suppose that we have m intraday returns during day t , r t,i m i=1 , so that a simple RV estimator [10] is defined as the sum of squared returns, i.e. , RV t =∑ m i=1 r 2 t,i . The RASV model is described as R t = σ ε 2 e 1 2 α t ε t ,t=1,…,T, Y t =c+ α t + σ u u t ,t=1,…,T, α t+1 =φ α t + σ η η t ,t=1,…,T-1, α 1 ~N 0, σ η 2 1- φ 2 , ε t u t η t ~N 0 0 0 , 1 0 ρ 0 1 0 ρ 0 1 , where = R t T t=1 ,Y= Y t T t=1 ,α= α t T t=1 . Let R and Y be two observation vectors, R t is the rate of return over a unit time period, and Y t is the logarithm of RV estimator, α is the latent vector, and N ·,· represent a normal distribution. Let θ=(c, σ u , σ ε ,φ, σ η ,ρ) . The joint posterior density of the model can be written as p θ,α|R,Y =∏ T t=1 p R t | σ ε , α t × ∏ T t=1 p Y t |c, σ u ×p θ ×p α 1 |φ, σ η × ∏ T t=2 p α t | α t-1 , σ ε ,φ, σ η ,ρ . While estimating the RASV model, both the HMC method and the IHMC method are applied for parameters α,φ, σ ε , σ η ,ρ , and parameters σ u and c are sampled directly from the conditional posterior density. The data used in this example are the 1-minute high-frequency data of Shanghai Composite Index from January 4, 2021 to February 21, 2022. We take the opening and closing prices of each day to calculate the daily return rate, and get the series R= R t T t=1 . The RV estimator was calculated by using the daily 1-minute high-frequency data, and the logarithm was taken to obtain the sequence Y= Y t T t=1 . We estimate the RASV model by running the corresponding 25 000 iterations of the HMC and IHMC methods and taking 5000 iterations as burn-in period. The prior distributions required in the joint prior density are set as σ 2 ε ~IG 5,4 , σ 2 u ~IG 2.5,0.025 , σ 2 η ~IG 2.5,0.025 , φ+1 2 ~B 20,1.5 , ρ+1 2 ~B 1.5,5 ,c~N 0,10 , where B(·,·) and IG(·,·) represent the Beta and inverse Gamma distributions, respectively. The choice of these prior distributions ensures that all parameters make sense. Particularly, the Beta prior for φ and ρ ensures that -1<φ,ρ<1 . For initial parameter values, we set σ (0) ε =1, c (0) =0, σ (0) u =0.1, φ (0) =0.95, σ (0) η =01, ρ (0) =-0.3, α (0) = α (0) t T t=1 , where α 1 (0) ~N(0, σ 2 η 1- φ 2 ), α i (0) ~N(φ α i-1 , σ 2 η ) . Tuning parameters for the HMC method have been reported to be more difficult than tuning the parameters for other MCMC methods, because the performance of the HMC-based method is highly sensitive to two specified parameters, i.e. , the step size and the number of iterations. For its larger computation in the IHMC method, we use a larger step size to compensate. In the following simulation, we set iteration number 100 for both the HMC method and the IHMC method. For the parameter α , we set step size 0.005 for the HMC method, while 0.01 for the IHMC method, and for the parameters φ, σ ε , σ η ,ρ , we set step size 0.006 for the HMC method, while 0012 for the IHMC method. The estimation is implemented by R software. Tab.1 presents the result obtained by using the HMC method and the IHMC method. The IF is defined as IF =1+2∑ ∞ j=1 ρ j , where ρ j is the autocorrelation at lag j , which is used to illustrate the efficiency of the method and reflect the convergence rate. This quantity can be interpreted as the number of correlated samples with the same variance-reducing power as one independent sample. An IF value of 1 indicates that the draws are uncorrelated while large values indicate a slow convergence as well as a bad mixing. Note that Nugroho and Morimoto [11] have proven that the IFs are relatively not sensitive to the initial values. It can be seen that the resulting parameter estimates are quite close. By comparison, the SD value and standard error of the IHMC method is smaller, indicating that the fluctuation of variance of the sampling result obtained by the IHMC method is smaller, and the sampling results are more reliable. Meanwhile, the IFs of the IHMC method are all lower than those of the HMC method, indicating that the convergence rate is faster and the mixing effect is better. The acceptance rate of the IHMC method is obviously higher than that of the HMC method, and the acceptance rate is very close to 1, which shows that the IHMC method has a higher sampling efficiency. The trace plots and the autocorrelation plots for posterior samples of two methods are displayed in Figs.1 and 2, respectively. The trace plots show that the sampling result of the IHMC is more stable. The autocorrelation plots exhibit quick decay of the autocorrelation as time lag between samples increases, indicating that the process is stationary. As observed in the autocorrelation plots, the attenuation of the autocorrelation of two HMC methods are at a good pace and have the similar performance. For two HMC methods, the autocorrelation attenuation and acceptance rate of σ ε is slower than other parameters, it may be that the σ ε parameter is highly sensitive to the model, which needs further study. Example 4.2 Sampling the variance Gamma distribution. Financial time series noise often exhibits spiky and thick-tailed properties. we use the normal scale mixture distribution—Variance Gamma distribution (VGD) to verify the effectiveness of the IHMC method for sampling distributions that exhibit spiky and thick-tailed properties. The VGD was proposed by Seneta et al . [12,13] to describe stock market returns. The density of VGD can be written as ∫ ∞ 0 N(x|μ, σ 2 θ )IG(θ| 2 , 2 ) d θ. That is to say, the VGD is equivalent to the following hierarchical form: X μ, σ 2 ,θ~N x|μ, σ 2 θ and θ ~ IG θ| 2 , 2 . Set the parameter μ,σ, = 6,2,5 and initial value x 0 =2 to sample the distribution using the HMC method and the IHMC method. Similar to Example 4.1, we set iteration number 100 for two methods, step size 0.25 for the HMC method and 0.5 for the IHMC method. The sampling result are shown in Tab.2, Figs.3 and 4. From the sampling result, it can be seen that the IHMC method is more stable and has higher acceptance rate. Comparing the IF values and the autocorrelation function plots in Fig.4 reveals that the sampling result for the IHMC method has better convergence speed and mixing effect. 5 Conclusions In this study, we have improved the HMC method by combining the Velocity Verlet integrator and proposed the IHMC method. By using two numerical examples, we show that the sampling results for the IHMC method are more stable, with better SD, standard error, acceptance rate and IF values. Therefore, we may make the conclusion that the IHMC method has higher sampling efficiency and stability than the HMC method. It is worth noting that it is also interesting to improve the HMC method by combining the Velocity Verlet integrator mentioned in this paper with the existing MHMC methods, which is the major topic of our future work. References: [1] Duane S, Kennedy A D, Pendleton B J, et al . Hybrid Monte Carlo [J]. Phys Lett B, 1987, 195: 216. [2] Neal R M. Bayesian learning for neural networks [M]. New York: Springer, 1996. [3] Neal R M. MCMC using Hamiltonian dynamics [C]// Handbook of Markov chain Monte Carlo. Boca Raton: Chapman & Hall/CRC Press, 2011. [4] Radivojevi T, Fernández-Pendás M, Sanz-Serna J M, et al . Multi-stage splitting integrators for sampling with modified Hamiltonian Monte Carlo methods [J]. J Comput Phys, 2018, 373: 900. [5] Swope, W C, Andersen H C, Berens P H, et al . A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: application to small water clusters [J]. J Chem Phys, 1982, 76: 637. [6] Bou-Rabee N, Sanz-Serna J M. Geometric integrators and the Hamiltonian Monte Carlo method [J]. Acta Number, 2018, 27: 113. [7] Shephard N. Fitting nonlinear time-series models with applications to stochastic variance models [J]. J Appl Econ, 1993, 8: S135. [8] Jacquier E, Polson N G, Rossi P E. Bayesian analysis of stochastic volatility models [J]. J Bus Econ Stat, 2002, 20: 69. [9] Takahashi M, Omori Y, Watanabe T. Estimating stochastic volatility models using daily returns and realized volatility simultaneously [J]. Comput Stat Data An, 2009, 53: 2404. [10] Takahashi M, Watanabe T, Omori Y. Volatility and quantile forecasts by realized stochastic volatility models with generalized hyperbolic distribution [J]. Inter J Forecast, 2016, 32: 437. [11] Nugroho D B, Morimoto T. Estimation of realized stochastic volatility models using Hamiltonian Monte Carlo-Based methods [J]. Comput Stat, 2015, 30: 491. [12] Madan D B, Seneta E. The variance gamma (VG) model for share market returns [J]. J Bus, 1990: 511. [13] Seneta E. Fitting the variance-Gamma model to financial data [J]. J Appl Probab, 2004, 41: 177. 收稿日期: 2022-10-17 作者简介: 李婉荧(1999-), 山西临汾人, 博士研究生, 主要研究方向为金融数学. 通讯作者: 唐亚勇. E-mail: yayongtang@scu.edu.cn