Evaluation of the Historical Sampling Error for Global Models

2015-11-24SHENSiLIUJuanJuanandWANGBin

Atmospheric and Oceanic Science Letters 2015年5期

SHEN Si, LIU Juan-Juan, and WANG Bin

1State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China

2College of Earth Science, University of Chinese Academy of Sciences, Beijing 100049, China

3Ministry of Education Key Laboratory for Earth System Modeling, Center of Earth System Science (CESS), Tsinghua University, Beijing 100084, China

Evaluation of the Historical Sampling Error for Global Models

SHEN Si1,2, LIU Juan-Juan1*, and WANG Bin1,3

1State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China

2College of Earth Science, University of Chinese Academy of Sciences, Beijing 100049, China

3Ministry of Education Key Laboratory for Earth System Modeling, Center of Earth System Science (CESS), Tsinghua University, Beijing 100084, China

Various ensemble-based schemes are employed in data assimilation because they can use the ensemble to estimate the flow-dependent background error covariance. The most common way to generate the real-time ensemble is to use an ensemble forecast; however, this is very time-consuming. The historical sampling approach is an alternative way to generate the ensemble, by picking some snapshots from historical forecast series. With this approach, many ensemble-based assimilation schemes can be used in a deterministic forecast environment. Furthermore, considering the time that it saves, the method has the potential for operational application. However, the historical sampling approach carries with it a special kind of sampling error because, in a historical forecast, the way to integrate the ensemble members is different from the way to integrate the initial conditions at the analysis time (i.e., forcing and lateral boundary condition differences, and ‘warm start’ or ‘cold start’ differences). This study analyzes the results of an experiment with the Global Regional Assimilation Prediction System-Global Forecast System (GRAPES-GFS), to evaluate how the different integration configurations influence the historical sampling error for global models. The results show that the sampling error is dominated by diurnal cycle patterns as a result of the radiance forcing difference. Although the RMSEs of the sampling error are small, in view of the correlation coefficients of the perturbed ensemble, the sampling error for some variables on some levels (e.g., low-level temperature and humidity, stratospheric temperature and geopotential height and humidity), is non-negligible. The results suggest some caution must be applied, and advice taken, when using the historical sampling approach.

ensemble-based data assimilation, historical sampling approach, sampling error, GRAPES-GFS

1　Introduction

A large number of ensemble-based data assimilation schemes have been developed to date, including both ensemble filter schemes (Evensen, 1994; Hunt et al., 2007)and ensemble variational schemes (Liu et al., 2008; Tian et al., 2008, 2011; Wang et al., 2010). The ensemble in these schemes plays an important role in estimating the flow-dependent background error covariance (BEC) and thus ensures the analysis increment is more consistent with the real-time atmospheric dynamics. In most cases, the ensemble is generated by a separate or coupled ensemble forecast system. However, the ensemble forecast is computationally very expensive and not all research groups or operational centers can afford it. Even for those with an ensemble forecast system, the ensemble number is always limited to less than 100 due to the high computational cost.

The historical sampling approach is a new approach, proposed by Wang et al. (2010), along with a new ensemble-variational scheme named Dimension-Reduced Projection 4DVar (DRP-4DVar). This sampling approach generates the ensemble by picking some snapshots from recent historical forecast series and thus maintains the flow dependency while avoiding the need for an ensemble forecast. In numerical weather prediction centers, this approach demands almost no computational cost, as historical forecast series are already available in operational systems. Another advantage of this approach is that the ensemble number can be much larger (limited only by the length of historical forecast series). With this historical sampling approach, many ensemble-based assimilation schemes can be used in a deterministic prediction environment. Currently, there are several assimilation applications using this historical sampling approach (Wang et al., 2010, 2011; Tian et al., 2011; Zhao et al., 2012).

However, as an empirical method, there is no theoretical guarantee as to how the ensemble generated by the historical sampling approach represents the atmospheric background error distribution. As noted by Zhao and Wang (2010), its ensemble spread is often too small, though this can be alleviated by inflating the BEC or increasing the sampling time interval between adjacent historical samples. Besides, as noted later in the paper, when used for 4D ensemble-based schemes, the historical sampling approach carries with it a special kind of sampling error, caused by the inaccurate integration configurations of the historical ensemble members. The problem of this sampling error has not yet been addressed, and so both its magnitude and distribution pattern are still unknown. In this paper, we provide a detailed description of this spe-cial sampling error, and then evaluate it with the Global Regional Assimilation Prediction System-Global Forecast System (GRAPES-GFS).

The organization of the remainder of the paper is as follows. The historical sampling approach is described in section 2. The experimental design and results are given in section 3. Finally, a summary and discussion are provided in section 4.

2　The historical sampling approach

2.1 A brief introduction to DRP-4DVar

To exemplify how the historical sampling approach works for ensemble-based data assimilation schemes, a brief introduction to DRP-4DVar is given first.

DRP-4DVar is one of the representative 4D ensemble-variational assimilation schemes (Wang et al., 2010). In DRP-4DVar, we first need to prepare the perturbed model space ensemble Pxand its projection onto the observation space, as follows:

To exemplify how the historical sampling approach works for ensemble-based data assimilation schemes, a brief introduction to DRP-4DVar is given first.

where xbis the background and s is the ensemble size. Each member (column) of Pxrepresents the 3D model state increment (relative to the background xb) in the beginning of the assimilation window, while each member (column) of PYrepresents the simulated observation increment (relative to the background simulated observation Yb=Y( xb)) within the whole window. Y(x), ‘the observation simulator’, consists of the forecast model and the observation operator, i.e.,, while the operatorY＇(x＇)is for simulating the observation increment, defined as:

The letter I represents the unit matrix. The operator Mt0-＞tiis the nonlinear model, integrating from t0(the beginning of the window) to ti.Htiis the observation operator at ti,andare their tangent linear operators, respectively.

As shown in Eq. (3), an approximate linear relationship between Pxand PYexists; thus, for the model space increment expressed as

its simulated observation increment is approximated linearly in the vicinity of xbas

Thus,x＇andY＇(x＇)are replaced by linear operators (Pxα,PYα, respectively). In this way, the DRP-4DVar cost function is an explicit quadratic function ofα, as follows:

In Eq. (7),Bαis the BEC matrix with respect toα, O is the observation error covariance, and d is the innovation (observation minus background simulated observation). From the brief review above, the ensemble not only estimates the BEC, but also approximatesY＇(x＇). So, it is inferred that the quality of the ensemble (Px, PY) is crucial to the performance of DRP-4DVar. A simple flowchart of DRP-4DVar is given in Fig. 1a.

Note that in this work, for simplicity, we assume that the analysis time is in the beginning of the window; however, it can be tuned within the whole window for DRP-4DVar, as well as many other 4D ensemble-based methods (Liu et al., 2009; Zhao et al., 2011; Wang et al., 2011). Readers are referred to Wang et al. (2010) and Shen et al. (2014) for the detailed derivation of DRP-4DVar.

2.2 The historical sampling approach

As seen in the last section, DRP-4DVar, as well as many other ensemble-based schemes, needs to prepare the ensemble before minimization. Generally, an ensemble forecast is used to generate the ensemble at the analysis time by integrating the analysis ensemble of the last assimilation, i.e., through the ensemble cycling assimilation. Besides, for those 4D ensemble-based schemes that assimilate observations within the whole window, an extra ensemble forecast integration is needed to generate theobservation space ensemble, i.e.,PY=Y＇( Px). Thus, two ensemble forecast integrations are needed in each window. As a result, huge computational cost is expected.

Figure 1 Flowcharts of (a) the Dimension-Reduced Projection 4DVar (DRP-4DVar), (b) the historical sampling process, and (c) the ensemble forecast sampling process.

The historical sampling approach is an alternative way to generate the ensemble much more efficiently. Wang et al. (2010) gave an example of how this sampling approach can be operated. In this example, two 78-h continuous forecasts with one output per hour are prepared, which are initialized with certain analysis data at 24 h and 48 h earlier than the beginning of the window, respectively. Figure 1b demonstrates the example of the sampling process for one historical series. Considering the assimilation window of 6 h, each 78-h forecast series contains 73 windows with each window shifted 1 h from another. So, 73 members are picked for each forecast series and a total of 146 members are obtained. As shown in Fig. 1b, the model space ensembleis obtained directly from the 1-h output of the model, while the observation space ensemble(the superscript h represents the ensemble generated by the historical sampling approach) is obtained by directly projecting the correspondingxj(t0)with the observation operators, i.e., for member index i (1≤i≤73):

Thus, once the historical forecast series is prepared, no model integration is needed to generate the ensemble. Given the analysis-time backgroundbx and its projection on observation space Yb, the perturbed ensemble members (i.e., members of Px/PY) are calculated by subtracting the background from each member:

Although the historical sampling approach has many advantages, there is one potential problem associated with this method. As shown in Figs. 1b and 1c, unlike the ensemble forecast sampling approach, the integration intervals for historical ensemble members are shifted away from the assimilation window (00-06). As a result, the forcing (including sea surface temperature) and the lateral boundary condition (only for regional models) are different to that in the integration within the assimilation window. In fact, from Eq. (8), the underlying assumption for the historical sampling approach is that the prediction problem is an initial-value problem, where the forecast results are totally determined by the initial condition (IC). Besides, as the historical forecast series is a continuous run, the historical ensemble can be seen as generated by warm-start integrations, while in reality the ICs should be integrated in a cold-start way. As a consequence, theseinaccurate integration configurations of the historical ensemble will probably introduce a special kind of sampling error for. For global models, which we aremainly concerned about here, the forcing difference and the‘cold-start or warm-start’ difference are the only two sources of the sampling error.

3　Experiment and results

3.1 Experimental design

As shown in Fig. 1c, we can conduct an ensemble forecast in the correct configuration and initialized with the ensemblegenerated by the historical sampling approach. If the historical sampling approach has no such sampling error, its observation space ensembleshould be the same as that integrated from the ensemble forecast—the superscript f represents the ensemble generated by the ensemble forecast). Otherwise, the difference betweenandis exactly the sampling error. Thus, the integrated ensembleis a reference. In our evaluation experiment, GRAPES-GFS is adopted.

GRAPES-GFS (Zhang and Shen, 2008) is a global weather forecast system based on the GRAPES model, which is developed by the China Meteorological Administration. In this study, the GRAPES-GFS with a 1° resolution, 36 vertical levels, and 600-s time step configuration is utilized. Besides, the code is modified so that during the integration it outputs the input-file-formatted files every hour, which can be used directly as the initial input files for GRAPES-GFS.

Initialized with the 1° × 1° resolution National Centers for Environment Prediction final global tropospheric (NCEP/FNL) analysis, a 78-h integration starting from 0000 UTC 1 June 2011 provides a historical forecast series for the analysis at 0000 UTC 3 June 2011. Seventy-three ensemble membersare picked, and then each is used as the IC to start a 6-h integration from 0000 UTC 3 June 2011, to generate the reference ensemble. The background xbat 0000 UTC 3 June 2011 is obtained from the previous 6-h forecast, i.e., integrated from 1800 UTC 2 June 2011 and initialized with the NCEP/FNL analysis. Its projection onto observation space (Yb) is obtained by integrating xbwithin the assimilation window.

Here, it is assumed that the observations are measured directly on every grid point, so, in fact, no observation operators are performed. Furthermore, for additional simplicity, we assume the observations are at the end of the window. So, we just need to compare the terminal output of the model to begin with, but we then show the impact of the observations at different times at the end of section 3.2.

3.2 Results

Figure 2 Historical sampling error measured by global averaged RMSEs (blue lines) and correlation coefficients (red lines). The red solid lines represent the correlation coefficients of the original ensemble members (CORR), while the red dashed lines represent the correlation coefficients of the perturbed ensemble members (PCOR). Each figure represents one variable: (a) skin temperature; (b) surface pressure; (c) 925 hPa zonal wind; (d) 850 hPa temperature; (e) 700 hPa specific humidity; and (f) 500 hPa geopotential height. The horizontal axis represents sampling time (see Fig. 1b).

Besides the radiance forcing, the impact of the‘cold-start or warm-start’ integration difference can also be observed. From Figs. 2b-e, the -48-h shifted historical ensemble member has even smaller sampling errors than that of the 00-h shifted member. We speculate that the reason is that the -48-h shifted historical ensemble member is integrated in a ‘cold-start’ configuration, the same as that of the reference, while the 00-h shifted historical ensemble member is integrated in the different ‘warmstart’ configuration.

Although the evidence of the two sources of sampling error is clear, their magnitudes are very small, which suggests the 6-h integration can be approximated as an initial-value problem. However, it cannot be simply concluded that the historical sampling error is negligible. Actually, as shown in Eq. (7), it is the perturbed ensemble, rather than the original ensemble, that matters in the assimilation. So, we recalculate the correlation coefficients between the perturbed ensembleand the perturbed reference(PCOR hereafter) for further evaluation. As the large common component Ybis subtracted fromand, drastic decreases for PCOR over CORR, as shown by the red dashed lines in Fig. 2, are expected. Besides the decrease in values, the diurnal cycle patterns of PCOR are also disturbed. For all of the six variables, both the second and the third diurnal cycles ([-24, 0] and [0, 24] cycles) have larger amplitudes than the first cycle ([-48, -24] cycle). Let us useM( θ; x)to denote the 6-h forecast operator, withθdenoting the forcing and x denoting the IC. Thus, if we perturbθand x, we have the perturbed forecast as:

From Eq. (10), the perturbed forecastY＇consists of two items, one caused by the perturbed forcingand the other caused by the perturbed IC. Furthermore, the smaller the magnitude of the perturbed IC (x＇), the larger the proportion that the perturbed forcing (θ＇) contributes to the perturbed forecast (Y＇). In extreme cases, wherex＇is small enough,Y＇is totally determined byθ＇, as follows:

For the historical ensemble members sampled during the second and third cycles, as their times are closer to the analysis time compared to the first cycle, they are more likely to have smaller perturbed ICs (xi＇), which we have verified already but the figures are omitted. As a result, the perturbed forcing contributes a larger proportion for the perturbed ensemblewithin the second and third cycles, and thus amplified PCORs of the second and third cycles are expected.

From the vertical distributions of the sampling error (PCOR shown in Fig. 3), besides the low level temperature, the stratospheric temperature is also perturbed in a diurnal cycle pattern, which can be attributed to the high concentration of ozone in the stratosphere. Influenced through the hydrostatic balance relationship, the high level geopotential height also shows a significant diurnal cycle pattern. For the specific humidity, many smallscaled large-negative-valued correlation noises exist above 100 hPa, especially around the 20 hPa level. However, we are unsure whether this small-scaled structure is model-dependent or physically reasonable. For the low level humidity (lower than 900 hPa), its sampling error also has a typical diurnal cycle pattern, with its minimum PCOR at around 0.8 (unrecognized in Fig. 3c). Unlike these three thermal variables, the zonal wind suffers only slightly from the sampling error.

Until now, all the comparisons have been performed under the assumption that the observations are at the terminal end of the window. In Fig. 4, we give a simple demonstration of how the observation time influences the sampling errors. As expected, as the integration time increases, the forcing’s contribution to the forecast results enlarges and the magnitudes of the sampling errors are amplified. These results suggest that the sampling error can be alleviated if a shorter assimilation window is selected or the analysis time is set in the middle of the window.

Figure 3 Vertical distributions of historical sampling errors (PCOR) for (a) temperature, (b) geopotential height, (c) specific humidity, and (d) zonal wind.

Figure 4 Distributions of historical sampling errors (PCOR) along with the integrated time for (a) skin temperature, (b) 850 hPa temperature, (c) 20 hPa geopotential height, and (d) 925 hPa specific humidity.

4　Summary and discussion

The historical sampling approach provides a very efficient way to generate ensemble members for data assimilation without conducting an ensemble forecast. In this paper, we point out one potential problem associated with this approach when used for 4D ensemble-based schemes, which may introduce a special kind of sampling error for the observation space ensemble. We take GRAPES-GFS as an example to evaluate how this sampling error is distributed for global models. As revealed in the experiment results, the sampling error measured by global-averaged RMSEs and CORRs is of small magnitude in general, but with dominant diurnal cycle patterns. This is attributed to the radiance forcing difference during the historical ensemble integration. We further evaluate the PCOR, which is believed to be a more important metric. From the PCOR results, the sampling errors for some variables on some levels (e.g., low-level temperature and humidity, stratospheric temperature and geopotential height and humidity) are non-negligible and need to be considered. From the results, we can propose some simple advice for the historical sampling approach, when used with 4D ensemble-based schemes. For example, to skip sampling at some specific sampling time, to exclude some observations sensitive to some specific variables on some specific levels, to use a shorter window, or to set the analysis time in the middle of the window. Besides, we may also develop some bias correction schemes based on the diurnal cycle characteristics. As this work is only a simple evaluation, other questions as to how this sampling error affects the assimilation and how to overcome this sampling error still need further investigation. In addition, the sampling error for regional models is another important question, where the lateral boundary conditions may play a more important role, and thus the sampling error is probably even larger. Finally, more research on other aspects of the historical sampling approach, e.g., the representativeness of the historical ensemble, and elaborative tuning of the sampling process, is also necessary before this approach can be applied operationally.

Acknowledgements. This work was jointly supported by the China Meteorological Administration for the R&D Special Fund for Public Welfare Industry (Meteorology) (Grant No. GYHY(QX)201406015) and the Southern China Monsoon Rainfall Experiment (SCMREX). We are also very grateful to the Numerical Weather Prediction Center of the China Meteorological Administration for providing the GRAPES-GFS code and instructions.

References

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99, 10143-10162.

Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter, Phys. D: Nonlinear Phenom., 230, 112-126.

Liu, C., Q. Xiao, and B. Wang, 2008: An ensemble-based fourdimensional variational data assimilation scheme. Part I: Technical formulation and preliminary test, Mon. Wea. Rev., 136, 3363-3373.

Liu, C., Q. Xiao, and B. Wang, 2009: An ensemble-based fourdimensional variational data assimilation scheme. Part II: Observing system simulation experiments with Advanced Research WRF (ARW), Mon. Wea. Rev., 137, 1687-1704.

Shen, S., J. Liu, and B. Wang, 2014: An extension of the dimension-reduced projection 4DVar, Atmos. Oceanic Sci. Lett., 7, 324-329.

Tian, X., Z. Xie, and A. Dai, 2008: An ensemble-based explicit fourdimensional variational assimilation method, J. Geophys. Res., 113, D21124, doi:10.1029/2008JD010358.

Tian, X., Z. Xie, and Q. Sun, 2011: A POD-based ensemble fourdimensional variational assimilation method, Tellus A, 63(4), 805-816.

Wang, B., W. Cheng, Y. Xu, et al., 2011: A four-dimensional variational data assimilation approach with analysis at the end of assimilation window. Part I: Methodology and preliminary tests, J. Meteor. Soc. Japan, 89(6), 611-623.

Wang, B., J. Liu, S. Wang., et al., 2010: An economical approach to four-dimensional variational data assimilation, Adv. Atmos. Sci., 27, 715-727.

Zhang, R., and X. Shen, 2008: On the development of the GRAPES—A new generation of the national operational NWP system in China, Chin. Sci. Bull., 53, 3429-3432.

Zhao, J., and B. Wang, 2010: Sensitivity of the DRP-4DVar performance to perturbation samples obtained by two different methods, Acta Meteor. Sinica, 24(5), 527-538.

Zhao, J., B. Wang, and J. Liu, 2011: Impact of analysis-time tuning on the performance of the DRP-4DVar approach, Adv. Atmos. Sci., 28(1), 207-216.

Zhao, Y., B. Wang, and J. Liu, 2012: A DRP-DVar data assimilation scheme for typhoon initialization using sea level pressure data, Mon. Wea. Rev., 140(4), 1191-1203.

Shen, S., J.-J. Liu, and B. Wang, 2015: Evaluation of the historical sampling error for global models, Atmos. Oceanic Sci. Lett., 8, 250-256,

10.3878/ AOSL20150001.

4 January 2015; revised 7 April 2015; accepted 7 April 2015; published 16 September 2015

LIU Juan-Juan, ljjxgg@mail.iap.ac.cn

Atmospheric and Oceanic Science Letters

2015年5期