High-level data assimilation workflows in DART
In this section we present two high-level data assimilation workflows that show the relevant DART programs with their inputs and outputs. These two workflows represent two different types of DA experiments typically run.
It is possible to run DART in Observation System Simulation Experiment (OSSE) mode. In OSSE mode, a perfect “true” model trajectory is created, and synthetic observations are generated from the “truth” with added noise. This is useful to test the theoretical capability of DA algorithms, observations, and/or models. In this document so far, we have conducted only OSSEs.
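As a rough illustration of the OSSE idea (not DART code), the following Python sketch draws synthetic observations from a known "truth" by adding Gaussian noise. The function name `synthetic_observations`, the toy truth values, and the observation error standard deviation are all assumptions made for this example.

```python
import random


def synthetic_observations(truth, obs_error_sd, seed=0):
    """Return noisy observations of `truth`.

    Hypothetical helper for illustration only; it is not part of DART.
    Each observation is the true value plus zero-mean Gaussian noise.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    return [x + rng.gauss(0.0, obs_error_sd) for x in truth]


# Toy "true" trajectory (assumed values).
truth = [1.0, 2.0, 3.0]
obs = synthetic_observations(truth, obs_error_sd=0.5)

# Because the truth is known in an OSSE, observation errors can be inspected directly.
errors = [o - t for o, t in zip(obs, truth)]
```

Having the truth available is what makes this kind of direct verification possible; in an OSE the `errors` list above could not be computed.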
It is also possible to run DART in a more realistic Observation System Experiment (OSE) mode. In an OSE, there is no perfect model truth, which mirrors real-world situations where the true values of the model state will likely never be perfectly known. The observations, which are themselves noisy and imperfect, are the only available view of the “truth” that the model state estimates. In OSE mode, the user must provide observations to DART; these usually come from real-world observation systems, with all of their own idiosyncrasies and imperfections. DART can help generate ensemble perturbations, or the user can specify their own.
The filtering aspect is the same for both OSSE and OSE experiments, and many of the same tools for data assimilation are available in OSSE and OSE modes. The core difference, therefore, is the existence of the perfect model “truth.”
For a simple model such as Lorenz 63 investigated above, DART can typically advance the model time explicitly through a Fortran function call, allowing filter to compute all necessary time steps in sequence without exiting the DART program. However, for larger models (or those that DART cannot communicate with through Fortran), a shell script may be necessary to run the model and advance the time forward. For the largest models, the model state is typically advanced in parallel over many computing nodes on a supercomputer. In this more complex case, DART only considers one assimilation step at a time: it combines the observations and the prior ensemble to find the posterior analysis, which is then used to restart the model and continue the forecast.
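The externally driven cycle described above can be sketched as follows. This is a toy under stated assumptions: `advance_model` stands in for an external model run, `assimilate` stands in for a single DART assimilation step, and neither reproduces real model dynamics or DART's filtering algorithms.

```python
def advance_model(state, n_steps):
    """Stand-in for an external model run (hypothetical toy dynamics)."""
    for _ in range(n_steps):
        state = 0.9 * state + 1.0
    return state


def assimilate(prior_ensemble, observation):
    """Stand-in for one assimilation step.

    Moves each member halfway toward the observation. This is NOT
    DART's algorithm; it only marks where the assimilation happens.
    """
    return [0.5 * (member + observation) for member in prior_ensemble]


def cycle(ensemble, observations, steps_per_cycle):
    """Alternate forecast and assimilation, one observation time per cycle."""
    for obs in observations:
        # Forecast: every member is advanced independently by the model.
        prior = [advance_model(member, steps_per_cycle) for member in ensemble]
        # Analysis: the posterior becomes the restart for the next forecast.
        ensemble = assimilate(prior, obs)
    return ensemble


final_ensemble = cycle([0.0, 2.0], observations=[1.0, 1.5], steps_per_cycle=1)
```

For a small model, the whole loop runs inside a single DART program; for a large model, the loop body is typically driven by a script that launches the model and DART in turn.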
For efficiency reasons, data from models with large states may be written to separate files for every ensemble member at every stage of the assimilation process. Data from models with small states may conveniently be written as variables inside a single netCDF file.
Simple model workflow with an OSSE
The first example DA workflow is an OSSE for a model that DART can advance directly, with all ensemble members stored in a single file. Details of the executables mentioned below can be found in Programs included in DART.
As shown, the program preprocess takes the input.nml namelist file and generates Fortran code for the observations. This code, along with the namelist, is used by all subsequent programs. create_obs_seq is used to define a set of observations in set_def.out, which can be replicated through the program create_fixed_network_seq to create an obs_seq.in file. There are two inputs to perfect_model_obs: the obs_seq.in file and perfect_input.nc (which here is generated by perfect_input.cdl via ncgen). obs_seq.in provides perfect_model_obs with the observation template (i.e. the location and type of observations), while perfect_input.nc provides the initial state that will be used to advance the model. On output, the “perfect” model state at the final time, which can be used as a restart for running this procedure again, will be written to perfect_output.nc (i.e. perfect_output.nc could be renamed to perfect_input.nc to extend the OSSE), while the entire state trajectory will be stored in true_state.nc. The noisy synthetic observations and noise-free truth (for verification only) will be stored in obs_seq.out. The observation sequence file obs_seq.out will be input to filter along with filter_input.nc (generated by filter_input.cdl via ncgen), which contains the initial state for all the ensemble members. The output of filter is preassim.nc, which contains the prior state for all the ensemble members just before applying DA (so including prior inflation if it is being used); analysis.nc, which contains the posterior state for all the ensemble members after assimilation (and including inflation if it is being used); filter_output.nc, which is the final posterior that could be used to restart the OSSE process; and obs_seq.final, which adds the forward-calculated expected values h(x) for each observation. The obs_seq.final file can be analyzed and binned by the obs_diag program, producing the file obs_diag_output.nc, which can be used for diagnostics.
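The data flow of this OSSE workflow can be summarized as an ordered list of programs with their inputs and outputs. The Python sketch below encodes that list and checks that every input is either supplied by the user or produced by an earlier step. The `WORKFLOW` table and the `missing_inputs` helper are illustrative assumptions, not DART tooling. The link from obs_seq.final to obs_diag is our reading of the workflow, and the namelist and preprocessed observation code, which every program also needs, are left out for brevity.

```python
# (program, inputs, outputs), in the order the programs are run.
WORKFLOW = [
    ("preprocess", ["input.nml"], ["observation Fortran code"]),
    ("create_obs_seq", [], ["set_def.out"]),
    ("create_fixed_network_seq", ["set_def.out"], ["obs_seq.in"]),
    ("ncgen", ["perfect_input.cdl"], ["perfect_input.nc"]),
    ("perfect_model_obs", ["obs_seq.in", "perfect_input.nc"],
     ["perfect_output.nc", "true_state.nc", "obs_seq.out"]),
    ("ncgen", ["filter_input.cdl"], ["filter_input.nc"]),
    ("filter", ["obs_seq.out", "filter_input.nc"],
     ["preassim.nc", "analysis.nc", "filter_output.nc", "obs_seq.final"]),
    ("obs_diag", ["obs_seq.final"], ["obs_diag_output.nc"]),
]

# Files the user supplies before the workflow starts.
EXTERNAL = {"input.nml", "perfect_input.cdl", "filter_input.cdl"}


def missing_inputs(workflow, external):
    """Return (program, file) pairs whose input is not yet available."""
    available = set(external)
    problems = []
    for program, inputs, outputs in workflow:
        for name in inputs:
            if name not in available:
                problems.append((program, name))
        available.update(outputs)
    return problems


problems = missing_inputs(WORKFLOW, EXTERNAL)
```

An empty `problems` list means the steps are ordered consistently; running a step out of order (for example, create_fixed_network_seq without set_def.out) is reported as a missing input.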
Complex model workflow with an OSE
The second workflow is an OSE for a complex model with all ensemble members stored in separate files. In this case, DART operates on only one model output at a time. External programs advance the model states, generate the observations, and call DART again. Details of DART’s internal programs, which are mentioned below, can be found in Programs included in DART. The following diagram shows the high-level DART flow in this case:
Within a single time step, DART will use the filter program to run the “Assimilate” portion of the above diagram and/or the “diagnostics” as follows:
The single time-step workflow for an OSE is slightly simpler than the OSSE equivalent, as DART handles less of the process. As in the OSSE case, the namelist and preprocessed observation source files are input to all other DART programs. In the OSE case, however, the user must provide an obs converter that will output an obs_seq.out file. There are many DART utilities to make this process easier, but for the OSE case the obs_seq.out file is ultimately the user’s responsibility (to avoid duplicating effort, see the list of existing observation types in Important capabilities of DART). Here, the option to run with one file for each ensemble member is demonstrated. There are k ensemble members used as input to filter, which also outputs k members for the prior and posterior. The obs_seq.final and obs_diag_output.nc files are used in the same way as in the OSSE case. The names of the input files and output files can be controlled by the user through the filter_input_list.txt and filter_output_list.txt files, which can contain the user-specified list of the ensemble input or output files, respectively.
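As a small illustration of the list files, the following Python sketch writes one file name per ensemble member to an input list and an output list. Several things here are assumptions: the input list name filter_input_list.txt (chosen by analogy with filter_output_list.txt), the one-name-per-line layout, the member file name pattern, and the helper `write_member_list`, which is not a DART utility.

```python
from pathlib import Path
import tempfile


def write_member_list(list_path, pattern, k):
    """Write k member file names, one per line, and return them.

    `pattern` is a hypothetical naming scheme with a `{member}` field.
    """
    names = [pattern.format(member=i) for i in range(1, k + 1)]
    Path(list_path).write_text("\n".join(names) + "\n")
    return names


# Use a scratch directory so the example does not touch a real run directory.
workdir = Path(tempfile.mkdtemp())

inputs = write_member_list(
    workdir / "filter_input_list.txt", "member_{member:04d}_input.nc", k=3
)
outputs = write_member_list(
    workdir / "filter_output_list.txt", "member_{member:04d}_output.nc", k=3
)
```

Keeping the two lists the same length and in the same member order makes it easy to match each input file to its corresponding output file.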
Another view of the stages of filter is shown in the following diagram:
As shown here, an ensemble forecast is stored in forecast.nc, to which prior inflation can be applied and stored in preassim.nc. Once assimilation is applied, the output can be stored in postassim.nc, and finally, if posterior inflation is applied, the final analysis can be written in analysis.nc. The model forecast will start from the analysis to advance the model in order to start the cycle over again.
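The order of these stages can be sketched with a toy scalar ensemble. The sketch assumes a simple multiplicative inflation that scales each member's deviation from the ensemble mean, which is one common form of inflation chosen here for illustration. The `toy_update` function is only a placeholder for the assimilation step and is not DART's algorithm; all function names are hypothetical.

```python
import math


def inflate(ensemble, factor):
    """Scale deviations from the mean by sqrt(factor), so variance scales by factor."""
    mean = sum(ensemble) / len(ensemble)
    scale = math.sqrt(factor)
    return [mean + scale * (member - mean) for member in ensemble]


def toy_update(ensemble, observation):
    """Placeholder assimilation: shift the ensemble mean halfway to the observation."""
    mean = sum(ensemble) / len(ensemble)
    shift = 0.5 * (observation - mean)
    return [member + shift for member in ensemble]


def filter_stages(forecast, observation, prior_inflation=1.0, posterior_inflation=1.0):
    """Return the ensemble at each stage; a factor of 1.0 means no inflation."""
    preassim = inflate(forecast, prior_inflation)
    postassim = toy_update(preassim, observation)
    analysis = inflate(postassim, posterior_inflation)
    return {
        "forecast": list(forecast),
        "preassim": preassim,
        "postassim": postassim,
        "analysis": analysis,
    }


stages = filter_stages([1.0, 3.0], observation=4.0)
```

With both inflation factors left at 1.0, the "preassim" entry matches "forecast" and the "analysis" entry matches "postassim"; setting either factor above 1.0 makes the corresponding pair differ.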
The “forecast” will be the same as the “preassim” if prior inflation is not
used, and the “postassim” will be the same as the “analysis” if posterior
inflation is not used. The stages_to_write variable in the “&filter_nml”
section of the
input.nml namelist controls which stages are output to
file. For a multi-file case, the potential stages_to_write are “input,
forecast, preassim, postassim, analysis, output” while for a single file the
same stages are available with the exception of “input.”
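The rules in this paragraph can be captured in a few lines of Python. The sketch below checks a requested list of stages against the single-file or multi-file case and reports which requested stages would duplicate another because inflation is switched off. The helpers `check_stages` and `redundant_stages` are hypothetical; they do not read or write input.nml.

```python
MULTI_FILE_STAGES = ("input", "forecast", "preassim", "postassim", "analysis", "output")
# The single-file case supports the same stages except "input".
SINGLE_FILE_STAGES = tuple(s for s in MULTI_FILE_STAGES if s != "input")


def check_stages(stages, single_file):
    """Raise ValueError if any requested stage is unavailable for this case."""
    allowed = SINGLE_FILE_STAGES if single_file else MULTI_FILE_STAGES
    bad = [s for s in stages if s not in allowed]
    if bad:
        raise ValueError(f"unsupported stages: {bad}")
    return list(stages)


def redundant_stages(stages, prior_inflation, posterior_inflation):
    """List requested stages that would be identical to another requested stage."""
    redundant = []
    if not prior_inflation and "forecast" in stages and "preassim" in stages:
        redundant.append("preassim")  # same as forecast without prior inflation
    if not posterior_inflation and "postassim" in stages and "analysis" in stages:
        redundant.append("analysis")  # same as postassim without posterior inflation
    return redundant


requested = check_stages(["forecast", "preassim", "analysis"], single_file=True)
duplicates = redundant_stages(requested, prior_inflation=False, posterior_inflation=True)
```

A check like this can help avoid writing duplicate files for large models, where each stage may consist of one file per ensemble member.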
In the multi-file case, each stage in the above cycling diagram actually consists of one file per ensemble member; the individual files are not shown in order to keep the diagram simple.
Whether to store ensemble members in separate files and whether to run an OSSE or an OSE are independent decisions. An OSSE can be run with multiple files, and an OSE can be run with all ensemble members stored in a single file.