# Data assimilation in DART using the Lorenz 63 model

In this section we open the “black box” of the Lorenz model that was previously used in Compiling DART. This section assumes you have successfully run the Lorenz 63 model with the example observation files that were distributed with the DART repository. In this section you will learn in more detail how DART interacts with the Lorenz 63 model to perform data assimilation.

## The input.nml namelist

The `DART/models/lorenz_63/work/input.nml` file is the Lorenz model *namelist*, which is a standard Fortran method for passing parameters from a text file into a program without needing to recompile. There are many sections within this file that drive the behavior of DART while using the Lorenz 63 model for assimilation. Within `input.nml`, there is a section called *model_nml*, which contains the model-specific parameters:

```
&model_nml
sigma = 10.0,
r = 28.0,
b = 2.6666666666667,
deltat = 0.01,
time_step_days = 0,
time_step_seconds = 3600,
solver = 'RK2'
/
```

Here, you can see the values for the parameters `sigma`, `r`, and `b` that were discussed in the previous section. These are the original values Lorenz used in the 1963 paper to create the classic butterfly attractor.
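To make the structure of the namelist concrete, here is a minimal Python sketch that reads the `model_nml` block shown above. The function `parse_simple_namelist` is a hypothetical helper written for this illustration; it only handles one group of plain `key = value` lines and is not a general Fortran namelist reader, nor is it how DART itself reads the file.

```python
def parse_simple_namelist(text):
    """Parse a single '&group ... /' block made of simple 'key = value' lines."""
    group = None
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("&"):
            group = line[1:]          # group name, e.g. model_nml
            continue
        if line == "/":
            break                     # end of the namelist group
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().rstrip(",").strip()
        if value.startswith("'") and value.endswith("'"):
            values[key] = value[1:-1]     # quoted string
        else:
            try:
                values[key] = int(value)
            except ValueError:
                values[key] = float(value)
    return group, values


MODEL_NML = """\
&model_nml
sigma = 10.0,
r = 28.0,
b = 2.6666666666667,
deltat = 0.01,
time_step_days = 0,
time_step_seconds = 3600,
solver = 'RK2'
/
"""

group, params = parse_simple_namelist(MODEL_NML)
print(group, params)
# The value of b in the namelist is a decimal approximation of 8/3.
print(abs(params["b"] - 8.0 / 3.0) < 1e-12)
```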

## The Lorenz 63 model code

The Lorenz 63 model code, which is under `DART/models/lorenz_63/model_mod.f90`, contains the lines:

```
subroutine comp_dt(x, dt)
real(r8), intent( in) :: x(:)
real(r8), intent(out) :: dt(:)
! compute the lorenz model dt from standard equations
dt(1) = sigma * (x(2) - x(1))
dt(2) = -x(1)*x(3) + r*x(1) - x(2)
dt(3) = x(1)*x(2) - b*x(3)
end subroutine comp_dt
```

which directly translates the Lorenz 63 ODE from the previous section into Fortran.

Note that the routine `comp_dt` does not explicitly depend on the time variable, only on the state variables (i.e. the Lorenz 63 model is time invariant).
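For readers more comfortable with Python, the same right-hand side can be sketched as below. This is an illustrative translation, not part of DART; the default parameter values are taken from the namelist above (with `b` written as 8/3). Because the function takes only the state `x` and no time argument, the time invariance is visible in its signature. As a sanity check, the sketch evaluates the tendencies at one of the well-known non-trivial fixed points of the Lorenz 63 system, where they should vanish up to rounding.

```python
import math

SIGMA, R, B = 10.0, 28.0, 8.0 / 3.0


def comp_dt(x, sigma=SIGMA, r=R, b=B):
    """Lorenz 63 tendencies; x[0], x[1], x[2] mirror Fortran x(1), x(2), x(3)."""
    return [
        sigma * (x[1] - x[0]),
        -x[0] * x[2] + r * x[0] - x[1],
        x[0] * x[1] - b * x[2],
    ]


# Fixed point (c, c, r - 1) with c = sqrt(b * (r - 1)).
c = math.sqrt(B * (R - 1.0))
fixed_point = [c, c, R - 1.0]
tendency = comp_dt(fixed_point)
print(max(abs(t) for t in tendency) < 1e-9)
```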

Note

By default, `model_mod.f90` follows the Lorenz 63 paper and uses a two-stage Runge-Kutta scheme (RK2) to advance the model. The two-stage form shown below is also known as Heun's method or the improved Euler scheme.

Since the Lorenz 63 model is time invariant, the RK2 code to advance the ODE in time can be written as follows, again following the Lorenz 63 paper, for a fraction `fract` of a time-step (typically equal to 1):

```
!------------------------------------------------------------------
!> does single time step advance for lorenz convective 3 variable model
!> using two step rk time step
subroutine adv_single(x, fract)
real(r8), intent(inout) :: x(:)
real(r8), intent(in) :: fract
real(r8) :: x1(3), x2(3), dx(3)
call comp_dt(x, dx) ! compute the first intermediate step
x1 = x + fract * deltat * dx
call comp_dt(x1, dx) ! compute the second intermediate step
x2 = x1 + fract * deltat * dx
! new value for x is average of original value and second intermediate
x = (x + x2) / 2.0_r8
end subroutine adv_single
```

Together, these two code blocks describe how the Lorenz 63 model is advanced in time. You will see how DART uses this functionality shortly.
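The following Python sketch mirrors `adv_single` and checks it against a textbook form of the same update. Algebraically, averaging the original state with the second intermediate state gives x + (h/2)·(f(x) + f(x + h·f(x))), with h = `fract` × `deltat`. The function names are illustrative and not part of DART; unlike the Fortran routine, which updates `x` in place, this version returns a new list.

```python
def comp_dt(x, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Lorenz 63 tendencies for the state x = [x1, x2, x3]."""
    return [
        sigma * (x[1] - x[0]),
        -x[0] * x[2] + r * x[0] - x[1],
        x[0] * x[1] - b * x[2],
    ]


def adv_single(x, deltat=0.01, fract=1.0):
    """One two-stage step, written the same way as the Fortran routine."""
    h = fract * deltat
    dx = comp_dt(x)
    x1 = [xi + h * di for xi, di in zip(x, dx)]      # first intermediate step
    dx = comp_dt(x1)
    x2 = [xi + h * di for xi, di in zip(x1, dx)]     # second intermediate step
    return [(xi + x2i) / 2.0 for xi, x2i in zip(x, x2)]


def two_stage_reference(x, h):
    """Same update written as x + h/2 * (k1 + k2)."""
    k1 = comp_dt(x)
    k2 = comp_dt([xi + h * ki for xi, ki in zip(x, k1)])
    return [xi + 0.5 * h * (a + b) for xi, a, b in zip(x, k1, k2)]


state = [1.0, 2.0, 3.0]
stepped = adv_single(state)
reference = two_stage_reference(state, 0.01)
print(max(abs(p - q) for p, q in zip(stepped, reference)) < 1e-12)

# Advance six model steps, the interval between observations in this example.
for _ in range(6):
    state = adv_single(state)
print(state)
```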

## The model time step and length of the data assimilation

In the original Lorenz 63 paper, the model is run for 50 “days” using a non-dimensional time-step of 0.01, which is reproduced in the namelist above. This time-step was assumed equal to 3600 seconds, or one hour, in dimensional time. This is also set in the namelist above. The Lorenz 63 model observation file included with the DART repository uses observations of all three state variables every six hours (so every six model steps) to conduct the assimilation.

If you were previously able to run the Matlab diagnostic scripts, you may have noticed that the butterfly attractor for the included example does not look as smooth as might be desired:

This is because the model output was only saved once every six “hours” at the observation times. As an exercise, let’s make a nicer-looking plot using the computational power available today, which even on the most humble of computers is many times greater than what Lorenz had in 1963. Let’s change Lorenz’s classic experiment to the following:

Make the non-dimensional timestep 0.001, a factor of 10 smaller, which will correspond to a dimensional timestep of 360 seconds (6 minutes). This smaller time-step will lead to a smoother model trajectory.

Keep the original ratio of time steps to observations included in the DART repository of assimilating observations every six time steps, meaning we now need observations every 36 minutes.

Therefore, in order to conduct our new experiment, we will need to regenerate the DART observation sequence files.
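The arithmetic behind the original and the refined experiment can be checked with a few lines of Python. This is only a bookkeeping sketch; `experiment_timing` is a hypothetical helper, and the inputs (a 50-day run with six model steps per observation time) come from the text above.

```python
SECONDS_PER_DAY = 86400


def experiment_timing(time_step_seconds, steps_per_obs=6, run_days=50):
    """Return model-step and observation-time counts for one setup."""
    run_seconds = run_days * SECONDS_PER_DAY
    obs_period_seconds = steps_per_obs * time_step_seconds
    return {
        "model_steps": run_seconds // time_step_seconds,
        "obs_period_seconds": obs_period_seconds,
        "obs_period_minutes": obs_period_seconds / 60,
        "obs_times": run_seconds // obs_period_seconds,
    }


original = experiment_timing(time_step_seconds=3600)  # one-hour model step
refined = experiment_timing(time_step_seconds=360)    # six-minute model step
print(original)
print(refined)
```

With the one-hour step this gives 1200 model steps and 200 observation times spaced 21600 seconds (6 hours) apart; with the six-minute step it gives 12000 model steps and 2000 observation times spaced 2160 seconds (36 minutes) apart.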

To change the time-step, edit the `model_nml` section of the `input.nml` file in `DART/models/lorenz_63/work` so that it reads as follows:

```
&model_nml
sigma = 10.0,
r = 28.0,
b = 2.6666666666667,
deltat = 0.001,
time_step_days = 0,
time_step_seconds = 360
/
```

Note

The changes are to `deltat` and `time_step_seconds`. Additionally, you do not need to recompile the DART code, as the purpose of namelist files is to pass run-time parameters to a Fortran program without recompilation.

## Updating the observation sequence

Let’s now regenerate the DART observation files with the updated timestep and
observation ratio. In a typical large-scale application, the user will provide
observations to DART in a standardized format called the *Observation Sequence*
file. Since there are no real observations of the Lorenz 63 system, we must
create our own synthetic observations using the *create_obs_sequence*,
*create_fixed_network_seq*, and *perfect_model_obs* programs, each of which is
explained below. These helpful interactive
programs are included with DART to generate these observation sequence files for
typical research or education-oriented experiments. In such setups, observations
(with noise added) will be generated at regular intervals from a model “truth”.
This “truth” will only be available to the experiment through the noisy
observations but can later be used for comparison purposes. The number of steps
necessary for the ensemble members to reach the true model state’s “attractor”
can be investigated and, for example, compared between different DA methods.
This is an example of an “OSSE” — see High-level data assimilation workflows in DART for more
information.

To recap, the three programs used in this example to create an observation
sequence are *create_obs_sequence*, *create_fixed_network_seq*, and *perfect_model_obs*.
*create_obs_sequence* creates a template for the observations,
*create_fixed_network_seq* repeats that template at multiple times, and finally
*perfect_model_obs* harvests the observation values. These programs have many
additional capabilities; if interested, see the corresponding program’s
documentation.

Let’s now run the DART program *create_obs_sequence* to create the observation
template that we will later replicate in time:

Make sure you are in the `DART/models/lorenz_63/work` directory, then run `./create_obs_sequence`.

The program *create_obs_sequence* will ask for the number of observations. Since
we plan to have 3 observations at each time step (one for each of the state
variables), input **3**:

```
set_nml_output Echo NML values to log file only
--------------------------------------------------------
-------------- ASSIMILATE_THESE_OBS_TYPES --------------
RAW_STATE_VARIABLE
--------------------------------------------------------
-------------- EVALUATE_THESE_OBS_TYPES --------------
none
--------------------------------------------------------
---------- USE_PRECOMPUTED_FO_OBS_TYPES --------------
none
--------------------------------------------------------
Input upper bound on number of observations in sequence
3
```

For this experimental setup, we will not have any additional copies of the data,
nor will we have any quality control fields. So use **0** for both.

```
Input number of copies of data (0 for just a definition)
0
Input number of quality control values per field (0 or greater)
0
```

We will now set up each of the three observations. The program asks you to enter -1 if there are no more observations, so input anything else instead (**1** below). Then enter **-1**, **-2**, and **-3**, one per observation, for the state variable index (the observation here is just the value of the state variable). Use **0 0** for the time (we will set up a regularly repeating observation after we finish this), and **8** for the error variance of each observation.

Finally, after entering the last observation, press enter to use the default output file `set_def.out`.

Input your values as follows:

```
input a -1 if there are no more obs
1
Input -1 * state variable index for identity observations
OR input the name of the observation kind from table below:
OR input the integer index, BUT see documentation...
1 RAW_STATE_VARIABLE
-1
input time in days and seconds (as integers)
0 0
Input the error variance for this observation definition
8
input a -1 if there are no more obs
1
Input -1 * state variable index for identity observations
OR input the name of the observation kind from table below:
OR input the integer index, BUT see documentation...
1 RAW_STATE_VARIABLE
-2
input time in days and seconds (as integers)
0 0
Input the error variance for this observation definition
8
input a -1 if there are no more obs
1
Input -1 * state variable index for identity observations
OR input the name of the observation kind from table below:
OR input the integer index, BUT see documentation...
1 RAW_STATE_VARIABLE
-3
input time in days and seconds (as integers)
0 0
Input the error variance for this observation definition
8
Input filename for sequence (<return> for set_def.out )
write_obs_seq opening formatted observation sequence file "set_def.out"
write_obs_seq closed observation sequence file "set_def.out"
create_obs_sequence Finished successfully.
```
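The answers typed in the session above follow a regular pattern, which the Python sketch below writes out as text. The helper `create_obs_sequence_answers` is hypothetical and simply reproduces the sequence of responses shown in the transcript. Feeding this text to the program through standard input instead of typing it is an untested assumption here, not something this section documents.

```python
def create_obs_sequence_answers(n_state=3, error_variance=8, days=0, seconds=0):
    """Build the interactive answers used above, one answer per line."""
    answers = [
        str(n_state),  # upper bound on number of observations
        "0",           # number of copies of data
        "0",           # number of quality control values
    ]
    for index in range(1, n_state + 1):
        answers += [
            "1",                   # anything other than -1: another obs follows
            str(-index),           # identity observation of state variable index
            f"{days} {seconds}",   # observation time in days and seconds
            str(error_variance),   # observation error variance
        ]
    answers.append("")             # empty line accepts the default set_def.out
    return "\n".join(answers) + "\n"


text = create_obs_sequence_answers()
print(text, end="")
```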

## Creating a regular sequence of observations

We will now use another DART program that takes this `set_def.out` file as input. The interactive program `create_fixed_network_seq` is a helper tool that can be used to generate a DART observation sequence file made of a set of regularly repeating observations.

Make sure you are in the `DART/models/lorenz_63/work` directory, then run `./create_fixed_network_seq`.

We want to use the default `set_def.out` file, so press return. We also want a regularly repeating time sequence, so input **1**.

```
set_nml_output Echo NML values to log file only
--------------------------------------------------------
-------------- ASSIMILATE_THESE_OBS_TYPES --------------
RAW_STATE_VARIABLE
--------------------------------------------------------
-------------- EVALUATE_THESE_OBS_TYPES --------------
none
--------------------------------------------------------
---------- USE_PRECOMPUTED_FO_OBS_TYPES --------------
none
--------------------------------------------------------
Input filename for network definition sequence (<return> for set_def.out )
To input a regularly repeating time sequence enter 1
To enter an irregular list of times enter 2
1
```

We will now input the number of observation times in the file. The purpose of this exercise is to refine the time step used by Lorenz in 1963 by a factor of 10. Since we want to keep the ratio of six model steps per observation and run for 50 days, we will need 2000 observation times (360 seconds × 6 × 2000 = 4,320,000 seconds = 50 days).

As we specified in `set_def.out`, there are 3 observations per observation time, so a total of 6000 observations will be generated.

Note

The Lorenz 63 model dimensional time-step is related to the observational time *only* through this mechanism. In other words, `deltat` in the namelist could relate to virtually any dimensional time step `time_step_seconds` if the observation times were not considered. However, DART will automatically advance the model state to the observation times in order to conduct the data assimilation at the appropriate time, then repeat this process until no additional observations are available, thus indirectly linking `deltat` to `time_step_seconds`.
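One way to see this indirect link is to compute how much non-dimensional model time elapses between two observation times. The sketch below is illustrative only: `nondimensional_interval` is a hypothetical helper, and it assumes that the observation period is a whole number of model steps, as it is in this tutorial.

```python
def nondimensional_interval(obs_period_seconds, time_step_seconds, deltat):
    """Return (model steps between observations, non-dimensional time covered)."""
    steps, remainder = divmod(obs_period_seconds, time_step_seconds)
    if remainder:
        # Assumption of this sketch: observations fall on model step boundaries.
        raise ValueError("observation period is not a whole number of model steps")
    return steps, steps * deltat


# Original setup: 6-hour observations, 1-hour steps, deltat = 0.01.
print(nondimensional_interval(21600, 3600, 0.01))
# Refined setup: 36-minute observations, 6-minute steps, deltat = 0.001.
print(nondimensional_interval(2160, 360, 0.001))
# Changing only time_step_seconds (deltat left at 0.01) still gives six steps,
# so the same 36 minutes would cover ten times more non-dimensional time.
print(nondimensional_interval(2160, 360, 0.01))
```

In all three cases the model takes six steps between observations; what changes is how much of the Lorenz attractor is traversed in that interval, which is set by `deltat`.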

Enter **2000** for the number of observation times. The initial time will be **0
0**, and the input period will be **0** days and **2160** seconds (36 minutes).

```
Input number of observation times in sequence
2000
Input initial time in sequence
input time in days and seconds (as integers)
0 0
Input period of obs in sequence in days and seconds
0 2160
```

The numbers 1 to 2000 will then be output by `create_fixed_network_seq`. Press return to accept the default output name of `obs_seq.in`. The file suffix is `.in` as this will be the input to the next program, *perfect_model_obs*.

```
1
2
...
1998
1999
2000
What is output file name for sequence (<return> for obs_seq.in)
write_obs_seq opening formatted observation sequence file "obs_seq.in"
write_obs_seq closed observation sequence file "obs_seq.in"
create_fixed_network_seq Finished successfully.
```

## Running perfect_model_obs

We are now ready to run *perfect_model_obs*, which will read in `obs_seq.in` and generate the observations as well as create the “perfect” model trajectory. “Perfect” here is a synonym for the known “true” state which is used to generate the observations. Once noise is added (to represent observational uncertainty), the output is written to `obs_seq.out`.

Make sure you are in the `DART/models/lorenz_63/work` directory, then run `./perfect_model_obs`.

The output should look like the following:

```
set_nml_output Echo NML values to log file only
initialize_mpi_utilities: Running single process
--------------------------------------------------------
-------------- ASSIMILATE_THESE_OBS_TYPES --------------
RAW_STATE_VARIABLE
--------------------------------------------------------
-------------- EVALUATE_THESE_OBS_TYPES --------------
none
--------------------------------------------------------
---------- USE_PRECOMPUTED_FO_OBS_TYPES --------------
none
--------------------------------------------------------
quality_control_mod: Will reject obs with Data QC larger than 3
quality_control_mod: No observation outlier threshold rejection will be done
perfect_main Model size = 3
perfect_read_restart: reading input state from file
perfect_main total number of obs in sequence is 6000
perfect_main number of qc values is 1
perfect_model_obs: Main evaluation loop, starting iteration 0
move_ahead Next assimilation window starts at: day= 0 sec= 0
move_ahead Next assimilation window ends at: day= 0 sec= 180
perfect_model_obs: Model does not need to run; data already at required time
perfect_model_obs: Ready to evaluate up to 3 observations
perfect_model_obs: Main evaluation loop, starting iteration 1
move_ahead Next assimilation window starts at: day= 0 sec= 1981
move_ahead Next assimilation window ends at: day= 0 sec= 2340
perfect_model_obs: Ready to run model to advance data ahead in time
perfect_model_obs: Ready to evaluate up to 3 observations
...
perfect_model_obs: Main evaluation loop, starting iteration 1999
move_ahead Next assimilation window starts at: day= 49 sec= 84061
move_ahead Next assimilation window ends at: day= 49 sec= 84420
perfect_model_obs: Ready to run model to advance data ahead in time
perfect_model_obs: Ready to evaluate up to 3 observations
perfect_model_obs: Main evaluation loop, starting iteration 2000
perfect_model_obs: No more obs to evaluate, exiting main loop
perfect_model_obs: End of main evaluation loop, starting cleanup
write_obs_seq opening formatted observation sequence file "obs_seq.out"
write_obs_seq closed observation sequence file "obs_seq.out"
```
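The window bounds printed in the log above appear consistent with a window extending half a model time step (180 seconds) on either side of each observation time, with the start clipped at zero. The Python sketch below reproduces the printed values under that assumption, which is inferred from the log output rather than taken from the DART source; `assimilation_window` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400


def assimilation_window(iteration, obs_period_seconds=2160, time_step_seconds=360):
    """Return ((day, sec) start, (day, sec) end) of the window for one iteration."""
    center = iteration * obs_period_seconds       # observation time in seconds
    half_step = time_step_seconds // 2
    start = max(0, center - half_step + 1)
    end = center + half_step
    return divmod(start, SECONDS_PER_DAY), divmod(end, SECONDS_PER_DAY)


for iteration in (0, 1, 1999):
    start, end = assimilation_window(iteration)
    print(iteration, "starts at day/sec", start, "ends at day/sec", end)
```

For iterations 0, 1, and 1999 this yields the same start and end values as the `move_ahead` lines in the log: seconds 0 to 180, 1981 to 2340, and day 49 seconds 84061 to 84420.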

You can now see the files `true_state.nc`, a netCDF file which has the perfect model state at all 2000 observation times; `obs_seq.out`, an ASCII file which contains the 6000 observations (2000 times with 3 observations each) of the true model state with noise added in; and `perfect_output.nc`, a netCDF file with the final true state that could be used to “restart” the experiment from the final time (day 49 and 84240 seconds, or 49.975 days, in this case).

We can now see the relationship between `obs_seq.in` and `obs_seq.out`: `obs_seq.in` contains a “template” of the desired observation locations and types, while `obs_seq.out` is a list of the actual observation values, in this case generated by the *perfect_model_obs* program.
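Conceptually, each synthetic observation is the true state value plus random noise whose variance is the error variance entered earlier (8 in this example). The Python sketch below illustrates the idea with Gaussian noise; the function name and the use of Python's `random` module are choices made for this illustration and do not reproduce DART's random number generator or file formats.

```python
import math
import random


def synthetic_observations(truth, error_variance, rng):
    """Return truth values perturbed by zero-mean Gaussian noise."""
    std_dev = math.sqrt(error_variance)
    return [value + rng.gauss(0.0, std_dev) for value in truth]


rng = random.Random(63)            # fixed seed so the example is repeatable
true_state = [1.5, -2.0, 25.0]     # made-up values for x, y, z
print(synthetic_observations(true_state, error_variance=8.0, rng=rng))
```

With an error variance of 8, the noise has a standard deviation of about 2.83, so individual observations can differ noticeably from the truth even though they are unbiased on average.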

Important

`create_obs_sequence` is used for this low-order model because there are no real observations for Lorenz 63. For systems that have real observations, DART provides a variety of *observation converters* to convert from native observation formats to the DART format. See Available observation converter programs for a list.

## Running the filter

Now that `obs_seq.out` and `true_state.nc` have been prepared, DART can perform the actual data assimilation. This will generate an ensemble of model states, use the ensemble to estimate the prior distribution, compare the observations to the “expected” observation of each member, and update the model state according to Bayes’ rule.

Make sure you are in the `DART/models/lorenz_63/work` directory, then run `./filter`.

```
set_nml_output Echo NML values to log file only
initialize_mpi_utilities: Running single process
--------------------------------------------------------
-------------- ASSIMILATE_THESE_OBS_TYPES --------------
RAW_STATE_VARIABLE
--------------------------------------------------------
-------------- EVALUATE_THESE_OBS_TYPES --------------
none
--------------------------------------------------------
---------- USE_PRECOMPUTED_FO_OBS_TYPES --------------
none
--------------------------------------------------------
quality_control_mod: Will reject obs with Data QC larger than 3
quality_control_mod: No observation outlier threshold rejection will be done
assim_tools_init: Selected filter type is Ensemble Adjustment Kalman Filter (EAKF)
assim_tools_init: The cutoff namelist value is 1000000.000000
assim_tools_init: ... cutoff is the localization half-width parameter,
assim_tools_init: ... so the effective localization radius is 2000000.000000
filter_main: running with an ensemble size of 20
parse_stages_to_write: filter will write stage : preassim
parse_stages_to_write: filter will write stage : analysis
parse_stages_to_write: filter will write stage : output
set_member_file_metadata no file list given for stage "preassim" so using default names
set_member_file_metadata no file list given for stage "analysis" so using default names
Prior inflation: None
Posterior inflation: None
filter_main: Reading in initial condition/restart data for all ensemble members from file(s)
filter: Main assimilation loop, starting iteration 0
move_ahead Next assimilation window starts at: day= 0 sec= 0
move_ahead Next assimilation window ends at: day= 0 sec= 180
filter: Model does not need to run; data already at required time
filter: Ready to assimilate up to 3 observations
comp_cov_factor: Standard Gaspari Cohn localization selected
filter_assim: Processed 3 total observations
filter: Main assimilation loop, starting iteration 1
move_ahead Next assimilation window starts at: day= 0 sec= 21421
move_ahead Next assimilation window ends at: day= 0 sec= 21780
filter: Ready to run model to advance data ahead in time
filter: Ready to assimilate up to 3 observations
filter_assim: Processed 3 total observations
...
filter: Main assimilation loop, starting iteration 199
move_ahead Next assimilation window starts at: day= 49 sec= 64621
move_ahead Next assimilation window ends at: day= 49 sec= 64980
filter: Ready to run model to advance data ahead in time
filter: Ready to assimilate up to 3 observations
filter_assim: Processed 3 total observations
filter: Main assimilation loop, starting iteration 200
filter: No more obs to assimilate, exiting main loop
filter: End of main filter assimilation loop, starting cleanup
write_obs_seq opening formatted observation sequence file "obs_seq.final"
write_obs_seq closed observation sequence file "obs_seq.final"
```
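The log above reports that the selected filter type is the Ensemble Adjustment Kalman Filter (EAKF). As a rough illustration of the idea, the Python sketch below applies the standard scalar ensemble adjustment update to a directly observed variable: the posterior mean and variance follow from combining a Gaussian prior with a Gaussian observation likelihood, and each member is shifted and contracted so that the ensemble has exactly those statistics. This is a simplified sketch, not DART's implementation; it omits the step that regresses the increments onto the other state variables, and the ensemble values are made up.

```python
from statistics import fmean, variance


def eakf_update_scalar(prior, obs_value, obs_error_variance):
    """Adjust a scalar ensemble toward one observation of the same variable."""
    prior_mean = fmean(prior)
    prior_var = variance(prior)    # sample variance (n - 1 denominator)
    # Product of two Gaussians: precisions add, means are precision-weighted.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_error_variance)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_error_variance)
    # Shift the mean and contract deviations so the sample variance is post_var.
    scale = (post_var / prior_var) ** 0.5
    return [post_mean + scale * (member - prior_mean) for member in prior]


prior_ensemble = [1.0, 2.0, 3.0, 4.0]   # made-up prior values
posterior = eakf_update_scalar(prior_ensemble, obs_value=10.0, obs_error_variance=8.0)
print(posterior)
print(fmean(posterior), variance(posterior))
```

The posterior mean lies between the prior mean and the observation, and the posterior variance is smaller than both the prior variance and the observation error variance.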

Based on the default Lorenz 63 `input.nml` namelist for *filter* included in the DART repository, the assimilation will have three stages:

- The *preassim* stage, where the ensemble is updated by advancing the model. The file `preassim.nc`, which contains the pre-assimilation model trajectories for all the ensemble members, will be written.
- The *analysis* stage, where the data assimilation is conducted. The post-assimilation model trajectories for all the ensemble members will be written to `analysis.nc`.
- The *output* stage, which writes the file `obs_seq.final` containing the actual observations as assimilated plus the ensemble forward-operator expected values and any quality-control values. This stage also writes the `filter_output.nc` file containing the ensemble state from the final cycle, which could be used to restart the experiment.

DART has now successfully assimilated our updated observations with a 6 minute
model time step and assimilation every 36 minutes. 🎉

## Verifying the nicer-looking results

You can now run the verification scripts (as in the section Verifying installation) in Matlab with the following commands:

`>> addpath ../../../diagnostics/matlab`

`>> plot_ens_time_series`

Some additional commands to view the attractor from the ZY plane were used:

`>> set(findall(gca, 'Type', 'Line'), 'LineWidth', 2);`

`>> set(gca, 'FontSize', 18)`

`>> xlabel('x')`

`>> ylabel('y')`

`>> zlabel('z')`

`>> view([90 0])`

We can now see the following smooth Lorenz 63 true state and ensemble mean comparison with a 6 minute model time step and assimilation every 36 minutes:

As you can see, the ensemble mean in red matches the true state almost exactly, although it took a number of assimilation cycles before the ensemble mean was able to reach the true state “attractor.”

You should now be able to tinker with the Lorenz 63 model and other models in DART. For more detailed information on the theory of ensemble data assimilation, see the DART Tutorial. For more concrete information regarding DART’s algorithms and capabilities, see the next section The benefits of using DART. To add your own model to DART, see Assimilation in a complex model. Finally, if you want to add your own observations to DART, see Adding your observations to DART.