The structure of the ensemble directory

The Template Run Directory

The CHEEREIO Template Run directory is essentially a standard GEOS-Chem run directory. It is generated by a routine within the setup_ensemble.sh script that closely follows the code used to create a standard GEOS-Chem run directory, with a few minor differences. setup_ensemble.sh later copies this directory to form the Spinup Run directory and the individual ensemble members within the Ensemble Runs directory, so users should make sure that the Template Run directory meets all their requirements before any further copying takes place.

The template run directory differs from a standard run directory in three ways. First, input.geos has empty tags for the start and end times, which allows CHEEREIO to resubmit GEOS-Chem runs for different time periods. Second, HEMCO_Config.rc is represented by two template files. HEMCO_Config_SPINUP_NATURE_TEMPLATE.rc is for spinup and “nature” simulations, neither of which includes randomized scaling factors (i.e. it is a normal HEMCO_Config.rc file generated by the usual run directory creation script). HEMCO_Config.rc is for ensemble members, all of which have perturbed emissions; references to gridded scaling factors are added at key lines in this config file. Finally, the template run directory has no batch script associated with it, because it cannot be run.

Although HEMCO_Config.rc is generated by the setup_ensemble.sh utility, it may not be ready to use as generated if you are updating different emissions sources of one species separately (for example, updating NO agricultural emissions separately from the rest of NO emissions). CHEEREIO cannot distinguish these sources on its own; it simply adds scaling factor references wherever the species of interest appears. The user must then manually delete the duplicated scaling factor references that are not wanted.

Users should pay attention to the collections saved in HISTORY.rc and modify them if desired. However, if you use the StateMet or LevelEdgeDiags collections during assimilation (e.g. for some TROPOMI assimilation runs), turn them on with the switches in ens_config.json rather than by editing HISTORY.rc directly. This is because CHEEREIO modifies HISTORY.rc on the fly in some situations.

Aside from these subtleties, the user should modify the template run directory as freely as they would any GEOS-Chem run directory they are customizing for their own research.

The Spinup Run Directory

The Spinup Run directory, if it is enabled, functions like a normal GEOS-Chem run directory. Created by setup_ensemble.sh, it comes with a run script and with all the configuration files set according to your specifications in ens_config.json. When the Spinup Run terminates, the restart file it generates is automatically used to initialize the ensemble run directories. No copying on the user’s part is necessary.

The Scratch Directory

Although the user should never modify anything in the scratch directory, it may still be useful to know how CHEEREIO makes use of this folder throughout run time. There are three main types of file in the scratch directory:

  • Column files (.npy): Column files contain assimilated columns which will eventually be combined and used to update ensemble restarts and scaling factors. Each core on each run instance calculates some number of columns at assimilation time and saves them to the scratch directory in a relevant subfolder, until finally all are computed and can be used to adjust the ensemble.

  • Internal state files: these files track things like the current date, lat/lon coordinates, and columns assigned to each core in the ensemble parallelization routine.

  • Flag files: these files are used to couple the many jobs that are running simultaneously during a CHEEREIO assimilation routine. They track ensemble members as they finish GEOS-Chem, as columns are being saved, and as assimilation and clean up processes complete. If an ensemble member fails, it can generate a kill file that terminates the entire ensemble, saving computational resources.

The only reason to view the scratch directory is to diagnose an ensemble failure. In this case, the KILL_ENS file may contain a short error message that can help the user identify the most relevant log file for debugging.
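
As a minimal sketch of that first debugging step, the Python snippet below checks whether a kill file exists and prints its contents. It only reads from the scratch directory and never writes to it. The function name read_kill_message and the example path are hypothetical and not part of CHEEREIO; the file name KILL_ENS comes from the description above, and its location directly inside the scratch directory is an assumption.

```python
from pathlib import Path
from typing import Optional


def read_kill_message(scratch_dir: str, kill_name: str = "KILL_ENS") -> Optional[str]:
    """Return the text of the kill file, or None if no kill file is present.

    Assumption: the kill file sits directly inside the scratch directory.
    """
    kill_file = Path(scratch_dir) / kill_name
    if not kill_file.is_file():
        return None
    # Read-only access: the scratch directory should never be modified by hand.
    return kill_file.read_text(errors="replace").strip()


if __name__ == "__main__":
    # Hypothetical path; substitute your own ensemble's scratch directory.
    message = read_kill_message("/path/to/ensemble/scratch")
    if message is None:
        print("No kill file found.")
    else:
        print("Ensemble was killed. Message:", message or "(empty)")
```

If a message is printed, use it to decide which log file to open first.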

The Ensemble Runs Directory

The Ensemble Runs directory is created in two stages. The ensemble run scripts are created when setup_ensemble.sh creates the Template Run directory. The individual ensemble run directories are created when SetupEnsembleRuns is set to true, after the Template Run directory has been created and (optionally) edited by the user. The contents of the fully created Ensemble Runs Directory are as follows:

  • The run_ensemble_simulations.sh bash script is a very complex batch submission script that manages the starting and stopping of a single GEOS-Chem ensemble member run, executes the subset of the LETKF operation that is assigned to this ensemble member (including coordinating internal core-wise parallelization), and, for the “master” ensemble member (always ensemble member 1), coordinates the overall ensemble (e.g. file clean-up, resynchronization, restart and scaling updates). More details are available in the About the Run Ensemble Simulations script entry. The user never executes this script directly.

  • The run_ens.sh bash script contains very simple instructions on how to submit a job array of ensemble member simulations (i.e. instances of run_ensemble_simulations.sh) to the SLURM scheduler. We recommend this script be executed via nohup bash run_ens.sh &. After this command is given, the ensemble will run until completion.

  • The log folder contains the many log files produced by the ensemble as it runs. The only exceptions are the GEOS-Chem log files, which are stored in the individual ensemble run directories. There are four types of files in the log folder:

    • ensemble_slurm_JOBNUMBER.err files. One such file is present for each ensemble member. These contain errors returned to the program at the shell level. If all goes well, these files will be empty. Otherwise, they can be very useful in determining what went wrong at runtime.

    • ensemble_slurm_JOBNUMBER.out files. One such file is present for each ensemble member. These contain regular output returned to the program at the shell level. They usually contain little and are rarely worth looking at.

    • letkf_ENSNUMBER_CORENUMBER.out files. One file is present for each core assigned columns to assimilate within each ensemble member. They contain real-time information about what this particular core is doing at assimilation time (including overall time taken to load files and compute assimilated columns).

    • The letkf_master.out file. Only one such file exists. It is created by ensemble member 1, which is by default the “job manager” (coordinates ensemble members, does file clean-up and NetCDF updates, etc.). Like the other LETKF log files, it contains real-time information, in this case about the combination of assimilated columns and the updates of NetCDF files.

  • The ensemble run directory folders, each with name SimulationName_FourDigitEnsembleMemberID. These are standard GEOS-Chem run directories, copied from the Template Run Directory, with two differences. First, they lack individual run scripts. Second, HEMCO_Config.rc is linked to NetCDF files containing gridded scaling factors, which are updated at assimilation time. Unique instances of these scaling factors are present in each of these folders and have names of the form *_SCALEFACTOR.nc.
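
The layout described above can be inspected with a short script. The following Python sketch lists each ensemble member directory with its scaling factor files, and reports any non-empty shell-level error logs. It rests on several assumptions: the function names are hypothetical and not part of CHEEREIO, the log folder is taken to be literally named log, and member directories are recognized only by a name ending in an underscore and four digits.

```python
import re
from pathlib import Path
from typing import Dict, List

# Assumption: member directories are named SimulationName_NNNN (four digits).
MEMBER_DIR = re.compile(r"^.+_\d{4}$")


def list_members(ensemble_runs_dir: str) -> Dict[str, List[str]]:
    """Map each ensemble member directory name to its *_SCALEFACTOR.nc files."""
    members = {}
    for child in sorted(Path(ensemble_runs_dir).iterdir()):
        if child.is_dir() and MEMBER_DIR.match(child.name):
            members[child.name] = sorted(
                p.name for p in child.glob("*_SCALEFACTOR.nc")
            )
    return members


def nonempty_error_logs(ensemble_runs_dir: str, log_folder: str = "log") -> List[str]:
    """Return names of ensemble_slurm_*.err files that contain any output."""
    log_dir = Path(ensemble_runs_dir) / log_folder  # folder name is an assumption
    if not log_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in log_dir.glob("ensemble_slurm_*.err")
        if p.stat().st_size > 0
    )


if __name__ == "__main__":
    # Hypothetical path; substitute your own Ensemble Runs directory.
    root = Path("/path/to/ensemble_runs")
    if root.is_dir():
        for name, scale_files in list_members(str(root)).items():
            print(name, scale_files)
        print("Non-empty error logs:", nonempty_error_logs(str(root)))
```

Because an error log stays empty when a member runs cleanly, the second function gives a quick list of members worth investigating.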