Because CHEEREIO wraps GEOS-Chem, your computing environment must have the modules needed to compile and run GEOS-Chem loaded. As with any Earth science model, this step alone can be challenging; see the GEOS-Chem Wiki for the hardware and software requirements. For hardware, multiply the recommended resources by the ensemble size, which is typically around 32 members. Because the ensemble is handled as a “job array”, this requirement is spread across roughly 32 loosely coordinated jobs, so memory is distributed across multiple nodes of your compute cluster. Memory use can still be intense, because CHEEREIO loads many NetCDF files into memory at once and forms large matrices. You can reduce the memory load by adjusting the MaxPar setting in ens_config.json, which limits the number of columns calculated simultaneously on one job allocation. A sample environment that allows NetCDF libraries to run in both Python and GEOS-Chem is supplied by cheereio.env in the environments/ folder. This environment is designed for the Harvard cluster and will need to be adjusted for other machines.
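As a rough planning aid, the resource arithmetic above can be sketched in Python. This is a back-of-the-envelope model, not CHEEREIO code, and all function names are hypothetical. per_run_gb would come from the GEOS-Chem Wiki recommendations. per_column_gb is an assumed memory cost for one column update, which you would need to measure on your own system. The sketch treats MaxPar as a cap on how many column updates run at once within one job.

```python
def total_resources(per_run_gb, n_ensemble=32):
    """Rough total memory for the ensemble: one GEOS-Chem run per member."""
    return per_run_gb * n_ensemble


def column_batches(columns, max_par):
    """Split a job's columns into consecutive batches of at most max_par.

    Each batch stands for one "wave" of simultaneous column updates,
    which is how this sketch models the effect of MaxPar (an assumption).
    """
    if max_par < 1:
        raise ValueError("max_par must be at least 1")
    return [columns[i:i + max_par] for i in range(0, len(columns), max_par)]


def peak_update_memory(per_column_gb, max_par):
    """Peak memory if each concurrent column update holds per_column_gb."""
    return per_column_gb * max_par


if __name__ == "__main__":
    print(total_resources(8.0))                   # 8 GB per run x 32 members
    print(len(column_batches(list(range(10)), 4)))  # 10 columns in waves of 4
    print(peak_update_memory(0.5, 4))
```

Halving MaxPar in this model halves the peak memory of the update step. It also increases the number of waves, so the update takes longer. This mirrors the trade-off described above.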
Beyond the standard GEOS-Chem requirements, CHEEREIO currently requires the SLURM resource manager to handle batch submission, because of SLURM’s support for job arrays. CHEEREIO also requires the following modules to be installed: the jq module for JSON support, GNU parallel for handling LETKF column-wise updates efficiently, and Anaconda-managed Python with the “cheereio” conda environment (or an equivalent) installed, corresponding to the cheereio.yaml file from the GitHub repository. With all these modules loaded in the software environment, CHEEREIO should run without a hitch.
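Before installing, it can help to confirm that the tools listed above are visible in your environment. The following minimal sketch uses Python's standard shutil.which to check for executables on PATH. The executable names are assumptions: "sbatch" stands in for SLURM, and "parallel" for GNU parallel. Some systems ship a different program named "parallel", so a hit here does not prove it is GNU parallel.

```python
import shutil

# Executables assumed to correspond to the requirements listed above.
REQUIRED_TOOLS = {
    "SLURM (sbatch)": "sbatch",
    "jq": "jq",
    "GNU parallel": "parallel",
    "conda": "conda",
}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return {description: True/False}: is each executable found on PATH?"""
    return {name: shutil.which(exe) is not None for name, exe in tools.items()}


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:15s} {'found' if found else 'MISSING'}")
```

This only checks that the commands exist. It does not check that the “cheereio” conda environment is installed or active.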
Steps to install
CHEEREIO installation should be relatively simple if you have already installed GEOS-Chem version 13.0.0 or later on your machine. Follow the steps below:
- Clone the CHEEREIO GitHub repository into a permanent directory on your machine.
Install a conda environment for CHEEREIO updates from the cheereio.yaml file, which is given in the environments folder, by following this guide.
Depending on which observations you want to use with CHEEREIO, you might have to add a new observation operator. CHEEREIO expects observation operators in a very specific format, which is detailed in the Workflow to add a new observation operator section. If you develop a new observation operator, I strongly encourage you to add it on a new branch in the CHEEREIO Git repository and open a pull request against the main repository. This will allow the community to make use of your operator and speed up new research.
Clone the GCClassic GitHub repository within the CHEEREIO folder and update its submodules. CHEEREIO requires GEOS-Chem version 13.0.0 or later.
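The cloning and environment steps above can be sketched as a short Python driver that prints the commands by default and runs them only on request. The CHEEREIO repository URL is an assumption, so confirm it on the project page. The GCClassic URL is the GEOS-Chem organization's repository. The function names are hypothetical. The relative path environments/cheereio.yaml reflects the statement above that the file lives in the environments folder.

```python
import subprocess
from pathlib import Path

# Assumed repository URLs; confirm them before use.
CHEEREIO_URL = "https://github.com/drewpendergrass/CHEEREIO.git"
GCCLASSIC_URL = "https://github.com/geoschem/GCClassic.git"


def install_commands(install_dir):
    """Return (command, working directory) pairs for the install steps."""
    root = Path(install_dir)
    cheereio = root / "CHEEREIO"
    gcclassic = cheereio / "GCClassic"  # GCClassic lives inside CHEEREIO
    return [
        (["git", "clone", CHEEREIO_URL, str(cheereio)], root),
        (["conda", "env", "create", "-f", "environments/cheereio.yaml"], cheereio),
        (["git", "clone", GCCLASSIC_URL, str(gcclassic)], cheereio),
        # Check out a 13.0.0-or-later release tag in GCClassic before this step.
        (["git", "submodule", "update", "--init", "--recursive"], gcclassic),
    ]


def run_install(install_dir, dry_run=True):
    """Print each command; execute it (stopping on failure) if dry_run is False."""
    for cmd, cwd in install_commands(install_dir):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_install(Path.home() / "cheereio_install")
```

Keeping dry_run on by default lets you review the exact commands before anything touches your file system.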
Modify the ens_config.json configuration file according to your needs. Considerable scientific thought should go into this modification, as ens_config.json encodes assumptions about which species and emissions your observations will allow you to update. See Configuring your simulation for a detailed guide on how to prepare this file so you can get the best results with CHEEREIO.
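If you prefer to script edits to ens_config.json rather than edit it by hand, a small helper like the one below can change settings without disturbing the rest of the file. It assumes the file is a single JSON object whose settings, such as MaxPar, are top-level keys. The helper name is hypothetical. Keep whatever value types (strings or numbers) the existing file already uses for each key, since other scripts read these values.

```python
import json
from pathlib import Path


def update_ens_config(path, **changes):
    """Load ens_config.json, update existing top-level keys, and write it back.

    Raises KeyError for keys not already present, to catch typos such as
    "Maxpar" instead of "MaxPar".
    """
    path = Path(path)
    config = json.loads(path.read_text())
    for key, value in changes.items():
        if key not in config:
            raise KeyError(f"{key!r} is not in {path.name}")
        config[key] = value
    path.write_text(json.dumps(config, indent=4))
    return config
```

For example, update_ens_config("ens_config.json", MaxPar=4) lowers the number of simultaneous column updates. Afterwards, jq '.MaxPar' ens_config.json prints the stored value.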
- Deploy the ensemble. First read the Configuring your simulation page to understand how this procedure works, then follow these steps:
true and all other main switches set to
false. This will create a template run directory, which is almost identical to a standard GEOS-Chem run directory but with some important differences.
input.geos, for example, will have empty tags set at key locations that will allow CHEEREIO to resubmit GEOS-Chem runs for different time periods.
HEMCO_Config.rc is represented by two template files.
HEMCO_Config_SPINUP_NATURE_TEMPLATE.rc is for spinup and “nature” simulations, neither of which includes randomized scaling factors.
HEMCO_Config.rc is for ensemble members, all of which have perturbed emissions. References to gridded scaling factors are added at key lines in this config file.
VERIFY THAT HEMCO_Config.rc IS CORRECT. Depending on the simulation you want to run, there are some subtleties that you need to check to ensure that CHEEREIO will work the way you expect it to. For more information, see the Verifying HEMCO Config after initialization page.
Make any additional changes to the template run directory that you would like to see reflected in the ensemble.
false and all other main switches set to
true. However, if you are not using a global spinup run (and are supplying your own spun-up restart file), you should set the
false. This will take a few minutes, as it involves compiling GEOS-Chem and copying and modifying large files. You can set the
false and compile yourself first if you have custom compile-time settings you wish to invoke.
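The template mechanism described above can be pictured with a short sketch. Empty tags in a template are filled with the dates of each run period, and the HEMCO template is chosen by run type. The tag names {START_DATE} and {END_DATE}, the fixed-length windows, and all function names are hypothetical illustrations. CHEEREIO's actual tags and scheduling logic may differ. Only the mapping from run type to HEMCO template follows the description above.

```python
from datetime import date, timedelta

# Hypothetical tag names; CHEEREIO's real input.geos tags may differ.
START_TAG = "{START_DATE}"
END_TAG = "{END_DATE}"


def fill_period(template_text, start, end):
    """Replace the empty date tags with concrete dates in YYYYMMDD format."""
    return (template_text.replace(START_TAG, start.strftime("%Y%m%d"))
                         .replace(END_TAG, end.strftime("%Y%m%d")))


def assimilation_windows(start, end, window_days):
    """Split [start, end) into consecutive periods of window_days days."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    windows, current, step = [], start, timedelta(days=window_days)
    while current < end:
        nxt = min(current + step, end)  # last window may be shorter
        windows.append((current, nxt))
        current = nxt
    return windows


def hemco_template(run_type):
    """Spinup and nature runs use the template without scaling factors."""
    if run_type in ("spinup", "nature"):
        return "HEMCO_Config_SPINUP_NATURE_TEMPLATE.rc"
    if run_type == "ensemble":
        return "HEMCO_Config.rc"
    raise ValueError(f"unknown run type: {run_type}")
```

Each resubmission would then fill the template with the next window's dates. This is why the tags are left empty in the template run directory.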
Your ensemble is now built and deployed. If you are using the “separate run” form of ensemble spinup, which is recommended for assimilating species with longer lifetimes (and indicated by setting
ens_config.json), you can navigate to the ensemble_runs folder and run the run_ensspin.sh file to perform the ensemble spinup. After this completes, or if DO_ENS_SPINUP is turned off, you can execute the run_ens.sh file. I prefer to run both of these shell scripts with the command format nohup bash run_ens.sh &. The SLURM job array is now submitted. For more information on how to run the ensemble, and on how to set up the two forms of ensemble spinup, see the Running the ensemble page.
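The ordering rule above can be sketched in Python, along with a rough equivalent of nohup bash run_ens.sh & for launching one script from the ensemble_runs folder. This is an illustration under assumptions, and the function names are hypothetical. Whether run_ensspin.sh returns before the spinup jobs finish is not stated here. For that reason, the sketch does not chain the two scripts automatically; you launch run_ens.sh yourself once the spinup has completed.

```python
import subprocess
from pathlib import Path


def run_order(separate_run_spinup):
    """Scripts to launch, in order.

    run_ens.sh should only be launched after the spinup submitted by
    run_ensspin.sh has completed.
    """
    scripts = ["run_ensspin.sh"] if separate_run_spinup else []
    return scripts + ["run_ens.sh"]


def launch_detached(script, ensemble_runs_dir):
    """Rough equivalent of `nohup bash <script> &` run from ensemble_runs.

    Output is appended to nohup.out in that folder.
    """
    workdir = Path(ensemble_runs_dir)
    with open(workdir / "nohup.out", "a") as log:
        return subprocess.Popen(
            ["nohup", "bash", script],
            cwd=workdir,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the current terminal session
        )
```

For a separate-run spinup, you would call launch_detached("run_ensspin.sh", ...) first, wait for the spinup to finish, and then call launch_detached("run_ens.sh", ...).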
While the ensemble is running, you can execute the control run simulation from the control_run folder by submitting the script named RUNNAME_Control.run via sbatch. This is equivalent to running GEOS-Chem without assimilation and is useful for postprocessing analyses. The control_run folder is created by setting the