Overview of CHEEREIO’s capabilities

What CHEEREIO can do with 4D-LETKF

CHEEREIO is a tool for chemical data assimilation characterized by its flexibility. It can assimilate any kind of observation (satellite, surface, or aircraft) for any species in any configuration of GEOS-Chem, applying updates to both emissions scaling factors and chemical concentrations. CHEEREIO also allows the user to (1) assimilate observations at multiple time points within the assimilation window (e.g. hourly observations within a daily window), and (2) assimilate observations of derived quantities such as PM2.5 or AOD. Both are possible because CHEEREIO loads history files to gather concentrations, rather than relying only on restart files, which are sufficient for instantaneous (“3D”) assimilation.
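To illustrate why output at multiple times matters, here is a minimal sketch, assuming hourly history output over a one-day window, that pairs each observation with the nearest history time slice so the model is sampled close to when the observation was taken. The function name match_obs_to_history and the nearest-time rule are hypothetical illustrations, not CHEEREIO’s actual matching logic.

```python
import numpy as np

def match_obs_to_history(obs_times, hist_times):
    """Return, for each observation, the index of the nearest history time.

    Hypothetical helper: nearest-time matching is an illustrative assumption,
    not necessarily how CHEEREIO pairs observations with model output.
    """
    obs = np.asarray(obs_times, dtype="datetime64[s]")
    hist = np.asarray(hist_times, dtype="datetime64[s]")
    # Absolute time difference for every (observation, history time) pair.
    diffs = np.abs(obs[:, None] - hist[None, :])
    # Pick the closest history time slice for each observation.
    return diffs.argmin(axis=1)

# Assumed hourly history output over a daily assimilation window (24 slices).
hist_times = np.arange(
    np.datetime64("2019-01-01T00:00"),
    np.datetime64("2019-01-02T00:00"),
    np.timedelta64(1, "h"),
)
obs_times = ["2019-01-01T03:40", "2019-01-01T12:10", "2019-01-01T23:50"]
idx = match_obs_to_history(obs_times, hist_times)
print(idx)  # [ 4 12 23]
```

A restart file, by contrast, holds the model state at a single instant, so every observation in the window could only be compared against that one snapshot.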

A key feature of CHEEREIO is its distinction between “control” and “state” vectors, following work by Kazuyuki Miyazaki. The state vector should consist of all concentrations relevant to the problem at hand (e.g. large chemical families) as well as the emissions of interest. The control vector should be a subset of the state vector: it contains the concentrations, and the same emissions of interest, that the user believes can reasonably be updated on the basis of the observations. This will likely be a set of species directly coupled to the observed species. Although the entire state vector is used to calculate the concentration and emissions update, only the control vector is actually updated. In practice, this distinction helps suppress noise and produce well-behaved assimilations.
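The following is a minimal NumPy sketch of the state/control idea, assuming a standard LETKF weight computation for a single local region with a diagonal observation error covariance: the analysis is computed for every row of the state vector, but only rows flagged as belonging to the control vector are written back. The names letkf_weights, apply_update, and control_mask are hypothetical and do not correspond to CHEEREIO’s internal code.

```python
import numpy as np

def letkf_weights(Yb, y_obs, r_diag, inflation=1.0):
    """LETKF weight matrix (k x k) for one local analysis.

    Yb: (m, k) simulated observations for k members; y_obs: (m,) observations;
    r_diag: (m,) observation error variances (diagonal R assumed);
    inflation: multiplicative covariance inflation factor (assumed).
    """
    k = Yb.shape[1]
    yb_mean = Yb.mean(axis=1)
    Yb_pert = Yb - yb_mean[:, None]
    C = Yb_pert.T / r_diag                       # (k, m) = Yb'^T R^-1
    Pa_tilde = np.linalg.inv((k - 1) / inflation * np.eye(k) + C @ Yb_pert)
    # Symmetric square root of (k - 1) * Pa_tilde via eigendecomposition.
    evals, evecs = np.linalg.eigh((k - 1) * Pa_tilde)
    Wa = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    wa_mean = Pa_tilde @ C @ (y_obs - yb_mean)   # shift of the ensemble mean
    return Wa + wa_mean[:, None]

def apply_update(Xb, W, control_mask):
    """Compute the analysis for the full state, keep it only for control rows."""
    xb_mean = Xb.mean(axis=1)
    Xa_full = xb_mean[:, None] + (Xb - xb_mean[:, None]) @ W
    Xa = Xb.copy()
    Xa[control_mask] = Xa_full[control_mask]     # non-control rows stay unchanged
    return Xa

# Toy example: 4 state rows, 5 ensemble members, one observation of row 0.
rng = np.random.default_rng(0)
Xb = rng.normal(size=(4, 5))
Yb = Xb[[0], :]                                  # linear observation operator
y_obs = np.array([Xb[0].mean() + 2.0])
W = letkf_weights(Yb, y_obs, r_diag=np.array([0.5]))
control_mask = np.array([True, True, False, False])
Xa = apply_update(Xb, W, control_mask)
```

Because the same weight matrix multiplies every state row, deciding which rows to write back is separate from how the weights are computed.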

Many different kinds of observations can be used at once to update a family of species. For example, one might want to use NO2 satellite and surface data, SO2 satellite and surface data, NH3 satellite data, AOD, and surface PM2.5 to update emissions and concentrations of NO, NO2, SO2, and NH3 along with concentrations of SO4, NOy, and NH4. Whether applying this sort of update in a small nested region or for a global simulation, CHEEREIO can be configured to support just about any kind of chemical data assimilation problem.
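The multi-observation example above can be written down as a plain Python data structure. This is a hypothetical layout for illustration only and does not reproduce CHEEREIO’s actual configuration keys; the small check enforces the rule that the control vector must be a subset of the state vector. In a real setup the state vector would typically contain additional related species beyond the control vector.

```python
# Hypothetical description of the multi-observation example; the key names
# are illustrative and are not CHEEREIO's configuration keys.
setup = {
    "observations": [
        "NO2 satellite", "NO2 surface",
        "SO2 satellite", "SO2 surface",
        "NH3 satellite", "AOD", "PM2.5 surface",
    ],
    "state_conc": ["NO", "NO2", "SO2", "NH3", "SO4", "NOy", "NH4"],
    "state_emis": ["NO", "NO2", "SO2", "NH3"],
    "control_conc": ["NO", "NO2", "SO2", "NH3", "SO4", "NOy", "NH4"],
    "control_emis": ["NO", "NO2", "SO2", "NH3"],
}

def check_control_subset(cfg):
    """Raise ValueError if any control species is missing from the state vector."""
    for kind in ("conc", "emis"):
        missing = set(cfg[f"control_{kind}"]) - set(cfg[f"state_{kind}"])
        if missing:
            raise ValueError(f"control_{kind} not in state_{kind}: {sorted(missing)}")

check_control_subset(setup)  # passes: control is a subset of state
```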

What CHEEREIO can’t do

The 4D-LETKF algorithm that undergirds CHEEREIO is complex (see Further reading on the About page for more details), and its application to chemical data assimilation remains experimental. You should approach this tool with caution and evaluate your results carefully. CHEEREIO is a purely statistical tool and will not warn you if it is merely assimilating noise. For example, you could configure CHEEREIO to use observations of SO2 to update emissions of isoprene thousands of kilometers away. Such observations carry little or no information about those emissions, so any resulting update will most likely reflect statistical noise in the ensemble, which can sometimes look like a real signal. In short, spend time thinking about which control and state vector settings will be most informative for your particular problem. CHEEREIO cannot tell you whether your settings are reasonable, nor can it provide sensible updates to emissions or concentrations of species that are not constrained by observations.
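The risk of assimilating noise can be seen in a quick numerical experiment, a generic illustration rather than anything specific to CHEEREIO: draw two physically unrelated quantities independently for a small ensemble and measure their sample correlation. The true correlation is zero, yet a finite ensemble routinely produces correlations of order 1/sqrt(k), which an ensemble filter would treat as a real relationship. The function name and the ensemble sizes used below are assumptions chosen for illustration.

```python
import numpy as np

def mean_abs_spurious_corr(k, trials=2000, seed=0):
    """Average |sample correlation| between two independent variables
    (e.g. SO2 columns and distant isoprene emissions) over many k-member ensembles."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(trials, k))  # stands in for the observed quantity
    b = rng.normal(size=(trials, k))  # stands in for an unrelated emission
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    return np.abs(r).mean()

# Spurious correlations shrink with ensemble size but never vanish.
print(round(mean_abs_spurious_corr(20), 2), round(mean_abs_spurious_corr(200), 2))
```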

Postprocessing tools

CHEEREIO comes with a suite of postprocessing tools and pre-built workflows in the postprocess/ folder of the main CHEEREIO code directory. In particular, the SLURM batch script postprocess_prep.batch automatically creates a variety of figures, movies, and consolidated data files that the user can then view and modify.

The file controlvar_pp.nc, created by the postprocessing workflow, contains control vector concentrations across the ensemble, consolidated into a NetCDF file with dimensions ensemble, time, level, latitude, and longitude. bigY.pkl is a pickled Python dictionary that contains, for each species in the control vector, the observations and the corresponding ensemble “simulated observations,” along with metadata such as the latitude, longitude, and time of each observation. An example of a simulated observation is a GEOS-Chem column with a satellite averaging kernel applied. Finally, {EMIS_NAME}_SCALEFACTOR.nc consolidates the emissions scaling factors matching the user-specified label {EMIS_NAME} across the ensemble, concatenating them along a new dimension named “ensemble.” Additional helper functions that supplement these auto-generated files and figures are provided in the postprocess_tools.py file.
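As an example of working with these outputs, here is a hedged sketch that loads bigY.pkl with the standard pickle module and summarizes how well the ensemble-mean simulated observations match the observations for one species. The per-species layout assumed here (observations with shape (n_obs,) and simulated observations with shape (n_obs, n_ensemble)) and the function names obs_fit_summary and load_bigY are hypothetical; consult postprocess_tools.py for the actual structure and accessors.

```python
import pickle
import numpy as np

def obs_fit_summary(obs, sim_ens):
    """Bias, RMSE, and mean ensemble spread of simulated vs. actual observations.

    obs: (n_obs,) observations; sim_ens: (n_obs, n_ensemble) simulated
    observations. This array layout is an assumption, not bigY.pkl's format.
    """
    obs = np.asarray(obs, dtype=float)
    sim_ens = np.asarray(sim_ens, dtype=float)
    sim_mean = sim_ens.mean(axis=1)                    # ensemble-mean simulated obs
    diff = sim_mean - obs
    return {
        "bias": diff.mean(),
        "rmse": np.sqrt((diff**2).mean()),
        "spread": sim_ens.std(axis=1, ddof=1).mean(),  # mean ensemble std. dev.
    }

def load_bigY(path="bigY.pkl"):
    """Load the pickled dictionary written by the postprocessing workflow."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Synthetic stand-in data; replace with arrays extracted from load_bigY().
obs = np.array([1.0, 2.0, 3.0])
sim = np.array([[1.1, 0.9], [2.2, 2.0], [3.1, 3.1]])
print(obs_fit_summary(obs, sim))
```

Comparing RMSE against spread is a common quick check: a spread much smaller than the RMSE suggests the ensemble is underdispersive.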