Existing observation operators

CHEEREIO ships with observation operators that have been produced by the community. Users can add their own operators by following the instructions on the Workflow to add a new observation operator page, or they can use the operators listed below.

TROPOMI tools

The TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite measures criteria air pollutants and other trace gases of interest to CHEEREIO users. Currently, there are three TROPOMI operators written for CHEEREIO: CH4, CO (contributed by Sina Voshtani), and NO2. Users can follow the pattern in tropomi_tools.py to add support for additional species.

To activate TROPOMI observations, list “TROPOMI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.

TROPOMI fields available to save and plot in CHEEREIO

All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the TROPOMI operator, all data read in by the read_tropomi function are available to be saved and/or plotted. Below is a subset of the supported fields which users can save in TROPOMI:

qa_value

Quality assurance value.

column_AK

Satellite averaging kernel.

albedo_swir

Shortwave infrared albedo.

albedo_nir

Near infrared albedo.

blended_albedo

Blended albedo.

methane_profile_apriori

A priori methane profile. (Methane only)
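
As a quick sanity check before editing ens_config.json, the sketch below compares a requested list of extra fields against the subset documented above. The helper check_extra_fields and the set DOCUMENTED_TROPOMI_FIELDS are hypothetical names and are not part of CHEEREIO. A field flagged here may still be valid, since any key returned by read_tropomi can be saved; the exact layout of the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y entry is covered on the Postprocessing settings page and is not reproduced here.

```python
# Subset of TROPOMI fields listed in this section (not exhaustive).
DOCUMENTED_TROPOMI_FIELDS = {
    "qa_value",
    "column_AK",
    "albedo_swir",
    "albedo_nir",
    "blended_albedo",
    "methane_profile_apriori",
}


def check_extra_fields(requested):
    """Return the requested field names that are not in the documented subset."""
    return sorted(set(requested) - DOCUMENTED_TROPOMI_FIELDS)


# "cloud_fraction" is a made-up field name used only to show a mismatch.
unknown = check_extra_fields(["qa_value", "albedo_swir", "cloud_fraction"])
print(unknown)
```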

TROPOMI operator support functions

The TROPOMI observation operator calls a handful of utility functions to process observations from file and format them in such a way that the gcCompare method of the TROPOMI_Translator class can perform relevant computations.

The first two utility functions are designed to remap GEOS-Chem pressure levels to satellite pressure levels and apply the averaging kernel.

GC_to_sat_levels(GC_SPC, GC_edges, sat_edges, species, chunk_size=10000)

Takes as input GEOS-Chem data and pressure level edges, as well as satellite pressure level edges, and calculates GEOS-Chem data values on satellite pressure levels.

Parameters
  • GC_SPC (array) – A NumPy array containing GEOS-Chem columns.

  • GC_edges (array) – A NumPy array containing GEOS-Chem pressure level edges.

  • sat_edges (array) – A NumPy array containing TROPOMI pressure level edges

  • species (str) – Species to be processed.

  • chunk_size (int) – For CO, the number of observations to be processed at once. This is to save memory for TROPOMI observations with high vertical resolution.

Returns

A NumPy array containing GEOS-Chem columns remapped to be on the TROPOMI pressure levels.

Return type

array
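
To illustrate the kind of remapping described above, here is a minimal single-column sketch that averages GEOS-Chem layer values onto satellite layers, weighting by the pressure thickness the layers share. It is an illustration under stated assumptions (edges ordered from surface to top, layer-mean mixing ratios), not the CHEEREIO implementation; the name remap_column is hypothetical, and the real function handles many observations at once and treats species differently.

```python
import numpy as np


def remap_column(gc_vals, gc_edges, sat_edges):
    """Pressure-weighted remap of one column of layer-mean values.

    gc_vals:   (n_gc,) layer values
    gc_edges:  (n_gc + 1,) pressure edges, decreasing from surface to top
    sat_edges: (n_sat + 1,) pressure edges, decreasing from surface to top
    """
    gc_hi, gc_lo = gc_edges[:-1], gc_edges[1:]      # bottom and top of each GC layer
    sat_hi, sat_lo = sat_edges[:-1], sat_edges[1:]  # bottom and top of each satellite layer
    # overlap[i, j] is the pressure thickness shared by satellite layer i and GC layer j.
    overlap = (np.minimum(sat_hi[:, None], gc_hi[None, :])
               - np.maximum(sat_lo[:, None], gc_lo[None, :]))
    overlap = np.clip(overlap, 0.0, None)
    total = overlap.sum(axis=1)
    out = np.full(sat_hi.shape, np.nan)  # NaN where a satellite layer has no GC overlap
    ok = total > 0
    out[ok] = (overlap[ok] @ gc_vals) / total[ok]
    return out


# Made-up example values: three GC layers remapped onto two satellite layers.
gc_vals = np.array([1900.0, 1850.0, 1800.0])
gc_edges = np.array([1000.0, 800.0, 600.0, 400.0])
sat_edges = np.array([1000.0, 700.0, 400.0])
remapped = remap_column(gc_vals, gc_edges, sat_edges)
print(remapped)
```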

apply_avker(sat_avker, sat_pressure_weight, GC_SPC, sat_prior=None, filt=None)

Apply the averaging kernel

Parameters
  • sat_avker (array) – TROPOMI averaging kernel.

  • sat_prior (array) – The satellite prior profile in ppb, optional (used for CH4).

  • sat_pressure_weight (array) – The relative pressure weights for each level

  • GC_SPC (array) – The GC species on the satellite levels, output by GC_to_sat_levels

  • filt (array) – A filter, optional

Returns

A NumPy array containing simulated GEOS-Chem values such that they are directly comparable to TROPOMI (i.e. because the averaging kernel has been applied).

Return type

array
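
The sketch below shows one common way an averaging kernel and pressure weights are combined into a column value, with and without a prior profile. It is an assumption about the general form, not a copy of apply_avker: the name apply_avker_sketch is hypothetical, the filt argument is omitted, and the exact terms used by CHEEREIO should be checked in tropomi_tools.py.

```python
import numpy as np


def apply_avker_sketch(sat_avker, sat_pressure_weight, gc_on_sat, sat_prior=None):
    """Smooth a model profile with an averaging kernel and sum to a column.

    All inputs are arrays shaped (n_obs, n_levels); returns shape (n_obs,).
    """
    if sat_prior is None:
        smoothed = sat_avker * gc_on_sat
    else:
        # Relax the model profile toward the prior where the kernel is small.
        smoothed = sat_prior + sat_avker * (gc_on_sat - sat_prior)
    return np.sum(sat_pressure_weight * smoothed, axis=1)


# Made-up two-level example for a single observation.
avker = np.array([[0.5, 0.5]])
weights = np.array([[0.5, 0.5]])
gc = np.array([[1800.0, 1900.0]])
prior = np.array([[1700.0, 1700.0]])

with_prior = apply_avker_sketch(avker, weights, gc, sat_prior=prior)
without_prior = apply_avker_sketch(avker, weights, gc)
print(with_prior, without_prior)
```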

The remaining three utility functions are very similar. They read TROPOMI level 2 observations from file, but for methane there are several TROPOMI products with different file formats (operational, science product, Harvard-specific standard). Each function is designed to read a different format. Users can select which function they would like to use by specifying the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file. DEFAULT selects the TROPOMI operational product, ACMG selects the ACMG/Harvard TROPOMI product, and BLENDED selects the Balasus et al., 2023 product; the BLENDED reader also works for the TROPOMI science product. For CO, users must select the DEFAULT option.

read_tropomi(filename, species, filterinfo=None, includeObsError=False)

Designed for the TROPOMI operational level 2 product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the DEFAULT value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.

Parameters
  • filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.

  • species (str) – Name of species to be loaded, e.g. “CH4”, “CO”, or “NO2”.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.

Return type

dict

read_tropomi_acmg(filename, species, filterinfo=None, includeObsError=False)

Designed for the Harvard/ACMG version of the TROPOMI operational level 2 product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the ACMG value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.

Parameters
  • filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.

  • species (str) – Name of species to be loaded. Currently, only “CH4” is supported, though an “NO2” operator is partially written.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.

Return type

dict

read_tropomi_gosat_corrected(filename, species, filterinfo=None, includeObsError=False)

Designed for the Balasus et al., 2023 version of the TROPOMI operational level 2 product, but also works for the TROPOMI science product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the BLENDED value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.

Parameters
  • filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.

  • species (str) – Name of species to be loaded. Currently, only “CH4” is supported, though an “NO2” operator is partially written.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.

Return type

dict
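
The relationship between the WHICH_TROPOMI_PRODUCT setting and the three reader functions can be summarized as a lookup table. The sketch below is only a restatement of the descriptions above; the names TROPOMI_READERS and reader_name_for are hypothetical, and CHEEREIO's own selection logic may be structured differently.

```python
# Setting value -> reader function name, as described in this section.
TROPOMI_READERS = {
    "DEFAULT": "read_tropomi",
    "ACMG": "read_tropomi_acmg",
    "BLENDED": "read_tropomi_gosat_corrected",
}


def reader_name_for(product, species):
    """Return the name of the reader function for a product setting."""
    if species == "CO" and product != "DEFAULT":
        raise ValueError("CO requires WHICH_TROPOMI_PRODUCT to be DEFAULT")
    try:
        return TROPOMI_READERS[product]
    except KeyError:
        raise ValueError(f"Unknown WHICH_TROPOMI_PRODUCT: {product!r}") from None


print(reader_name_for("BLENDED", "CH4"))  # read_tropomi_gosat_corrected
```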

OMI tools

NASA’s Ozone Monitoring Instrument (OMI) onboard the Aura satellite measures criteria air pollutants and other trace gases of interest to CHEEREIO users. Currently, NO2 is the only OMI operator written for CHEEREIO, but users can follow the pattern in omi_tools.py to add support for additional species.

To activate OMI observations, list “OMI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.

OMI fields available to save and plot in CHEEREIO

All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page.

Currently, no additional ObsData fields are available to be saved and/or plotted with the OMI operator as written. Additional fields can be added by saving more metadata into the ObsData object with the addData function within the gcCompare method of the OMI_Translator class. See the Observations page for more details.

OMI operator support functions

The OMI observation operator calls two utility functions to process observations from file and format them in such a way that the gcCompare method of the OMI_Translator class can perform relevant computations. They are documented below:

read_omi(filename, species, filterinfo=None, includeObsError=False)

A utility function which loads OMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the OMI_Translator class.

Parameters
  • filename (str) – NetCDF file containing OMI observations to be loaded. Expects standard level 2 data.

  • species (str) – Name of species to be loaded. Currently, only “NO2” is supported.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the OMI_Translator class.

Return type

dict

clearEdgesFilterByQAAndFlatten(met)

A utility function which takes in partially formatted OMI data and does additional processing, outputting a flattened set of arrays which are compatible with CHEEREIO. In the process, the function removes swath edges and bad retrieval values.

Parameters

met (dict) – A dictionary with keys naming important observation data and metadata, and values of raw 2D swath data from OMI.

Returns

A dictionary containing flattened observation values and metadata, with bad data removed, ready for input into the gcCompare method of the OMI_Translator class.

Return type

dict
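
The general idea of trimming swath edges, masking bad retrievals, and flattening can be sketched with NumPy boolean indexing. Everything specific here is an assumption made for illustration: the function name, the qa_flag key, the rule that 0 marks a good retrieval, and the number of edge pixels dropped. The actual criteria live in omi_tools.py.

```python
import numpy as np


def clear_edges_and_flatten(met, n_edge=5, qa_key="qa_flag"):
    """Drop n_edge cross-track pixels per side, mask bad QA, and flatten.

    met: dict of 2D arrays shaped (along_track, cross_track).
    qa_key: hypothetical key; a value of 0 is treated as a good retrieval.
    """
    interior = slice(n_edge, -n_edge) if n_edge > 0 else slice(None)
    good = met[qa_key][:, interior] == 0
    # Indexing a 2D array with a 2D boolean mask returns a flat 1D array.
    return {k: v[:, interior][good] for k, v in met.items() if k != qa_key}


# Made-up swath: 2 scan lines by 6 cross-track pixels, one bad retrieval.
qa = np.zeros((2, 6), dtype=int)
qa[0, 2] = 1
met = {"NO2": np.arange(12.0).reshape(2, 6), "qa_flag": qa}
flat = clear_edges_and_flatten(met, n_edge=1)
print(flat["NO2"])
```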

IASI tools

The Infrared Atmospheric Sounding Interferometer (IASI) instrument measures trace gases of interest to CHEEREIO users. Currently, NH3 (as retrieved by the ANNI v4 algorithm from the Université Libre de Bruxelles group) is the only IASI operator written for CHEEREIO, but users can follow the pattern in iasi_tools.py to add support for additional species.

To activate IASI observations, list “IASI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.

IASI fields available to save and plot in CHEEREIO

Users can list additional fields (beyond the minimum observations and spatiotemporal location) they would like to save from IASI in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the IASI operator, all data read in by the read_iasi function are available to be saved and/or plotted.

Note that the IASI operator decides whether an observation should be discarded only after the operator is applied, because we can only evaluate whether the observation falls within the error bounds recommended by ULB once we replace the prior column with the GC column. This is implemented through the postfilter functionality, as described in the (7) [optional] Add ability to filter observations after the operator is applied entry.

IASI operator support functions

The IASI observation operator calls two utility functions to process observations from file and format them in such a way that the gcCompare method of the IASI_Translator class can perform relevant computations. They are documented below:

read_iasi(filename, species, filterinfo=None, includeObsError=False)

A utility function which loads IASI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the IASI_Translator class.

Parameters
  • filename (str) – NetCDF file containing IASI observations to be loaded. Expects standard level 2 data.

  • species (str) – Name of species to be loaded. Currently, only “NH3” is supported.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the IASI_Translator class.

Return type

dict

GC_to_sat_levels(GC_SPC, GC_bxheight, sat_edges)

See the function from TROPOMI_tools. Note that the IASI product only provides height above the surface, so this version uses the GC box height diagnostic to approximate the pressure level regrid.
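
One plausible first step when working from box heights is to accumulate them into layer-edge heights above the surface, which can then be compared with the height grid of the satellite product. The sketch below shows only that step, assuming box heights are ordered from the surface upward; the function name is hypothetical, and how CHEEREIO matches these edges to the IASI levels is not shown here.

```python
import numpy as np


def height_edges_from_boxheight(gc_bxheight):
    """Cumulative layer-edge heights above the surface.

    gc_bxheight: (n_levels,) layer thicknesses ordered surface to top.
    Returns (n_levels + 1,) edges in the same units, starting at 0.
    """
    return np.concatenate(([0.0], np.cumsum(gc_bxheight)))


# Made-up thicknesses in metres.
edges = height_edges_from_boxheight(np.array([100.0, 200.0, 300.0]))
print(edges)
```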

TCCON tools

The Total Carbon Column Observing Network (TCCON) measures trace gases of interest to CHEEREIO users. Currently, CO and nitrous oxide are supported for CHEEREIO, but users can follow the pattern in tccon_tools.py to add support for additional species. To activate TCCON observations, list “TCCON” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.

The TCCON operator was originally built by Sina Voshtani. See CHEEREIO papers to cite for a recommended citation.

TCCON fields available to save and plot in CHEEREIO

Users can list additional fields (beyond the minimum observations and spatiotemporal location) they would like to save from TCCON in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the TCCON operator, all data read in by the read_tccon function are available to be saved and/or plotted.

TCCON operator support functions

The TCCON observation operator calls several utility functions to process observations from file and format them in such a way that the gcCompare method of the TCCON_Translator class can perform relevant computations. They are documented below:

read_tccon(filename, species, filterinfo=None, includeObsError=False, doN2OCorrectionPT700=False)

A utility function which loads TCCON observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TCCON_Translator class.

Parameters
  • filename (str) – NetCDF file containing TCCON observations to be loaded. Expects data produced through the prep_tccon_aggregated.py script, as described below.

  • species (str) – Name of species to be loaded. Currently, only “N2O” and “CO” are supported.

  • filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information

  • includeObsError (bool) – True or False, read the errors associated with individual observations.

  • doN2OCorrectionPT700 (bool) – True or False, do the temperature correction for N2O in the GGG2020.0 product set? This won’t be necessary after 2020.1 is released.

Returns

A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TCCON_Translator class.

Return type

dict

correct_xn2o_from_pt700(xn2o, prior_temperature, prior_pressure, xn2o_error=None, n2o_aicf=0.9821, m=0.000626, b=0.787)

Handle temperature-dependent bias in TCCON N2O. Based on Josh Laughner’s code for GGG2020: py_tccon_netcdf/write_tccon_netcdf/bias_corrections.py. Will no longer be needed after GGG2020.1 is released.

_compute_pt700(prior_temperature, prior_pressure)

Handle temperature-dependent bias in TCCON N2O. Based on Josh Laughner’s code for GGG2020: py_tccon_netcdf/write_tccon_netcdf/bias_corrections.py. Will no longer be needed after GGG2020.1 is released.

GC_to_sat_levels(GC_SPC, GC_edges, sat_edges)

See the function from TROPOMI_tools.

gravity(altitudes, latitudes)

Compute g at vertical layers for each observation site.
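
For readers who want a feel for the size of the latitude and altitude dependence, the sketch below uses a standard textbook approximation: the 1967 international gravity formula at sea level plus a linear free-air correction. This is a swapped-in approximation for illustration and is not claimed to be the formula used by the gravity function in tccon_tools.py; the name gravity_sketch is hypothetical.

```python
import numpy as np


def gravity_sketch(altitudes_m, latitudes_deg):
    """Approximate g in m/s^2 from altitude (m) and latitude (degrees).

    Inputs must be broadcastable against each other, e.g. altitudes shaped
    (n_sites, n_layers) with latitudes shaped (n_sites, 1).
    """
    lat = np.deg2rad(np.asarray(latitudes_deg, dtype=float))
    # Sea-level gravity as a function of latitude (1967 international formula).
    g0 = 9.780327 * (1.0 + 0.0053024 * np.sin(lat) ** 2
                     - 0.0000058 * np.sin(2.0 * lat) ** 2)
    # Free-air correction: about 3.086e-6 m/s^2 decrease per metre of altitude.
    return g0 - 3.086e-6 * np.asarray(altitudes_m, dtype=float)


# Two made-up sites, each with three layer altitudes.
alts = np.array([[0.0, 1000.0, 5000.0], [200.0, 1200.0, 5200.0]])
lats = np.array([[0.0], [45.0]])
g = gravity_sketch(alts, lats)
print(g.shape)
```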

integrate_column(gas_profile, h2o_profile, obh2o_profile, obpout, obpressure_profile, altitude_profile, ensemble_profile, oblat, AK)

This is the main function for TCCON column integration, using the TCCON a priori profile and averaging kernels.

Parameters
  • gas_profile (array) – The model gas profile of interest.

  • h2o_profile (array) – The model h2o profile.

  • obh2o_profile (array) – The observation h2o profile.

  • pressure_profile (array) – The model pressure profile that corresponds with the gas profile in hPa.

  • ensemble_profile (array) – The ensemble profile from equation 25 in Rodgers and Connor 2000 - this will likely be the a priori profile from GFIT; most often multiplied by the scaling factors (VSF) from GFIT for the spectra near the aircraft overpass

  • obalt (float) – The altitude of the ground-based site in m - geometric altitude

  • oblat (float) – The latitude (in degrees) of the ground-based site

  • AK (array) – The averaging kernels for all windows of the molecule of interest, in a structure

TCCON preprocessing for CHEEREIO

CHEEREIO has built-in preprocessing functions to translate raw GGG2020 TCCON data, as downloaded from the Caltech site, into a form compatible with the CHEEREIO TCCON operator. To use them, run the following from the command line (in CHEEREIO/core) before installing CHEEREIO: python prep_tccon_aggregated.py ARGUMENTS

Arguments are as follows:

  1. -i or --input_path (required): Path to your input TCCON files, e.g. downloaded from tccondata.org.

  2. -o or --output_path (required): Path to where to save your pre-processed TCCON files, ready for CHEEREIO.

  3. -s or --start_time (required): Start time, in format YYYY-MM-DDTHH:MM:SS. For example, 2023-01-01T00:00:00. Will only process data from after this time.

  4. -e or --end_time (required): End time, in format YYYY-MM-DDTHH:MM:SS. For example, 2023-02-01T00:00:00. Will only process data from before this time.

  5. -s2k or --species_to_keep (optional): Species to keep in your pre-processed TCCON files (e.g. just “co” or “co,co2”). If multiple, comma separated.

  6. -p or --input_file_pattern (optional): File pattern for input TCCON files. If you have the public files, you do not need to modify this (default is *public.qc.nc).
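
The argument list above can be mirrored with argparse to show how a set of arguments would be interpreted. This is a sketch of the documented interface only, not the contents of prep_tccon_aggregated.py: the default for species_to_keep is an assumption, and the paths in the example are hypothetical.

```python
import argparse


def build_parser():
    """Parser mirroring the documented prep_tccon_aggregated.py arguments."""
    p = argparse.ArgumentParser(prog="prep_tccon_aggregated.py")
    p.add_argument("-i", "--input_path", required=True)
    p.add_argument("-o", "--output_path", required=True)
    p.add_argument("-s", "--start_time", required=True)
    p.add_argument("-e", "--end_time", required=True)
    # Assumed default: None is used here to mean "no species filtering requested".
    p.add_argument("-s2k", "--species_to_keep", default=None)
    p.add_argument("-p", "--input_file_pattern", default="*public.qc.nc")
    return p


# Hypothetical paths; times use the documented YYYY-MM-DDTHH:MM:SS format.
args = build_parser().parse_args([
    "-i", "/data/tccon/raw",
    "-o", "/data/tccon/prepped",
    "--start_time", "2023-01-01T00:00:00",
    "--end_time", "2023-02-01T00:00:00",
    "--species_to_keep", "co,co2",
])
species = args.species_to_keep.split(",")
print(species, args.input_file_pattern)
```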

ObsPack tools

ObsPack is a standardized dataset containing measurements from surface monitors distributed around the world, aimed at carbon cycle studies. CHEEREIO users often use ObsPack data for validation of CO or CH4 inversions, or as observations for the inversion itself. The ObsPack observation operator, contained in the obspack_tools.py file, wraps around the ObsPack diagnostic produced by GEOS-Chem and translates it into a form acceptable to CHEEREIO.

To activate ObsPack, see the Observation settings page for information on the correct settings for ens_config.json.

ObsPack fields available to save and plot in CHEEREIO

All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. Below is a list of the supported fields which users can save in ObsPack:

altitude

Altitude of ObsPack site.

pressure

Pressure observed at ObsPack site.

obspack_id

Unique identifier of ObsPack observation.

platform

ObsPack platform.

site_code

Unique identifier of ObsPack site.

Users wishing to aggregate ObsPack results by site, either for plotting or other analysis, will want to save site_code. Follow the instructions on the Postprocessing settings page to ensure this is done successfully.

Additional fields can be added by saving more metadata into the ObsData object with the addData function within the gcCompare method of the ObsPack_Translator class. See the Observations page for more details.

ObsPack preprocessing functions

CHEEREIO has built-in preprocessing functions to translate raw ObsPack data, as downloaded from NOAA, into a form compatible with the GEOS-Chem ObsPack diagnostic. To use this functionality, set preprocess_raw_obspack_files to true in ens_config.json and provide a path to the raw files in the raw_obspack_path entry. However, some users report that NOAA ObsPack data is not fully standardized. If you run into preprocessing errors, set preprocess_raw_obspack_files to false and supply an already populated directory of manually preprocessed files. Details for how to do this are provided in the ObsPack diagnostic documentation for GEOS-Chem; feel free to use the code provided in CHEEREIO as a model. See the Observation settings page for more information on ensemble configuration settings for ObsPack.

Descriptions of the ObsPack preprocessing functions are below.

make_filter_fxn(start_date, end_date, lat_bounds=None, lon_bounds=None)

Generate a function that will filter raw ObsPack data (i.e. downloaded directly from NOAA) and keep only data within certain date and location bounds. The output filter function also does additional filtering and reformatting regardless of these bounds.

Parameters
  • start_date (datetime) – Date of earliest ObsPack data to include

  • end_date (datetime) – Date of latest ObsPack data to include

  • lat_bounds (list) – If filtering by latitude, a list of two latitudes representing minimum and maximum latitude to be kept. If None, ignore.

  • lon_bounds (list) – If filtering by longitude, a list of two longitudes representing minimum and maximum longitudes to be kept. If None, ignore.

Returns

A filter function for filtering and formatting raw ObsPack data.

Return type

Function
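
The pattern of generating a filter function from bounds is a closure. The sketch below shows that pattern on plain dictionaries with hypothetical keys (time, latitude, longitude) and inclusive bounds; the real make_filter_fxn operates on raw ObsPack data and also performs additional filtering and reformatting that is not reproduced here.

```python
from datetime import datetime


def make_filter_sketch(start_date, end_date, lat_bounds=None, lon_bounds=None):
    """Return a function that reports whether a record is within the bounds."""
    def keep(record):
        if not (start_date <= record["time"] <= end_date):
            return False
        if lat_bounds is not None:
            if not (lat_bounds[0] <= record["latitude"] <= lat_bounds[1]):
                return False
        if lon_bounds is not None:
            if not (lon_bounds[0] <= record["longitude"] <= lon_bounds[1]):
                return False
        return True
    return keep


# Made-up records: one inside the bounds, one too late, one too far south.
records = [
    {"time": datetime(2019, 1, 15), "latitude": 40.0, "longitude": -105.0},
    {"time": datetime(2019, 3, 1), "latitude": 40.0, "longitude": -105.0},
    {"time": datetime(2019, 1, 20), "latitude": -30.0, "longitude": 150.0},
]
keep = make_filter_sketch(datetime(2019, 1, 1), datetime(2019, 2, 1),
                          lat_bounds=[20.0, 60.0])
kept = [r for r in records if keep(r)]
print(len(kept))
```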

prep_obspack(raw_obspack_dir, gc_obspack_dir, filename_format, start_date, end_date)

This is a preprocessing function, designed to take raw ObsPack files as downloaded from NOAA and process them into files compatible with CHEEREIO and the GEOS-Chem ObsPack diagnostic.

Parameters
  • raw_obspack_dir (str) – Directory where raw ObsPack data as downloaded from NOAA is stored. CHEEREIO takes this by default from the raw_obspack_path path in ens_config.json.

  • gc_obspack_dir (str) – Directory where processed ObsPack compatible with the GEOS-Chem ObsPack diagnostic will be saved. CHEEREIO takes this by default from the gc_obspack_path path in ens_config.json.

  • filename_format (str) – File format which CHEEREIO will use to save the preprocessed ObsPack data. CHEEREIO takes this by default from the obspack_gc_input_file entry in ens_config.json.

  • start_date (datetime) – Date of earliest ObsPack data to include. CHEEREIO takes this by default from the START_DATE entry in ens_config.json.

  • end_date (datetime) – Date of latest ObsPack data to include. CHEEREIO takes this by default from the END_DATE entry in ens_config.json.
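
The parameter descriptions above tie each prep_obspack argument to an ens_config.json entry. The sketch below collects that mapping in one place. The configuration values, the filename pattern, and the YYYYMMDD date format are assumptions made for illustration, and the helper prep_obspack_kwargs is hypothetical; only the key names and argument names come from this section.

```python
from datetime import datetime

# Hypothetical values; key names are those named in the parameter list above.
config = {
    "raw_obspack_path": "/data/obspack/raw",
    "gc_obspack_path": "/data/obspack/gc",
    "obspack_gc_input_file": "obspack_input.YYYYMMDD.nc",
    "START_DATE": "20190101",
    "END_DATE": "20190201",
}


def prep_obspack_kwargs(cfg, date_format="%Y%m%d"):
    """Build prep_obspack keyword arguments from configuration entries.

    date_format is an assumption about how dates are written in the config.
    """
    return {
        "raw_obspack_dir": cfg["raw_obspack_path"],
        "gc_obspack_dir": cfg["gc_obspack_path"],
        "filename_format": cfg["obspack_gc_input_file"],
        "start_date": datetime.strptime(cfg["START_DATE"], date_format),
        "end_date": datetime.strptime(cfg["END_DATE"], date_format),
    }


kwargs = prep_obspack_kwargs(config)
print(kwargs["start_date"], kwargs["end_date"])
```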