Existing observation operators¶
CHEEREIO ships with observation operators that have been produced by the community. Users can add their own operators by following the instructions on the Workflow to add a new observation operator page, or they can use the operators listed below.
TROPOMI tools¶
The TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite measures criteria air pollutants and other trace gases of interest to CHEEREIO users. Currently, there are three TROPOMI operators written for CHEEREIO: CH4, CO (contributed by Sina Voshtani), and NO2. Users can follow the pattern in tropomi_tools.py to add support for additional species.
To activate TROPOMI observations, list “TROPOMI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.
TROPOMI fields available to save and plot in CHEEREIO¶
All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the TROPOMI operator, all data read in by the read_tropomi function are available to be saved and/or plotted. Below is a subset of the supported fields which users can save in TROPOMI:
-
qa_value¶ Quality assurance value.
-
column_AK¶ Satellite averaging kernel.
-
albedo_swir¶ Shortwave infrared albedo.
-
albedo_nir¶ Near infrared albedo.
-
blended_albedo¶ Blended albedo.
-
methane_profile_apriori¶ A priori methane profile. (Methane only)
TROPOMI operator support functions¶
The TROPOMI observation operator calls a handful of utility functions to process observations from file and format them in such a way that the gcCompare method of the TROPOMI_Translator class can perform relevant computations.
The first two utility functions are designed to remap GEOS-Chem pressure levels to satellite pressure levels and apply the averaging kernel.
-
GC_to_sat_levels(GC_SPC, GC_edges, sat_edges, species, chunk_size=10000)¶ Takes as input GEOS-Chem data and pressure level edges, as well as satellite pressure levels, and calculates GEOS-Chem data values on satellite pressure levels.
- Parameters
GC_SPC (array) – A NumPy array containing GEOS-Chem columns.
GC_edges (array) – A NumPy array containing GEOS-Chem pressure level edges.
sat_edges (array) – A NumPy array containing TROPOMI pressure level edges
species (str) – Species to be processed.
chunk_size (int) – For CO, the number of observations to be processed at once. This is to save memory for TROPOMI observations with high vertical resolution.
- Returns
A NumPy array containing GEOS-Chem columns remapped to be on the TROPOMI pressure levels.
- Return type
array
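As an illustration of what remapping onto satellite levels involves, here is a minimal sketch of one common approach: weighting each GEOS-Chem layer by the pressure thickness it shares with each satellite layer. The function name `remap_by_pressure_overlap` and the single-profile input shape are assumptions made for this example; the source does not state which interpolation scheme `GC_to_sat_levels` uses, so this should not be read as its implementation.

```python
import numpy as np

def remap_by_pressure_overlap(gc_profile, gc_edges, sat_edges):
    """Remap one model profile onto satellite layers (hypothetical helper).

    gc_profile: (n_gc,) mixing ratios on model layers
    gc_edges:   (n_gc + 1,) pressure edges in hPa, decreasing from the surface
    sat_edges:  (n_sat + 1,) pressure edges in hPa, decreasing from the surface
    """
    gc_profile = np.asarray(gc_profile, dtype=float)
    gc_edges = np.asarray(gc_edges, dtype=float)
    sat_edges = np.asarray(sat_edges, dtype=float)

    gc_bottom, gc_top = gc_edges[:-1], gc_edges[1:]
    sat_bottom, sat_top = sat_edges[:-1], sat_edges[1:]

    # overlap[i, j]: pressure thickness shared by satellite layer i and model layer j
    overlap = (np.minimum(sat_bottom[:, None], gc_bottom[None, :])
               - np.maximum(sat_top[:, None], gc_top[None, :]))
    overlap = np.clip(overlap, 0.0, None)

    # Normalise each row so the weights for one satellite layer sum to 1
    total = overlap.sum(axis=1)
    weights = np.divide(overlap, total[:, None],
                        out=np.zeros_like(overlap), where=total[:, None] > 0)

    remapped = weights @ gc_profile
    remapped[total == 0] = np.nan  # satellite layers the model does not cover
    return remapped

# Three model layers remapped onto two coarser satellite layers
example = remap_by_pressure_overlap([1800.0, 1790.0, 1780.0],
                                    [1000.0, 800.0, 600.0, 400.0],
                                    [1000.0, 700.0, 400.0])
```

A production version would operate on many observations at once; the `chunk_size` argument documented above suggests that the real function batches that work to limit memory use.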
-
apply_avker(sat_avker, sat_pressure_weight, GC_SPC, sat_prior=None, filt=None)¶ Apply the averaging kernel
- Parameters
sat_avker (array) – TROPOMI averaging kernel.
sat_prior (array) – The satellite prior profile in ppb, optional (used for CH4).
sat_pressure_weight (array) – The relative pressure weights for each level
GC_SPC (array) – The GC species on the satellite levels, output by GC_to_sat_levels
filt (array) – A filter, optional
- Returns
A NumPy array containing simulated GEOS-Chem values such that they are directly comparable to TROPOMI (i.e. because the averaging kernel has been applied).
- Return type
array
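The sketch below shows the standard way an averaging kernel and pressure weights are combined into a column, with and without a prior profile. The function name `column_with_averaging_kernel` and the array shapes are assumptions for illustration; whether `apply_avker` uses exactly these expressions is not stated in the source.

```python
import numpy as np

def column_with_averaging_kernel(avker, pressure_weight, gc_on_sat, prior=None):
    """Hypothetical helper; all inputs have shape (n_obs, n_levels).

    Returns an array of shape (n_obs,) holding one simulated column per observation.
    """
    avker = np.asarray(avker, dtype=float)
    gc_on_sat = np.asarray(gc_on_sat, dtype=float)
    if prior is None:
        # No prior supplied: weight the model profile by the kernel directly
        smoothed = avker * gc_on_sat
    else:
        # Prior supplied: relax the model profile toward the prior
        # wherever the instrument has little sensitivity
        prior = np.asarray(prior, dtype=float)
        smoothed = prior + avker * (gc_on_sat - prior)
    return np.sum(np.asarray(pressure_weight, dtype=float) * smoothed, axis=1)

# One observation with two levels, values in ppb
simulated = column_with_averaging_kernel(
    avker=[[1.0, 0.5]],
    pressure_weight=[[0.5, 0.5]],
    gc_on_sat=[[1800.0, 1900.0]],
    prior=[[1700.0, 1700.0]],
)
```

With an averaging kernel of zero the result collapses to the pressure-weighted prior, and with a kernel of one it collapses to the pressure-weighted model profile, which is a convenient sanity check for any implementation.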
The remaining three utility functions are very similar: each reads TROPOMI level 2 observations from file. For methane, however, TROPOMI observations come in several formats (operational, science product, Harvard-specific standard), and each function is designed to read a different one. Users select the function by specifying the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file: DEFAULT selects the TROPOMI operational product, ACMG selects the ACMG/Harvard TROPOMI product, and BLENDED selects the Balasus et al., 2023 product (this reader also works for the TROPOMI science product). For CO, users must select the DEFAULT option.
-
read_tropomi(filename, species, filterinfo=None, includeObsError=False)¶ Designed for the TROPOMI operational level 2 product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the DEFAULT value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.
- Parameters
filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.
species (str) – Name of species to be loaded. Currently, only “CH4” is supported, though an “NO2” operator is partially written.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.
- Return type
dict
-
read_tropomi_acmg(filename, species, filterinfo=None, includeObsError=False)¶ Designed for the Harvard/ACMG version of the TROPOMI operational level 2 product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the ACMG value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.
- Parameters
filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.
species (str) – Name of species to be loaded. Currently, only “CH4” is supported, though an “NO2” operator is partially written.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.
- Return type
dict
-
read_tropomi_gosat_corrected(filename, species, filterinfo=None, includeObsError=False)¶ Designed for the Balasus et al., 2023 version of the TROPOMI operational level 2 product, but also works for the TROPOMI science product. A utility function which loads TROPOMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TROPOMI_Translator class. This function is selected when users supply the BLENDED value to the WHICH_TROPOMI_PRODUCT setting in the TROPOMI_CH4_extension.json file.
- Parameters
filename (str) – NetCDF file containing TROPOMI observations to be loaded. Expects standard level 2 data.
species (str) – Name of species to be loaded. Currently, only “CH4” is supported, though an “NO2” operator is partially written.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TROPOMI_Translator class.
- Return type
dict
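The relationship between the WHICH_TROPOMI_PRODUCT setting and the three reader functions can be summarised as a simple lookup. The sketch below is illustrative only: the helper `reader_name_for` is hypothetical, and raising an error for a non-DEFAULT CO selection is an assumption about how the documented restriction might be enforced, not a description of CHEEREIO's behavior.

```python
# Setting value -> name of the reader function documented above
READER_FOR_PRODUCT = {
    "DEFAULT": "read_tropomi",
    "ACMG": "read_tropomi_acmg",
    "BLENDED": "read_tropomi_gosat_corrected",
}

def reader_name_for(which_product, species):
    """Return the name of the reader matching a WHICH_TROPOMI_PRODUCT value (hypothetical helper)."""
    if which_product not in READER_FOR_PRODUCT:
        raise ValueError(f"Unknown WHICH_TROPOMI_PRODUCT value: {which_product!r}")
    # The documentation states that CO must use the DEFAULT option
    if species == "CO" and which_product != "DEFAULT":
        raise ValueError("For CO, WHICH_TROPOMI_PRODUCT must be DEFAULT")
    return READER_FOR_PRODUCT[which_product]

selected = reader_name_for("BLENDED", "CH4")
```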
OMI tools¶
NASA’s Ozone Monitoring Instrument (OMI) onboard the Aura satellite measures criteria air pollutants and other trace gases of interest to CHEEREIO users. Currently, NO2 is the only OMI operator written for CHEEREIO, but users can follow the pattern in omi_tools.py to add support for additional species.
To activate OMI observations, list “OMI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.
OMI fields available to save and plot in CHEEREIO¶
All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page.
Currently, no additional ObsData fields are available to be saved and/or plotted with the OMI operator as written. Additional fields can be added by saving more metadata into the ObsData object with the addData function within the gcCompare method of the OMI_Translator class. See the Observations page for more details.
OMI operator support functions¶
The OMI observation operator calls two utility functions to process observations from file and format them in such a way that the gcCompare method of the OMI_Translator class can perform relevant computations. They are documented below:
-
read_omi(filename, species, filterinfo=None, includeObsError=False)¶ A utility function which loads OMI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the OMI_Translator class.
- Parameters
filename (str) – NetCDF file containing OMI observations to be loaded. Expects standard level 2 data.
species (str) – Name of species to be loaded. Currently, only “NO2” is supported.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the OMI_Translator class.
- Return type
dict
-
clearEdgesFilterByQAAndFlatten(met)¶ A utility function which takes in partially formatted OMI data and does additional processing, outputting a flattened set of arrays which are compatible with CHEEREIO. In the process, the function removes swath edges and bad retrieval values.
- Parameters
met (dict) – A dictionary with keys naming important observation data and metadata, and values of raw 2D swath data from OMI.
- Returns
A dictionary containing flattened observation values and metadata, with bad data removed, ready for input into the gcCompare method of the OMI_Translator class.
- Return type
dict
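To make the idea of clearing swath edges and flattening concrete, the following sketch masks the outermost across-track pixels, drops pixels with a bad quality flag, and flattens what remains. The helper name `drop_edges_and_flatten`, the edge width, the flag convention (0 meaning good), and the dictionary keys are all assumptions for this example; they are not taken from omi_tools.py.

```python
import numpy as np

def drop_edges_and_flatten(swath, quality_flag, n_edge=1, good_flag=0):
    """Hypothetical helper.

    swath:        dict of 2D arrays shaped (along_track, across_track)
    quality_flag: 2D array with the same shape as the swath arrays
    n_edge:       number of across-track pixels to discard on each side
    """
    quality_flag = np.asarray(quality_flag)
    keep = np.zeros(quality_flag.shape, dtype=bool)
    # Keep only the interior across-track pixels
    keep[:, n_edge:quality_flag.shape[1] - n_edge] = True
    # Discard retrievals whose quality flag is not the "good" value
    keep &= quality_flag == good_flag
    # Boolean indexing returns the surviving pixels as flat 1D arrays
    return {key: np.asarray(values)[keep] for key, values in swath.items()}

raw = {"latitude": np.arange(8.0).reshape(2, 4)}
flags = np.zeros((2, 4), dtype=int)
flags[0, 1] = 1  # mark one interior pixel as a bad retrieval
flat = drop_edges_and_flatten(raw, flags)
```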
IASI tools¶
The Infrared Atmospheric Sounding Interferometer (IASI) instrument measures trace gases of interest to CHEEREIO users. Currently, NH3 (as retrieved by the ANNI v4 algorithm from the Université Libre de Bruxelles group) is the only IASI operator written for CHEEREIO, but users can follow the pattern in iasi_tools.py to add support for additional species.
To activate IASI observations, list “IASI” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.
IASI fields available to save and plot in CHEEREIO¶
Users can list additional fields (beyond the minimum observations and spatiotemporal location) they would like to save from IASI in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the IASI operator, all data read in by the read_iasi function are available to be saved and/or plotted.
Note that the IASI operator calculates whether an observation should be discarded after the operator is applied, as we can only evaluate whether the observation is in appropriate error bounds recommended by ULB once we replace the prior column with the GC column. This is implemented through the postfilter functionality, as described on the (7) [optional] Add ability to filter observations after the operator is applied entry.
IASI operator support functions¶
The IASI observation operator calls two utility functions to process observations from file and format them in such a way that the gcCompare method of the IASI_Translator class can perform relevant computations. They are documented below:
-
read_iasi(filename, species, filterinfo=None, includeObsError=False)¶ A utility function which loads IASI observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the IASI_Translator class.
- Parameters
filename (str) – NetCDF file containing IASI observations to be loaded. Expects standard level 2 data.
species (str) – Name of species to be loaded. Currently, only “NH3” is supported.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the IASI_Translator class.
- Return type
dict
-
GC_to_sat_levels(GC_SPC, GC_bxheight, sat_edges)¶ See the function from TROPOMI_tools. Note that for the IASI product we only have height from surface, so the function uses GC Boxheight diagnostic to do a pressure level regrid approximation.
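Because the IASI variant works from heights rather than pressures, a natural first step is to turn layer thicknesses into layer edge heights above the surface. The sketch below shows only that step, under the assumption that box heights are ordered from the surface upward; the helper name `height_edges_from_boxheight` is hypothetical and the source does not describe how the real function performs the regrid.

```python
import numpy as np

def height_edges_from_boxheight(box_heights_m):
    """Convert layer thicknesses (m, surface first) into edge heights above the surface."""
    box_heights_m = np.asarray(box_heights_m, dtype=float)
    # The lowest edge is the surface itself; each later edge adds one layer thickness
    return np.concatenate(([0.0], np.cumsum(box_heights_m)))

edges_m = height_edges_from_boxheight([100.0, 200.0, 300.0])
```

Once both the model and the satellite layers are expressed as edges on the same vertical coordinate, an overlap-based remapping can proceed in height space much as it would in pressure space.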
TCCON tools¶
The Total Carbon Column Observing Network (TCCON) measures trace gases of interest to CHEEREIO users. Currently, CO and nitrous oxide are supported for CHEEREIO, but users can follow the pattern in tccon_tools.py to add support for additional species. To activate TCCON observations, list “TCCON” as an observation type in the OBS_TYPE setting, as described on the Observation settings page.
The TCCON operator was originally built by Sina Voshtani. See CHEEREIO papers to cite for a recommended citation.
TCCON fields available to save and plot in CHEEREIO¶
Users can list additional fields (beyond the minimum observations and spatiotemporal location) they would like to save from TCCON in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. With the TCCON operator, all data read in by the read_tccon function are available to be saved and/or plotted.
TCCON operator support functions¶
The TCCON observation operator calls two utility functions to process observations from file and format them in such a way that the gcCompare method of the TCCON_Translator class can perform relevant computations. They are documented below:
-
read_tccon(filename, species, filterinfo=None, includeObsError=False, doN2OCorrectionPT700=False)¶ A utility function which loads TCCON observations from file, filters them, and returns a dictionary of important data formatted for input into the gcCompare method of the TCCON_Translator class.
- Parameters
filename (str) – NetCDF file containing TCCON observations to be loaded. Expects data produced through the prep_tccon_aggregated.py script, as described below.
species (str) – Name of species to be loaded. Currently, only “N2O” and “CO” are supported.
filterinfo (dict) – A dictionary of information about data filtering which is passed to a standard observation operator utility function. See (6) [optional] Add observation filters via an extension for more information
includeObsError (bool) – True or False, read the errors associated with individual observations.
doN2OCorrectionPT700 (bool) – True or False, do the temperature correction for N2O in the GGG2020.0 product set? This won’t be necessary after 2020.1 is released.
- Returns
A dictionary containing observation values and metadata, ready for input into the gcCompare method of the TCCON_Translator class.
- Return type
dict
-
correct_xn2o_from_pt700(xn2o, prior_temperature, prior_pressure, xn2o_error=None, n2o_aicf=0.9821, m=0.000626, b=0.787)¶ Handle temperature-dependent bias in TCCON N2O. Based on Josh Laughner’s code for GGG2020: py_tccon_netcdf/write_tccon_netcdf/bias_corrections.py. Will no longer be needed after GGG2020.1 is released.
-
_compute_pt700(prior_temperature, prior_pressure)¶ Handle temperature-dependent bias in TCCON N2O. Based on Josh Laughner’s code for GGG2020: py_tccon_netcdf/write_tccon_netcdf/bias_corrections.py. Will no longer be needed after GGG2020.1 is released.
-
GC_to_sat_levels(GC_SPC, GC_edges, sat_edges)¶ See the function from TROPOMI_tools.
-
gravity(altitudes, latitudes)¶ Compute g at vertical layers for each observation site.
-
integrate_column(gas_profile, h2o_profile, obh2o_profile, obpout, obpressure_profile, altitude_profile, ensemble_profile, oblat, AK)¶ This is the main function for TCCON column integration, using the TCCON a priori profile and averaging kernels.
- Parameters
gas_profile (array) – The model gas profile of interest.
h2o_profile (array) – The model h2o profile.
obh2o_profile (array) – The observation h2o profile.
pressure_profile (array) – The model pressure profile that corresponds with the gas profile in hPa.
ensemble_profile (array) – The ensemble profile from equation 25 in Rodgers and Connor 2000 - this will likely be the a priori profile from GFIT; most often multiplied by the scaling factors (VSF) from GFIT for the spectra near the aircraft overpass
obalt (float) – The altitude of the ground-based site in m - geometric altitude
oblat (float) – The latitude (in degrees) of the ground-based site
AK (array) – The averaging kernels for all windows of the molecule of interest, in a structure
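The gravity helper listed above computes g at each vertical layer for each site. The source does not say which formula it uses, so the sketch below is an assumption: it combines the 1980 international gravity formula for the latitude dependence with a linear free-air correction for altitude. The function name `gravity_sketch` is hypothetical.

```python
import numpy as np

def gravity_sketch(altitudes_m, latitudes_deg):
    """Approximate gravitational acceleration in m/s^2 (illustrative, not CHEEREIO's code).

    altitudes_m:   altitudes above sea level in metres
    latitudes_deg: latitudes in degrees, broadcastable against altitudes_m
    """
    phi = np.deg2rad(np.asarray(latitudes_deg, dtype=float))
    # Sea-level gravity as a function of latitude (1980 international gravity formula)
    g_sea_level = 9.780327 * (1.0
                              + 0.0053024 * np.sin(phi) ** 2
                              - 0.0000058 * np.sin(2.0 * phi) ** 2)
    # Free-air correction: gravity decreases by about 3.086e-6 m/s^2 per metre of altitude
    return g_sea_level - 3.086e-6 * np.asarray(altitudes_m, dtype=float)

# Gravity on three layers above a single site at 45 degrees north
g_profile = gravity_sketch([0.0, 5000.0, 10000.0], 45.0)
```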
TCCON preprocessing for CHEEREIO¶
CHEEREIO has built-in preprocessing functions to translate raw GGG2020 TCCON data, as downloaded from the Caltech site, into a form compatible with the CHEEREIO TCCON operator. To use this, execute from the command line (in CHEEREIO/core) before installing CHEEREIO: python prep_tccon_aggregated.py ARGUMENTS
Arguments are as follows:
-i or --input_path (required): Path to your input TCCON files, e.g. downloaded from tccondata.org.
-o or --output_path (required): Path to where to save your pre-processed TCCON files, ready for CHEEREIO.
-s or --start_time (required): Start time, in format YYYY-MM-DDTHH:MM:SS. For example, 2023-01-01T00:00:00. Will only process data from after this time.
-e or --end_time (required): End time, in format YYYY-MM-DDTHH:MM:SS. For example, 2023-02-01T00:00:00. Will only process data from before this time.
-s2k or --species_to_keep (optional): Species to keep in your pre-processed TCCON files (e.g. just “co” or “co,co2”). If multiple, comma separated.
-p or --input_file_pattern (optional): File pattern for input TCCON files. If you have the public files, you do not need to modify (default is *public.qc.nc).
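As a worked example of the arguments above, the sketch below assembles a full command line for one month of CO and CO2 data. The two directory paths are placeholders invented for illustration; the flags and time format come from the argument list.

```python
import shlex

# Placeholder paths; replace with your own directories
command = [
    "python", "prep_tccon_aggregated.py",
    "--input_path", "/path/to/raw_tccon",
    "--output_path", "/path/to/prepped_tccon",
    "--start_time", "2023-01-01T00:00:00",
    "--end_time", "2023-02-01T00:00:00",
    "--species_to_keep", "co,co2",
]

# Join into a single shell-safe string that can be pasted into a terminal
command_line = shlex.join(command)
print(command_line)
```

The input file pattern is omitted here, so the documented default of *public.qc.nc would apply.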
ObsPack tools¶
ObsPack is a standardized dataset containing measurements from surface monitors distributed around the world, aimed at carbon cycle studies. CHEEREIO users often use ObsPack data for validation of CO or CH4 inversions, or as observations for the inversion itself. The ObsPack observation operator, contained in the obspack_tools.py file, wraps around the ObsPack diagnostic produced by GEOS-Chem and translates it into a form acceptable to CHEEREIO.
To activate ObsPack, see the Observation settings page for information on the correct settings for ens_config.json.
ObsPack fields available to save and plot in CHEEREIO¶
All observation operators save standard data from observations and GEOS-Chem and pass it on to CHEEREIO (including latitude, longitude, observation values, GC simulated observation values, and timestamps). However, individual operators might also be able to save additional data, such as albedo in remote sensing cases or site IDs in surface observation cases. Users can list the additional data they would like to save in the EXTRA_OBSDATA_FIELDS_TO_SAVE_TO_BIG_Y setting in ens_config.json, following the instructions on the Postprocessing settings page. Below is a list of the supported fields which users can save in ObsPack:
-
altitude¶ Altitude of ObsPack site.
-
pressure¶ Pressure observed at ObsPack site.
-
obspack_id¶ Unique identifier of obspack observation.
-
platform¶ Obspack platform.
-
site_code¶ Unique identifier of obspack site.
Users wishing to aggregate ObsPack results by site, either for plotting or other analysis, will want to save site_code. Follow the instructions on the Postprocessing settings to ensure this is done successfully.
Additional fields can be added by saving more metadata into the ObsData object with the addData function within the gcCompare method of the ObsPack_Translator class. See the Observations page for more details.
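Once site_code has been saved alongside the observed and simulated values, aggregating by site is a matter of grouping on that field. The sketch below computes a mean model-minus-observation bias per site; the function name `mean_bias_by_site` and the flat input lists are assumptions for illustration and do not reflect how CHEEREIO stores its postprocessed output.

```python
from collections import defaultdict

def mean_bias_by_site(site_codes, observed, simulated):
    """Return {site_code: mean(simulated - observed)} (hypothetical helper)."""
    bias_sum = defaultdict(float)
    count = defaultdict(int)
    for code, obs, sim in zip(site_codes, observed, simulated):
        bias_sum[code] += sim - obs
        count[code] += 1
    return {code: bias_sum[code] / count[code] for code in bias_sum}

# Three observations from two sites, values in ppb
site_bias = mean_bias_by_site(
    ["MLO", "MLO", "BRW"],
    [1900.0, 1910.0, 1950.0],
    [1905.0, 1905.0, 1960.0],
)
```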
ObsPack preprocessing functions¶
CHEEREIO has built-in preprocessing functions to translate raw ObsPack data, as downloaded from NOAA, into a form compatible with the GEOS-Chem ObsPack diagnostic. To use this functionality, set preprocess_raw_obspack_files to true in ens_config.json and provide a path to the raw files in the raw_obspack_path entry. However, some users report that NOAA ObsPack data is not quite standardized. If you run into preprocessing errors, you should set preprocess_raw_obspack_files to false and supply an already populated directory of manually preprocessed files. Details for how to do this are provided in the ObsPack diagnostic documentation for GEOS-Chem; feel free to use the code provided in CHEEREIO as a model. See the Observation settings page for more information on ensemble configuration settings for ObsPack.
Descriptions of the ObsPack preprocessing functions are below.
-
make_filter_fxn(start_date, end_date, lat_bounds=None, lon_bounds=None)¶ Generate a function that will filter raw ObsPack data (i.e. downloaded directly from NOAA) and keep only data within certain date and location bounds. The output filter function also does additional filtering and reformatting regardless of these bounds.
- Parameters
start_date (datetime) – Date of earliest ObsPack data to include
end_date (datetime) – Date of latest ObsPack data to include
lat_bounds (list) – If filtering by latitude, a list of two latitudes representing minimum and maximum latitude to be kept. If None, ignore.
lon_bounds (list) – If filtering by longitude, a list of two longitudes representing minimum and maximum longitudes to be kept. If None, ignore.
- Returns
A filter function for filtering and formatting raw ObsPack data.
- Return type
Function
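The pattern of a function that returns a filter function can be sketched with a closure over the date and location bounds. In this illustration the records are plain dictionaries with hypothetical keys (`time`, `latitude`, `longitude`), and the helper name `make_bounds_filter` is invented; the real `make_filter_fxn` also performs additional filtering and reformatting that is not reproduced here.

```python
from datetime import datetime

def make_bounds_filter(start_date, end_date, lat_bounds=None, lon_bounds=None):
    """Build a predicate that keeps records inside the given bounds (hypothetical helper)."""
    def keep(record):
        if not (start_date <= record["time"] <= end_date):
            return False
        # A bound of None means "do not filter on this coordinate"
        if lat_bounds is not None and not (lat_bounds[0] <= record["latitude"] <= lat_bounds[1]):
            return False
        if lon_bounds is not None and not (lon_bounds[0] <= record["longitude"] <= lon_bounds[1]):
            return False
        return True
    return keep

keep = make_bounds_filter(datetime(2023, 1, 1), datetime(2023, 2, 1), lat_bounds=[20, 50])
records = [
    {"time": datetime(2023, 1, 15), "latitude": 35.0, "longitude": -100.0},  # kept
    {"time": datetime(2023, 1, 15), "latitude": 60.0, "longitude": -100.0},  # outside latitude bounds
    {"time": datetime(2023, 3, 1), "latitude": 35.0, "longitude": -100.0},   # outside date range
]
kept = [record for record in records if keep(record)]
```

Returning a closure lets the bounds be fixed once and the resulting function be applied to each raw file in turn.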
-
prep_obspack(raw_obspack_dir, gc_obspack_dir, filename_format, start_date, end_date)¶ This is a preprocessing function, designed to take raw ObsPack files as downloaded from NOAA and process them into files compatible with CHEEREIO and the GEOS-Chem ObsPack diagnostic.
- Parameters
raw_obspack_dir (str) – Directory where raw ObsPack data as downloaded from NOAA is stored. CHEEREIO takes this by default from the raw_obspack_path path in ens_config.json.
gc_obspack_dir (str) – Directory where processed ObsPack compatible with the GEOS-Chem ObsPack diagnostic will be saved. CHEEREIO takes this by default from the gc_obspack_path path in ens_config.json.
filename_format (str) – File format which CHEEREIO will use to save the preprocessed ObsPack data. CHEEREIO takes this by default from the obspack_gc_input_file entry in ens_config.json.
start_date (datetime) – Date of earliest ObsPack data to include. CHEEREIO takes this by default from the START_DATE entry in ens_config.json.
end_date (datetime) – Date of latest ObsPack data to include. CHEEREIO takes this by default from the END_DATE entry in ens_config.json.