Skip to content

Commit

Permalink
CDAT Migration Phase: Refactor cosp_histogram set (#748)
Browse files Browse the repository at this point in the history
- Refactor `cosp_histogram_driver.py` and `cosp_histogram_plot.py`
- `formulas_cosp.py` (new file)
  - Includes refactored, Xarray-based `cosp_histogram_standard()` and `cosp_bin_sum()` functions
  - I wrote a lot of new code in `formulas_cosp.py` to clean up `derivations.py` and the old equivalent functions in `utils.py`
- `derivations.py`
  - Cleaned up portions of `DERIVED_VARIABLES` dictionary
  - Removed unnecessary `OrderedDict` usage for `cosp_histogram` related variables (we should do this for the rest of the variables in in #716)
  - Remove unnecessary `convert_units()` function calls
  - Move cloud levels passed to derived variable formulas to `formulas_cosp.CLOUD_BIN_SUM_MAP`
- `utils.py`
  - Delete deprecated, CDAT-based `cosp_histogram` functions
- `dataset_xr.py`
  - Add `dataset_xr.Dataset._open_climo_dataset()` method with a catch for dataset quality issues where "time" is a scalar variable that does not match the "time" dimension array length, drops this variable and replaces it with the correct coordinate
  -  Update `_get_dataset_with_derivation_func()` to handle derivation functions that require the `xr.Dataset` and `target_var_key` args (e.g., `cosp_histogram_standardize()` and `cosp_bin_sum()`)
- `io.py`
  - Update `_write_vars_to_netcdf()` to write test, ref, and diff variables to individual netCDF (required for easy comparison to CDAT-based code that does the same thing)
- Add `cdat_migration_regression_test_netcdf.ipynb` validation notebook template for comparing `.nc` files
  • Loading branch information
tomvothecoder committed Jul 15, 2024
1 parent 985ff14 commit 517c44f
Show file tree
Hide file tree
Showing 19 changed files with 2,376 additions and 748 deletions.
3 changes: 3 additions & 0 deletions .coveragerc
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[report]
exclude_also =
if TYPE_CHECKING:
Original file line number Diff line number Diff line change
@@ -0,0 +1,270 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CDAT Migration Regression Testing Notebook (`.nc` files)\n",
"\n",
"This notebook is used to perform regression testing between the development and\n",
"production versions of a diagnostic set.\n",
"\n",
"## How it works\n",
"\n",
"It compares the relative differences (%) between ref and test variables between\n",
"the dev and `main` branches.\n",
"\n",
"## How to use\n",
"\n",
"PREREQUISITE: The diagnostic set's netCDF stored in `.json` files in two directories\n",
"(dev and `main` branches).\n",
"\n",
"1. Make a copy of this notebook under `auxiliary_tools/cdat_regression_testing/<DIR_NAME>`.\n",
"2. Run `mamba create -n cdat_regression_test -y -c conda-forge \"python<3.12\" xarray dask pandas matplotlib-base ipykernel`\n",
"3. Run `mamba activate cdat_regression_test`\n",
"4. Update `SET_DIR` and `SET_NAME` in the copy of your notebook.\n",
"5. Run all cells IN ORDER.\n",
"6. Review results for any outstanding differences (>=1e-5 relative tolerance).\n",
" - Debug these differences (e.g., bug in metrics functions, incorrect variable references, etc.)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Code\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"import glob\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# TODO: Update SET_NAME and SET_DIR\n",
"SET_NAME = \"cosp_histogram\"\n",
"SET_DIR = \"660-cosp-histogram\"\n",
"\n",
"DEV_PATH = f\"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/{SET_DIR}/{SET_NAME}/**\"\n",
"MAIN_PATH = f\"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/{SET_NAME}/**\"\n",
"\n",
"DEV_GLOB = sorted(glob.glob(DEV_PATH + \"/*.nc\"))\n",
"MAIN_GLOB = sorted(glob.glob(MAIN_PATH + \"/*.nc\"))\n",
"\n",
"if len(DEV_GLOB) == 0 or len(MAIN_GLOB) == 0:\n",
" raise IOError(\"No files found at DEV_PATH and/or MAIN_PATH.\")\n",
"\n",
"if len(DEV_GLOB) != len(MAIN_GLOB):\n",
" raise IOError(\"Number of files do not match at DEV_PATH and MAIN_PATH.\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"def _get_var_to_filepath_map():\n",
" var_to_file = defaultdict(lambda: defaultdict(dict))\n",
"\n",
" for dev_file, main_file in zip(DEV_GLOB, MAIN_GLOB):\n",
" # Example:\n",
" # \"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc\"\n",
" file_arr = dev_file.split(\"/\")\n",
"\n",
" # Example: \"test\"\n",
" data_type = dev_file.split(\"_\")[-1].split(\".nc\")[0]\n",
"\n",
" # Skip comparing `.nc` \"diff\" files because comparing relative diffs of\n",
" # does not make sense.\n",
" if data_type == \"test\" or data_type == \"ref\":\n",
" # Example: \"ISCCP\"\n",
" model = file_arr[-2].split(\"-\")[0]\n",
" season = \"JJA\" if \"JJA\" in dev_file else \"ANN\"\n",
"\n",
" var_to_file[model][data_type][season] = (dev_file, main_file)\n",
"\n",
" return var_to_file\n",
"\n",
"\n",
"def _get_relative_diffs(var_to_filepath):\n",
" # Absolute tolerance of 0 and relative tolerance of 1e-5.\n",
" # We are mainly focusing on relative tolerance here (in percentage terms).\n",
" atol = 0\n",
" rtol = 1e-5\n",
"\n",
" for model, data_types in var_to_filepath.items():\n",
" for _, seasons in data_types.items():\n",
" for _, filepaths in seasons.items():\n",
" print(\"Comparing:\")\n",
" print(filepaths[0], \"\\n\", filepaths[1])\n",
" ds1 = xr.open_dataset(filepaths[0])\n",
" ds2 = xr.open_dataset(filepaths[1])\n",
"\n",
" try:\n",
" var_key = f\"COSP_HISTOGRAM_{model}\"\n",
" np.testing.assert_allclose(\n",
" ds1[var_key].values,\n",
" ds2[var_key].values,\n",
" atol=atol,\n",
" rtol=rtol,\n",
" )\n",
" except AssertionError as e:\n",
" print(e)\n",
" else:\n",
" print(f\" * All close and within relative tolerance ({rtol})\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Compare the netCDF files between branches\n",
"\n",
"- Compare \"ref\" and \"test\" files\n",
"- \"diff\" files are ignored because getting relative diffs for these does not make sense (relative diff will be above tolerance)\n"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"var_to_filepaths = _get_var_to_filepath_map()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_ref.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_ref.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-ANN-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/ISCCP-COSP/ISCCPCOSP-COSP_HISTOGRAM_ISCCP-JJA-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_ref.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-JJA-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-JJA-global_ref.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-ANN-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-JJA-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MISR-COSP/MISRCOSP-COSP_HISTOGRAM_MISR-JJA-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-ANN-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-ANN-global_ref.nc\n",
"\n",
"Not equal to tolerance rtol=1e-05, atol=0\n",
"\n",
"Mismatched elements: 42 / 42 (100%)\n",
"Max absolute difference: 4.23048367e-05\n",
"Max relative difference: 1.16682146e-05\n",
" x: array([[0.703907, 2.669376, 3.065526, 1.579834, 0.363847, 0.128541],\n",
" [0.147366, 1.152637, 3.67049 , 3.791006, 1.398453, 0.392103],\n",
" [0.07496 , 0.474791, 1.37002 , 1.705649, 0.786423, 0.346744],...\n",
" y: array([[0.703899, 2.669347, 3.065492, 1.579816, 0.363843, 0.12854 ],\n",
" [0.147364, 1.152624, 3.670448, 3.790965, 1.398438, 0.392099],\n",
" [0.074959, 0.474786, 1.370004, 1.705629, 0.786415, 0.34674 ],...\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-JJA-global_ref.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-JJA-global_ref.nc\n",
"\n",
"Not equal to tolerance rtol=1e-05, atol=0\n",
"\n",
"Mismatched elements: 42 / 42 (100%)\n",
"Max absolute difference: 4.91806181e-05\n",
"Max relative difference: 1.3272405e-05\n",
" x: array([[0.62896 , 2.657657, 3.206268, 1.704946, 0.398659, 0.169424],\n",
" [0.147569, 1.228835, 3.697387, 3.727142, 1.223123, 0.436504],\n",
" [0.072129, 0.508413, 1.167637, 1.412202, 0.638085, 0.362268],...\n",
" y: array([[0.628952, 2.657625, 3.206227, 1.704924, 0.398654, 0.169422],\n",
" [0.147567, 1.228819, 3.697338, 3.727093, 1.223107, 0.436498],\n",
" [0.072128, 0.508407, 1.167622, 1.412183, 0.638076, 0.362263],...\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-ANN-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-ANN-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n",
"Comparing:\n",
"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/660-cosp-histogram/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-JJA-global_test.nc \n",
" /global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/cosp_histogram/MODIS-COSP/MODISCOSP-COSP_HISTOGRAM_MODIS-JJA-global_test.nc\n",
" * All close and within relative tolerance (1e-05)\n"
]
}
],
"source": [
"_get_relative_diffs(var_to_filepaths)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Results\n",
"\n",
"- The relative tolerance of all files are 1e-05, which means things should be good to go.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "cdat_regression_test",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
from auxiliary_tools.cdat_regression_testing.base_run_script import run_set

SET_NAME = "cosp_histogram"
SET_DIR = "660-cosp-histogram"

run_set(SET_NAME, SET_DIR)
4 changes: 2 additions & 2 deletions auxiliary_tools/cdat_regression_testing/base_run_script.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ class MachinePaths(TypedDict):
tc_test: str


def run_set(set_name: str, set_dir: str, save_netcdf: bool):
def run_set(set_name: str, set_dir: str):
machine_paths: MachinePaths = _get_machine_paths()

param = CoreParameter()
Expand All @@ -62,7 +62,7 @@ def run_set(set_name: str, set_dir: str, save_netcdf: bool):
param.num_workers = 5

# Make sure to save the netCDF files to compare outputs.
param.save_netcdf = save_netcdf
param.save_netcdf = True

# Set specific parameters for new sets
enso_param = EnsoDiagsParameter()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# CDAT Migration Regression Testing Notebook\n",
"# CDAT Migration Regression Testing Notebook (`.json` metrics)\n",
"\n",
"This notebook is used to perform regression testing between the development and\n",
"production versions of a diagnostic set.\n",
Expand Down Expand Up @@ -47,25 +47,27 @@
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"import glob\n",
"from auxiliary_tools.cdat_regression_testing.utils import (\n",
" get_metrics,\n",
" get_rel_diffs,\n",
" get_num_metrics_above_diff_thres,\n",
" highlight_large_diffs,\n",
" sort_columns,\n",
" update_diffs_to_pct,\n",
" PERCENTAGE_COLUMNS,\n",
")\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"# TODO: Update SET_NAME and SET_DIR\n",
"SET_NAME = \"cosp_histogram\"\n",
"SET_DIR = \"660-cosp-histogram\"\n",
"\n",
"# TODO: Update DEV_RESULTS and MAIN_RESULTS to your diagnostic sets.\n",
"DEV_PATH = \"/global/cfs/cdirs/e3sm/www/vo13/examples_658/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model\"\n",
"MAIN_PATH = \"/global/cfs/cdirs/e3sm/www/vo13/examples/ex1_modTS_vs_modTS_3years/lat_lon/model_vs_model\"\n",
"DEV_PATH = f\"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/{SET_DIR}/{SET_NAME}/**\"\n",
"MAIN_PATH = f\"/global/cfs/projectdirs/e3sm/e3sm_diags_cdat_test/main/{SET_NAME}/**\"\n",
"\n",
"DEV_GLOB = sorted(glob.glob(DEV_PATH + \"/*.json\"))\n",
"MAIN_GLOB = sorted(glob.glob(MAIN_PATH + \"/*.json\"))"
"MAIN_GLOB = sorted(glob.glob(MAIN_PATH + \"/*.json\"))\n",
"\n",
"if len(DEV_GLOB) == 0 or len(MAIN_GLOB) == 0:\n",
" raise IOError(\"No files found at DEV_PATH and/or MAIN_PATH.\")\n",
"\n",
"if len(DEV_GLOB) != len(MAIN_GLOB):\n",
" raise IOError(\"Number of files do not match at DEV_PATH and MAIN_PATH.\")"
]
},
{
Expand Down
Loading

0 comments on commit 517c44f

Please sign in to comment.