Commit

Update sections;
tomvothecoder committed Jan 11, 2024
1 parent 495cc8d commit f7ea742
Showing 8 changed files with 8 additions and 8 deletions.
Binary file removed docs/paper/figures/combined.png
Binary file not shown.
Binary file removed docs/paper/figures/fig1.png
Binary file not shown.
Binary file removed docs/paper/figures/fig3.png
Binary file not shown.
Binary file added docs/paper/figures/figure1.png
Binary file added docs/paper/figures/figure2.png
File renamed without changes
2 changes: 1 addition & 1 deletion docs/paper/paper.bib
@@ -220,7 +220,7 @@ @software{e3sm-unified
year = 2023,
publisher = {GitHub},
version = {v1.9.1},
-url = {https://github.com/E3SM-Project/e3sm-unified)}
+url = {https://github.com/E3SM-Project/e3sm-unified}
}

@software{pcmdi-metrics,
14 changes: 7 additions & 7 deletions docs/paper/paper.md
@@ -41,17 +41,15 @@ Analysis of climate and weather data frequently requires a number of core operat

xCDAT addresses this need by combining the power of Xarray with meticulously developed geospatial analysis features inspired by CDAT. Xarray is the foundation of xCDAT because of its widespread adoption, technological maturity, and ability to handle large datasets with parallel computing via Dask. Xarray is also interoperable with the scientific Python ecosystem (e.g., NumPy, Pandas, Matplotlib), which greatly benefits users who need to use additional tools for different scenarios. Since Xarray is designed as a general-purpose library, xCDAT fills in domain specific gaps by providing features to serve the climate science community _(refer to [Key Features](#key-features))_.

-xCDAT's intentional design emphasizes software sustainability and reproducible science. It aims to make analysis code reusable, readable, and less-error prone by abstracting common Xarray boilerplate logic into simple and configurable APIs. xCDAT extends Xarray by using [accessor classes](https://docs.xarray.dev/en/stable/internals/extending-xarray.html) that operate directly on Xarray Dataset objects. xCDAT is rigorously tested using real-world datasets and maintains 100% unit test coverage (at the time this paper was written). To demonstrate the value in xCDAT's API design, Figure 1 compares code to calculate annual averages for global climatological anomalies using Xarray against xCDAT. Figure 2 shows the plots for the results produced by xCDAT.
+Performance is one fundamental driver in how xCDAT is designed, especially with large datasets. xCDAT conveniently inherits Xarray's support for parallel computing with Dask [@dask:2016]. [Parallel computing with Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html) enables users to take advantage of compute resources through multithreading or multiprocessing. To use Dask's default multithreading scheduler, users only need to open and chunk datasets in Xarray before calling xCDAT APIs. xCDAT's seamless support for parallel computing enables users to run large-scale computations with minimal effort. If users require more resources, they can also configure and use a local Dask cluster to meet resource-intensive computational needs. Figure 1 shows xCDAT's significant performance advantage over CDAT for spatial averaging on datasets of varying sizes.

-<!-- Fix the resolution of this plot -->
+![A performance benchmark for spatial averaging computations using xCDAT (serial and parallel using local Dask distributed scheduler) and CDAT (serial only). xCDAT serial and parallel outperforms CDAT by a wide margin for the 7 GB and 12 GB datasets. Note, runtimes could not be captured with xCDAT serial for the 105 GB file and CDAT with file sizes >= 22 GB due to memory allocation errors. _Note: performance will vary depending on hardware, datasets, and how Dask and chunking schemes are configured. The performance benchmark setup and scripts are available in the_ [_xCDAT validation repo_](https://github.com/xCDAT/xcdat-validation/tree/main/validation/v0.6.0/xcdat-cdat-perf-metrics)_._ \label{fig:figure1}](figures/figure1.png){ height=40% }

-![A comparison of the code to calculate annual averages for global climatological anomalies in A) Xarray and B) xCDAT. xCDAT abstracts most of the Xarray boilerplate logic for calculating weights and grouping data by specific time frequencies, leading to code that is more readable, maintainable, and flexible. The results from both sets of code are within machine precision. \label{fig:fig1}](figures/combined.png){ height=100% }
+xCDAT's intentional design emphasizes software sustainability and reproducible science. It aims to make analysis code reusable, readable, and less-error prone by abstracting common Xarray boilerplate logic into simple and configurable APIs. xCDAT extends Xarray by using [accessor classes](https://docs.xarray.dev/en/stable/internals/extending-xarray.html) that operate directly on Xarray Dataset objects. xCDAT is rigorously tested using real-world datasets and maintains 100% unit test coverage (at the time this paper was written). To demonstrate the value in xCDAT's API design, Figure 2 compares code to calculate annual averages for global climatological anomalies using Xarray against xCDAT. Figure 3 shows the plots for the results produced by xCDAT.

-![A) Monthly surface skin temperature anomalies for September 1850. B) Monthly (gray) and annual (black) global mean surface skin temperature anomaly values. Temperature data is from an E3SMv2 climate model simulation over the historical period (1850 – 2014). \label{fig:fig2}](figures/fig2.png){ height=45% }
+![A comparison of the code to calculate annual averages for global climatological anomalies in A) Xarray and B) xCDAT. xCDAT abstracts most of the Xarray boilerplate logic for calculating weights and grouping data by specific time frequencies, leading to code that is more readable, maintainable, and flexible. The results from both sets of code are within machine precision. \label{fig:figure2}](figures/figure2.png){ height=100% }

-Performance is another fundamental driver in how xCDAT is designed, especially with large datasets. xCDAT conveniently inherits Xarray's support for parallel computing with Dask [@dask:2016]. [Parallel computing with Dask](https://docs.xarray.dev/en/stable/user-guide/dask.html) enables users to take advantage of compute resources through multithreading or multiprocessing. To use Dask's default multithreading scheduler, users only need to open and chunk datasets in Xarray before calling xCDAT APIs. xCDAT's seamless support for parallel computing enables users to run large-scale computations with minimal effort. If users require more resources, they can also configure and use a local Dask cluster to meet resource-intensive computational needs. Figure 3 shows xCDAT's significant performance advantage over CDAT for spatial averaging on datasets of varying sizes.
-
-![A performance benchmark for spatial averaging computations using xCDAT (serial and parallel using local Dask distributed scheduler) and CDAT (serial only). xCDAT serial and parallel outperforms CDAT by a wide margin for the 7 GB and 12 GB datasets. Note, runtimes could not be captured with xCDAT serial for the 105 GB file and CDAT with file sizes >= 22 GB due to memory allocation errors. _Note: performance will vary depending on hardware, datasets, and how Dask and chunking schemes are configured. The performance benchmark setup and scripts are available in the_ [_xCDAT validation repo_](https://github.com/xCDAT/xcdat-validation/tree/main/validation/v0.6.0/xcdat-cdat-perf-metrics)_._ \label{fig:fig3}](figures/fig3.png){ height=30% }
+![A) Monthly surface skin temperature anomalies for September 1850. B) Monthly (gray) and annual (black) global mean surface skin temperature anomaly values. Temperature data is from an E3SMv2 climate model simulation over the historical period (1850 – 2014). \label{fig:figure3}](figures/figure3.png){ height=45% }

xCDAT's mission is to provide a maintainable and extensible package that serves the needs of the climate community in the long-term. xCDAT is a community-driven project and the development team encourages all who are interested to get involved through the [GitHub repository](https://github.com/xCDAT/xcdat).

@@ -106,3 +104,5 @@ xCDAT is actively being integrated as a core component of the Program for Climat
xCDAT is jointly developed by scientists and developers from the Energy Exascale Earth System Model ([E3SM](https://e3sm.org/)) Project and Program for Climate Model Diagnosis and Intercomparison ([PCMDI](https://pcmdi.llnl.gov/)). The work is performed for the E3SM project, which is sponsored by Earth System Model Development ([ESMD](https://climatemodeling.science.energy.gov/program/earth-system-model-development)) program, and the Simplifying ESM Analysis Through Standards ([SEATS](https://www.seatstandards.org/)) project, which is sponsored by the Regional and Global Model Analysis ([RGMA](https://climatemodeling.science.energy.gov/program/regional-global-model-analysis)) program. ESMD and RGMA are programs for the Earth and Environmental Systems Sciences Division ([EESSD](https://science.osti.gov/ber/Research/eessd)) in the Office of Biological and Environmental Research ([BER](https://science.osti.gov/ber)) within the [Department of Energy](https://www.energy.gov/)'s [Office of Science](https://science.osti.gov/). This work is performed under the auspices of the U.S. Department of Energy by LLNL under Contract No. DE-AC52-07NA27344.

Thank you to all of the xCDAT contributors and users. We also give a special thanks to Karl Taylor, Peter Gleckler, Paul Durack, and Chris Golaz who all have provided valuable knowledge and guidance throughout the course of this project.

+# References
