Add CHELSA-ERA5 tiled climatologies. #97

Draft: wants to merge 5 commits into main.

juseg commented Jul 9, 2024

  • Add CERA5TiledAggregator class.
  • Add 'cera5' atmospheric data source.
  • Add 'cera5' elevation data source.
  • Consider renaming coords as in 'cw5e5'.
  • Update docstrings and whatsnew.rst.

Close #96.

juseg added the enhancement (New feature or request) label, added this to the v0.3.2 milestone, and self-assigned this on Jul 9, 2024.

juseg commented Jul 9, 2024

This works. The data re-projection is blazing-fast compared to working with the GeoTIFFs (as shown by a dask.diagnostics.ProgressBar()). However, the overall atmosphere() call is barely faster. I think this is due to the large overhead of reading metadata from 864 netCDF tile files.

Now I assume CHELSA-W5E5 suffers the same problem. I need to think about that. Ideas for the way forward:

  • Don't tile, but use global, spatially-chunked netCDF.
  • Use larger tiles, e.g. 90x90, which would result in 8 tiles and 96 files.
  • Keep tile size, but aggregate months in 72 tiles, 72 files.

Probably yet another separate issue. I like the third solution because (hypothetical, some-day) end users would normally use data for all months over a given region. It would also pave the way towards smarter aggregators that open (or even download) only the necessary files.
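A smarter aggregator could start by working out which tiles a region actually touches. This is a minimal sketch assuming 30°×30° yearlong tiles indexed by their lower-left corner, with a made-up `cera5_n30e000.nc` naming scheme; both functions are hypothetical and regions crossing the antimeridian are not handled.

```python
"""Select only the tiles overlapping a region (hypothetical naming scheme)."""

import math


def tiles_for_bbox(west, south, east, north, tile_deg=30):
    """Return lower-left (lon, lat) corners of tiles intersecting a bbox.

    Assumes longitudes in [-180, 180), latitudes in [-90, 90] and an
    integer ``tile_deg``; antimeridian-crossing boxes are not handled.
    """
    # Snap the south-west corner down to the tile grid.
    lon0 = math.floor((west + 180) / tile_deg) * tile_deg - 180
    lat0 = math.floor((south + 90) / tile_deg) * tile_deg - 90
    corners = []
    lat = lat0
    while lat < north:
        lon = lon0
        while lon < east:
            corners.append((lon, lat))
            lon += tile_deg
        lat += tile_deg
    return corners


def tile_filename(lon, lat):
    """Hypothetical file name for a yearlong tile, e.g. 'cera5_n30e000.nc'."""
    ns = "n" if lat >= 0 else "s"
    ew = "e" if lon >= 0 else "w"
    return f"cera5_{ns}{abs(lat):02d}{ew}{abs(lon):03d}.nc"
```

For an Alps-like box (5°E to 17°E, 43°N to 48°N), `tiles_for_bbox(5, 43, 17, 48)` returns the single tile `(0, 30)`, so only one file would need opening. The resulting paths could then be handed to a multi-file reader such as `xarray.open_mfdataset`.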

juseg commented Jul 12, 2024

Benchmarks on preparing an Alps domain dataset. Averages of 5 calls, using dask.diagnostics for reprojection times.

  • Old 'chelsa' 12 global geotiffs: reproj 2 x 5 secs, total 17 secs.
  • New 'cw5e5' 864 monthly tiles: reproj 2 x 120 ms, total 9 secs.
  • New 'cera5' 72 yearlong tiles: reproj 2 x 300 ms, total 6 secs.

Total times still include re-projecting the global netCDF topography, which seems to take most of the 6 secs in the last case.
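Averaging over repeated calls, as done for the totals above, can be scripted with the standard library alone. This is a minimal sketch; `mean_runtime` is a hypothetical helper, and the per-step reprojection times would still come from dask.diagnostics.

```python
"""Average wall-clock time over repeated calls (hypothetical helper)."""

import statistics
import time


def mean_runtime(func, *args, repeat=5, **kwargs):
    """Call ``func(*args, **kwargs)`` ``repeat`` times; return mean seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

Usage would look like `mean_runtime(build_alps_dataset)`, where `build_alps_dataset` is a hypothetical wrapper around the atmosphere() call being benchmarked.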
