GES DISC Use Case Exploration for Icechunk #16

Closed
abarciauskas-bgse opened this issue Oct 28, 2024 · 1 comment

@abarciauskas-bgse
Contributor

This ticket tracks our ongoing investigation of the utility of Icechunk at GES DISC. We have had a number of conversations about Zarr virtualization with GES DISC folks (most notably Christine and Brianna, but also Lucas Sterzinger and Maha Hegde).

Objective

We have 2 objectives with this work:

  1. Demonstrate Icechunk's utility for NASA: We aim to showcase Icechunk's utility across NASA by identifying a high-impact use case within GES DISC, specifically focusing on datasets that could benefit from virtualization and from being accessible as Zarr/ARCO formats. These datasets may serve Giovanni users or other applications, particularly where maintaining accessibility has been challenging.
  2. Determine the challenges and limitations of (1): Are there ways in which Icechunk is still not useful or usable for NASA? What can we do to address those challenges?

Meetings

We met with Christine, Brianna, and Maha Hegde on Wednesday, October 23, and plan to meet with them every 3 weeks through the rest of this year (which leaves only 2 more meetings 🙀).

Meeting Notes

  • Christine/GES DISC:

    • Uses Zarr stores but hesitates to make them public due to update/appending issues.
    • Interested in Icechunk, but notes that GES DISC currently uses a single-writer, multi-reader model with chunks that are long along the time dimension.
    • The biggest concern appears to be storage growth: right now Icechunk keeps whole copies of updated chunks rather than storing chunk diffs. Because they regularly append to the same chunk, a huge number of chunk copies may be generated.
    • It was noted that garbage collection is on Icechunk's roadmap.
    • Highlights cases like GPM IMERG, popular analysis workflows (e.g., time-averaged maps, area-averaged time series), and challenges with non-time-dimensional data (e.g., AIRS3STD).
    • It sounded like they are also planning an implementation of lakeFS, but not until Q2 next year (not sure whether this is fiscal or calendar year Q2).
  • Sean:

    • Notes Icechunk’s current handling of chunk updates (whole chunk copies rather than diffs).
    • Suggests data batching based on acceptable data latency.
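
To make the storage-growth concern and the batching suggestion above concrete, here is a rough back-of-the-envelope sketch in plain Python. It does not use the Icechunk API. It assumes, purely for illustration, that every commit touching a chunk stores one full-size copy of that chunk, that no old copies are garbage collected, and that batches line up with chunk boundaries. The function names and the example numbers (48 time steps per chunk, 1 MB per chunk) are hypothetical and not taken from any GES DISC dataset.

```python
import math


def chunk_copies(steps_per_chunk: int, steps_per_commit: int) -> int:
    """Number of full copies written while filling one chunk along time.

    Assumption: each commit that appends to a chunk rewrites the whole
    chunk (copy, not diff), and earlier copies are never removed.
    """
    if steps_per_chunk <= 0 or steps_per_commit <= 0:
        raise ValueError("step counts must be positive")
    # One copy per commit needed to fill the chunk; a batch larger than
    # the chunk still writes the chunk exactly once.
    return math.ceil(steps_per_chunk / steps_per_commit)


def stored_bytes(chunk_bytes: int, steps_per_chunk: int, steps_per_commit: int) -> int:
    """Total bytes retained for one logical chunk under the assumptions above."""
    return chunk_bytes * chunk_copies(steps_per_chunk, steps_per_commit)


if __name__ == "__main__":
    # Hypothetical numbers: a chunk spans 48 time steps and is 1 MB on disk.
    for batch in (1, 12, 48):
        copies = chunk_copies(48, batch)
        total = stored_bytes(1_000_000, 48, batch)
        print(f"batch={batch:>2} steps -> {copies:>2} copies, {total:,} bytes")
```

Under these assumptions, committing every time step leaves 48 copies of the chunk, batching 12 steps per commit leaves 4, and committing once per full chunk leaves 1. The trade-off is data latency: larger batches mean new data becomes visible less often, which is why the acceptable latency drives the batch size.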

Potential Use Cases

  • GPM IMERG (Near Real-Time): Update frequency poses a challenge; data must be available near real-time.
  • MERRA and Hydrology Data Rods: Popular data with challenging metadata for OPeNDAP or THREDDS emulation.
  • AIRS3STD (HDF4): Example of non-temporal data needing time dimension insertion.
  • Data Aggregation Needs: Use cases include OPeNDAP and metadata challenges for Giovanni.

Action Items

  • MERRA Product Details: Identify available products under TDS (contact: maha.hegde@nasa.gov).
  • Hydrology Data Rods Details: Explore relevant information (contact: christine.e.smit@nasa.gov).
  • Project Kickoff: Aimee Barciauskas and Sean Harkins will review use cases and create a diagram explaining GES DISC goals and Icechunk's integration.

FYI @sharkinsspatial @maxrjones @hrodmn

@abarciauskas-bgse abarciauskas-bgse self-assigned this Oct 28, 2024
@abarciauskas-bgse
Contributor Author

I think we will continue to communicate with GES DISC about the use case and solution, but I am calling this initial exploration closed via earth-mover/icechunk-nasa#1 and earth-mover/icechunk-nasa#2.
