Update tutorial.rst to include section about accessing Zip Files on S3 #1615

Merged · 4 commits · Jan 16, 2024
2 changes: 2 additions & 0 deletions docs/release.rst
@@ -45,6 +45,8 @@ Docs
* Minor tweak to advanced indexing tutorial examples.
By :user:`Ross Barnowski <rossbar>` :issue:`1550`.

* Added section about accessing zip files that are on S3.
By :user:`Jeff Peck <jeffpeck10x>` :issue:`1613`.

Maintenance
~~~~~~~~~~~
25 changes: 25 additions & 0 deletions docs/tutorial.rst
@@ -1000,6 +1000,31 @@ separately from Zarr.

.. _tutorial_copy:

Accessing Zip Files on S3
~~~~~~~~~~~~~~~~~~~~~~~~~

The built-in ``ZipStore`` will only work with paths on the local file-system; however,
Contributor: I'm not sure how links work in sphinx, but could we make this a link to the API docs for ZipStore?

it is also possible to access ``.zarr.zip`` data on the cloud. Here is an example of
accessing a zipped Zarr file on S3:
Contributor: I think I prefer "zipped Zarr hierarchy" to "zipped Zarr file"

    >>> import s3fs
    >>> import zarr
    >>> from fsspec import FSMap
    >>> from fsspec.implementations.zip import ZipFileSystem
    >>>
    >>> s3_path = "s3://path/to/my.zarr.zip"
    >>>
    >>> s3 = s3fs.S3FileSystem()
>>> f = s3.open(s3_path)
>>> fs = ZipFileSystem(f, mode="r")
>>> store = FSMap("", fs, check=False)
>>>
>>> # cache is optional, but may be a good idea depending on the situation
Contributor: Which situations benefit from the cache?

Contributor Author: If you are going to access the same chunks of data multiple times.

>>> cache = zarr.storage.LRUStoreCache(store, max_size=2**28)
>>> z = zarr.group(store=cache)
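
The ``ZipFileSystem``/``FSMap`` pattern above is not specific to S3; it works with any file-like object. A self-contained sketch using a local zip file in place of the S3 object, so it runs without credentials (filenames here are illustrative, not from this PR):

```python
import zipfile
import fsspec
from fsspec.implementations.zip import ZipFileSystem

# A local zip file standing in for "my.zarr.zip" on S3.
with zipfile.ZipFile("example_store.zip", "w") as zf:
    zf.writestr("data.txt", "hello")

# Same pattern as the S3 example, but the file-like object
# comes from open() rather than s3.open().
f = open("example_store.zip", "rb")
fs = ZipFileSystem(f, mode="r")
store = fsspec.FSMap("", fs, check=False)
print(store["data.txt"])  # b'hello'
```

The resulting ``store`` is an ordinary mapping of member names to bytes, which is exactly the interface Zarr stores expose.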

This store can also be generated with ``fsspec``'s handler chaining, like so:

>>> store = zarr.storage.FSStore(url=f"zip::{s3_path}", mode="r")
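
The ``zip::`` chaining syntax can also be exercised locally by swapping ``s3://`` for ``file://``; a minimal sketch, assuming a POSIX path and illustrative filenames:

```python
import os
import zipfile
import fsspec

# A local zip standing in for the remote object.
zip_path = os.path.abspath("example_chain.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("inner.txt", "chained access")

# "zip://<member>::<url>" layers a zip filesystem over the
# underlying URL -- here file:// instead of s3://.
with fsspec.open(f"zip://inner.txt::file://{zip_path}", "rb") as f:
    data = f.read()
print(data)
```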

This can be especially useful if you have a very large ``.zarr.zip`` file on S3
Contributor: same as above -- let's replace "zarr.zip" with "zipped Zarr hierarchy"
and only need to access a small portion of it.
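
Partial access works because ``ZipFileSystem`` reads the zip's central directory up front and then fetches individual members on demand; over ``s3fs`` those reads become ranged requests, so untouched chunks are never downloaded. A local sketch of the per-member access (filenames illustrative):

```python
import zipfile
from fsspec.implementations.zip import ZipFileSystem

# Build a zip with several members, standing in for a large archive.
with zipfile.ZipFile("many_chunks.zip", "w") as zf:
    for i in range(5):
        zf.writestr(f"chunk{i}", bytes([i]) * 1024)

# Only the requested member is read back and decompressed.
fs = ZipFileSystem(open("many_chunks.zip", "rb"), mode="r")
data = fs.cat_file("chunk3")
print(len(data))  # 1024
```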

Consolidating metadata
~~~~~~~~~~~~~~~~~~~~~~
