From 584f6eb0eb281f1cce452fe7c150c12a652ff57b Mon Sep 17 00:00:00 2001 From: Davis Bennett Date: Mon, 29 Jan 2024 15:52:59 +0100 Subject: [PATCH 1/4] docs: use 'ZIP archive' instead of 'zip file'; clarify utility of caching in s3 + ZIP example; style --- docs/tutorial.rst | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/docs/tutorial.rst b/docs/tutorial.rst index 351eef064a..1f7accab3a 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -774,7 +774,7 @@ the following code:: Any other compatible storage class could be used in place of :class:`zarr.storage.DirectoryStore` in the code examples above. For example, -here is an array stored directly into a Zip file, via the +here is an array stored directly into a ZIP archive, via the :class:`zarr.storage.ZipStore` class:: >>> store = zarr.ZipStore('data/example.zip', mode='w') @@ -798,12 +798,12 @@ Re-open and check that data have been written:: [42, 42, 42, ..., 42, 42, 42]], dtype=int32) >>> store.close() -Note that there are some limitations on how Zip files can be used, because items -within a Zip file cannot be updated in place. This means that data in the array +Note that there are some limitations on how ZIP archives can be used, because items +within a ZIP archive cannot be updated in place. This means that data in the array should only be written once and write operations should be aligned with chunk boundaries. Note also that the ``close()`` method must be called after writing any data to the store, otherwise essential records will not be written to the -underlying zip file. +underlying ZIP archive. Another storage alternative is the :class:`zarr.storage.DBMStore` class, added in Zarr version 2.2. This class allows any DBM-style database to be used for @@ -846,7 +846,7 @@ respectively require the `redis-py `_ and `pymongo `_ packages to be installed. For compatibility with the `N5 `_ data format, Zarr also provides -an N5 backend (this is currently an experimental feature). Similar to the zip storage class, an +an N5 backend (this is currently an experimental feature). Similar to the ZIP storage class, an :class:`zarr.n5.N5Store` can be instantiated directly:: >>> store = zarr.N5Store('data/example.n5') @@ -1000,12 +1000,13 @@ separately from Zarr. .. _tutorial_copy: -Accessing Zip Files on S3 -~~~~~~~~~~~~~~~~~~~~~~~~~ +Accessing ZIP archives on S3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The built-in `ZipStore` will only work with paths on the local file-system, however -it is also possible to access ``.zarr.zip`` data on the cloud. Here is an example of -accessing a zipped Zarr file on s3: +The built-in :class:`zarr.storage.ZipStore` will only work with paths on the local file-system; however +it is possible to access ZIP-archived Zarr data on the cloud via the `ZipFileSystem `_ +class from ``fsspec``. The following example demonstrates how to access +a ZIP-archived Zarr group on s3 using `s3fs `_ and ``ZipFileSystem``: >>> s3_path = "s3://path/to/my.zarr.zip" >>> @@ -1014,7 +1015,7 @@ accessing a zipped Zarr file on s3: >>> fs = ZipFileSystem(f, mode="r") >>> store = FSMap("", fs, check=False) >>> - >>> # cache is optional, but may be a good idea depending on the situation + >>> # caching may improve performance when repeatedly reading the same data >>> cache = zarr.storage.LRUStoreCache(store, max_size=2**28) >>> z = zarr.group(store=cache) @@ -1022,7 +1023,7 @@ This store can also be generated with ``fsspec``'s handler chaining, like so: >>> store = zarr.storage.FSStore(url=f"zip::{s3_path}", mode="r") -This can be especially useful if you have a very large ``.zarr.zip`` file on s3 +This can be especially useful if you have a very large ZIP-archived Zarr array or group on s3 and only need to access a small portion of it. Consolidating metadata @@ -1161,7 +1162,7 @@ re-compression, and so should be faster. E.g.:: └── spam (100,) int64 >>> new_root['foo/bar/baz'][:] array([ 0, 1, 2, ..., 97, 98, 99]) - >>> store2.close() # zip stores need to be closed + >>> store2.close() # ZIP stores need to be closed .. _tutorial_strings: From dab605c68d088462e542f1cadd98ae608211375e Mon Sep 17 00:00:00 2001 From: Davis Bennett Date: Mon, 29 Jan 2024 16:23:34 +0100 Subject: [PATCH 2/4] docs: update release notes, correct spelling of greg lee's name in past release notes, and fix markup in past release notes --- docs/release.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/release.rst b/docs/release.rst index ab74a3debd..52dfdf202e 100644 --- a/docs/release.rst +++ b/docs/release.rst @@ -45,8 +45,8 @@ Docs * Minor tweak to advanced indexing tutorial examples. By :user:`Ross Barnowski ` :issue:`1550`. -* Added section about accessing zip files that are on s3. - By :user:`Jeff Peck ` :issue:`1613`. +* Added section about accessing ZIP archives on s3. + By :user:`Jeff Peck ` :issue:`1613`, :issue:`1615`, and :user:`Davis Bennett ` :issue:`1641`. Maintenance ~~~~~~~~~~~ @@ -117,10 +117,10 @@ Maintenance By :user:`Davis Bennett ` :issue:`1462`. * Style the codebase with ``ruff`` and ``black``. - By :user:`Davis Bennett` :issue:`1459` + By :user:`Davis Bennett ` :issue:`1459` * Ensure that chunks is tuple of ints upon array creation. - By :user:`Philipp Hanslovsky` :issue:`1461` + By :user:`Philipp Hanslovsky ` :issue:`1461` .. _release_2.15.0: @@ -508,7 +508,7 @@ Maintenance By :user:`Saransh Chopra ` :issue:`1079`. * Remove option to return None from _ensure_store. - By :user:`Greggory Lee ` :issue:`1068`. + By :user:`Gregory Lee ` :issue:`1068`. * Fix a typo of "integers". By :user:`Richard Scott ` :issue:`1056`. @@ -526,7 +526,7 @@ Enhancements Since the format is not yet finalized, the classes and functions are not automatically imported into the regular `zarr` name space. Setting the `ZARR_V3_EXPERIMENTAL_API` environment variable will activate them. - By :user:`Greggory Lee `; :issue:`898`, :issue:`1006`, and :issue:`1007` + By :user:`Gregory Lee `; :issue:`898`, :issue:`1006`, and :issue:`1007` as well as by :user:`Josh Moore ` :issue:`1032`. * **Create FSStore from an existing fsspec filesystem**. If you have created @@ -648,7 +648,7 @@ Enhancements higher-level array creation and convenience functions still accept plain Python dicts or other mutable mappings for the ``store`` argument, but will internally convert these to a ``KVStore``. - By :user:`Greggory Lee `; :issue:`839`, :issue:`789`, and :issue:`950`. + By :user:`Gregory Lee `; :issue:`839`, :issue:`789`, and :issue:`950`. * Allow to assign array ``fill_values`` and update metadata accordingly. By :user:`Ryan Abernathey `, :issue:`662`. @@ -795,7 +795,7 @@ Bug fixes ~~~~~~~~~ * Fix FSStore.listdir behavior for nested directories. - By :user:`Greggory Lee `; :issue:`802`. + By :user:`Gregory Lee `; :issue:`802`. .. _release_2.9.4: @@ -879,7 +879,7 @@ Bug fixes By :user:`Josh Moore `; :issue:`781`. * avoid NumPy 1.21.0 due to https://github.com/numpy/numpy/issues/19325 - By :user:`Greggory Lee `; :issue:`791`. + By :user:`Gregory Lee `; :issue:`791`. Maintenance ~~~~~~~~~~~ @@ -891,7 +891,7 @@ Maintenance By :user:`Elliott Sales de Andrade `; :issue:`799`. * TST: add missing assert in test_hexdigest. - By :user:`Greggory Lee `; :issue:`801`. + By :user:`Gregory Lee `; :issue:`801`. .. _release_2.8.3: From 94df4bb9ec8c0abbca71ab2e520d87d846620664 Mon Sep 17 00:00:00 2001 From: Davis Bennett Date: Mon, 29 Jan 2024 15:52:59 +0100 Subject: [PATCH 3/4] docs: use 'ZIP archive' instead of 'zip file'; clarify utility of caching in s3 + ZIP example; style --- docs/tutorial.rst | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/docs/tutorial.rst b/docs/tutorial.rst index 351eef064a..1f7accab3a 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -774,7 +774,7 @@ the following code:: Any other compatible storage class could be used in place of :class:`zarr.storage.DirectoryStore` in the code examples above. For example, -here is an array stored directly into a Zip file, via the +here is an array stored directly into a ZIP archive, via the :class:`zarr.storage.ZipStore` class:: >>> store = zarr.ZipStore('data/example.zip', mode='w') @@ -798,12 +798,12 @@ Re-open and check that data have been written:: [42, 42, 42, ..., 42, 42, 42]], dtype=int32) >>> store.close() -Note that there are some limitations on how Zip files can be used, because items -within a Zip file cannot be updated in place. This means that data in the array +Note that there are some limitations on how ZIP archives can be used, because items +within a ZIP archive cannot be updated in place. This means that data in the array should only be written once and write operations should be aligned with chunk boundaries. Note also that the ``close()`` method must be called after writing any data to the store, otherwise essential records will not be written to the -underlying zip file. +underlying ZIP archive. Another storage alternative is the :class:`zarr.storage.DBMStore` class, added in Zarr version 2.2. This class allows any DBM-style database to be used for @@ -846,7 +846,7 @@ respectively require the `redis-py `_ and `pymongo `_ packages to be installed. For compatibility with the `N5 `_ data format, Zarr also provides -an N5 backend (this is currently an experimental feature). Similar to the zip storage class, an +an N5 backend (this is currently an experimental feature). Similar to the ZIP storage class, an :class:`zarr.n5.N5Store` can be instantiated directly:: >>> store = zarr.N5Store('data/example.n5') @@ -1000,12 +1000,13 @@ separately from Zarr. .. _tutorial_copy: -Accessing Zip Files on S3 -~~~~~~~~~~~~~~~~~~~~~~~~~ +Accessing ZIP archives on S3 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The built-in `ZipStore` will only work with paths on the local file-system, however -it is also possible to access ``.zarr.zip`` data on the cloud. Here is an example of -accessing a zipped Zarr file on s3: +The built-in :class:`zarr.storage.ZipStore` will only work with paths on the local file-system; however +it is possible to access ZIP-archived Zarr data on the cloud via the `ZipFileSystem `_ +class from ``fsspec``. The following example demonstrates how to access +a ZIP-archived Zarr group on s3 using `s3fs `_ and ``ZipFileSystem``: >>> s3_path = "s3://path/to/my.zarr.zip" >>> @@ -1014,7 +1015,7 @@ accessing a zipped Zarr file on s3: >>> fs = ZipFileSystem(f, mode="r") >>> store = FSMap("", fs, check=False) >>> - >>> # cache is optional, but may be a good idea depending on the situation + >>> # caching may improve performance when repeatedly reading the same data >>> cache = zarr.storage.LRUStoreCache(store, max_size=2**28) >>> z = zarr.group(store=cache) @@ -1022,7 +1023,7 @@ This store can also be generated with ``fsspec``'s handler chaining, like so: >>> store = zarr.storage.FSStore(url=f"zip::{s3_path}", mode="r") -This can be especially useful if you have a very large ``.zarr.zip`` file on s3 +This can be especially useful if you have a very large ZIP-archived Zarr array or group on s3 and only need to access a small portion of it. Consolidating metadata @@ -1161,7 +1162,7 @@ re-compression, and so should be faster. E.g.:: └── spam (100,) int64 >>> new_root['foo/bar/baz'][:] array([ 0, 1, 2, ..., 97, 98, 99]) - >>> store2.close() # zip stores need to be closed + >>> store2.close() # ZIP stores need to be closed .. _tutorial_strings: From 9eff5b5e0726fabc66e7aa13bbdf68a52c8a72d6 Mon Sep 17 00:00:00 2001 From: Davis Bennett Date: Mon, 29 Jan 2024 16:23:34 +0100 Subject: [PATCH 4/4] docs: update release notes, correct spelling of greg lee's name in past release notes, and fix markup in past release notes --- docs/release.rst | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/release.rst b/docs/release.rst index 0f199aadd2..b73dcec34f 100644 --- a/docs/release.rst +++ b/docs/release.rst @@ -62,8 +62,8 @@ Docs * Add Norman Rzepka to core-dev team. By :user:`Joe Hamman ` :issue:`1630`. -* Added section about accessing zip files that are on s3. - By :user:`Jeff Peck ` :issue:`1613`. +* Added section about accessing ZIP archives on s3. + By :user:`Jeff Peck ` :issue:`1613`, :issue:`1615`, and :user:`Davis Bennett ` :issue:`1641`. * Add V3 roadmap and design document. By :user:`Joe Hamman ` :issue:`1583`. @@ -157,10 +157,10 @@ Maintenance By :user:`Davis Bennett ` :issue:`1462`. * Style the codebase with ``ruff`` and ``black``. - By :user:`Davis Bennett` :issue:`1459` + By :user:`Davis Bennett ` :issue:`1459` * Ensure that chunks is tuple of ints upon array creation. - By :user:`Philipp Hanslovsky` :issue:`1461` + By :user:`Philipp Hanslovsky ` :issue:`1461` .. _release_2.15.0: @@ -548,7 +548,7 @@ Maintenance By :user:`Saransh Chopra ` :issue:`1079`. * Remove option to return None from _ensure_store. - By :user:`Greggory Lee ` :issue:`1068`. + By :user:`Gregory Lee ` :issue:`1068`. * Fix a typo of "integers". By :user:`Richard Scott ` :issue:`1056`. @@ -566,7 +566,7 @@ Enhancements Since the format is not yet finalized, the classes and functions are not automatically imported into the regular `zarr` name space. Setting the `ZARR_V3_EXPERIMENTAL_API` environment variable will activate them. - By :user:`Greggory Lee `; :issue:`898`, :issue:`1006`, and :issue:`1007` + By :user:`Gregory Lee `; :issue:`898`, :issue:`1006`, and :issue:`1007` as well as by :user:`Josh Moore ` :issue:`1032`. * **Create FSStore from an existing fsspec filesystem**. If you have created @@ -688,7 +688,7 @@ Enhancements higher-level array creation and convenience functions still accept plain Python dicts or other mutable mappings for the ``store`` argument, but will internally convert these to a ``KVStore``. - By :user:`Greggory Lee `; :issue:`839`, :issue:`789`, and :issue:`950`. + By :user:`Gregory Lee `; :issue:`839`, :issue:`789`, and :issue:`950`. * Allow to assign array ``fill_values`` and update metadata accordingly. By :user:`Ryan Abernathey `, :issue:`662`. @@ -835,7 +835,7 @@ Bug fixes ~~~~~~~~~ * Fix FSStore.listdir behavior for nested directories. - By :user:`Greggory Lee `; :issue:`802`. + By :user:`Gregory Lee `; :issue:`802`. .. _release_2.9.4: @@ -919,7 +919,7 @@ Bug fixes By :user:`Josh Moore `; :issue:`781`. * avoid NumPy 1.21.0 due to https://github.com/numpy/numpy/issues/19325 - By :user:`Greggory Lee `; :issue:`791`. + By :user:`Gregory Lee `; :issue:`791`. Maintenance ~~~~~~~~~~~ @@ -931,7 +931,7 @@ Maintenance By :user:`Elliott Sales de Andrade `; :issue:`799`. * TST: add missing assert in test_hexdigest. - By :user:`Greggory Lee `; :issue:`801`. + By :user:`Gregory Lee `; :issue:`801`. .. _release_2.8.3: