Commit

Merge branch 'main' into kyle/object-store
kylebarron authored Feb 25, 2025
2 parents a0599b7 + 64b9a37 commit 8c3a6f2
Showing 51 changed files with 2,782 additions and 601 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/test.yml
Expand Up @@ -102,6 +102,11 @@ jobs:
      - name: Run Tests
        run: |
          hatch env run --env ${{ matrix.dependency-set }} run
      - name: Upload coverage
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          verbose: true # optional (default = false)

doctests:
name: doctests
1 change: 1 addition & 0 deletions changes/2665.feature.rst
@@ -0,0 +1 @@
Adds functions for concurrently creating multiple arrays and groups.
3 changes: 0 additions & 3 deletions changes/2755.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2758.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2778.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2781.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2784.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2785.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2799.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2801.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2804.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2807.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2811.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2813.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion changes/2817.bugfix.rst

This file was deleted.

1 change: 1 addition & 0 deletions changes/2847.fix.rst
@@ -0,0 +1 @@
Fixed a bug where ``ArrayV2Metadata`` could save ``filters`` as an empty array.
1 change: 1 addition & 0 deletions changes/2851.bugfix.rst
@@ -0,0 +1 @@
Fix a bug when setting values of a smaller last chunk.
21 changes: 21 additions & 0 deletions docs/developers/contributing.rst
Expand Up @@ -230,6 +230,27 @@ during development at `http://0.0.0.0:8000/ <http://0.0.0.0:8000/>`_. This can b

$ hatch --env docs run serve

.. _changelog:

Changelog
~~~~~~~~~

zarr-python uses `towncrier`_ to manage release notes. Most pull requests should
include at least one news fragment describing the changes. To add a release
note, you'll need the GitHub issue or pull request number and the type of your
change (``feature``, ``bugfix``, ``doc``, ``removal``, ``misc``). With that, run
``towncrier create`` in your development environment, which will prompt you
for the issue number, change type, and the news text::

towncrier create

Alternatively, you can manually create the files in the ``changes`` directory
using the naming convention ``{issue-number}.{change-type}.rst``.
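
As a rough illustration of the manual route, the sketch below writes a fragment named ``{issue-number}.{change-type}.rst``. The helper ``write_news_fragment`` is hypothetical (it is not part of towncrier or zarr-python), the set of change types is simply copied from the list above, and the demo writes into a temporary directory rather than the real ``changes`` directory.

```python
from pathlib import Path
import tempfile

# Change types as listed in the contributing guide text above (assumption:
# the project's towncrier configuration may accept other names as well).
CHANGE_TYPES = {"feature", "bugfix", "doc", "removal", "misc"}


def write_news_fragment(changes_dir: Path, issue: int, change_type: str, text: str) -> Path:
    """Hypothetical helper: write ``{issue}.{change_type}.rst`` into ``changes_dir``."""
    if change_type not in CHANGE_TYPES:
        raise ValueError(f"unknown change type: {change_type!r}")
    changes_dir.mkdir(parents=True, exist_ok=True)
    fragment = changes_dir / f"{issue}.{change_type}.rst"
    # End the fragment with exactly one trailing newline.
    fragment.write_text(text.rstrip("\n") + "\n", encoding="utf-8")
    return fragment


# Demonstrate in a throwaway directory instead of the repository checkout.
with tempfile.TemporaryDirectory() as tmp:
    path = write_news_fragment(
        Path(tmp) / "changes",
        2851,
        "bugfix",
        "Fix a bug when setting values of a smaller last chunk.",
    )
    fragment_name = path.name
    fragment_text = path.read_text(encoding="utf-8")
```

In a real checkout you would pass the repository's ``changes`` directory instead of the temporary one; ``towncrier create`` remains the simpler option when it is available.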

See the `towncrier`_ docs for more.

.. _towncrier: https://towncrier.readthedocs.io/en/stable/tutorial.html

Development best practices, policies and procedures
---------------------------------------------------

22 changes: 22 additions & 0 deletions docs/quickstart.rst
Expand Up @@ -119,6 +119,28 @@ Zarr allows you to create hierarchical groups, similar to directories::

This creates a group with two datasets: ``foo`` and ``bar``.

Batch Hierarchy Creation
~~~~~~~~~~~~~~~~~~~~~~~~

Zarr provides tools for creating a collection of arrays and groups with a single function call.
Suppose we want to copy existing groups and arrays into a new storage backend:

>>> # Create nested groups and add arrays
>>> root = zarr.group("data/example-3.zarr", attributes={'name': 'root'})
>>> foo = root.create_group(name="foo")
>>> bar = root.create_array(
... name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
... )
>>> nodes = {'': root.metadata} | {k: v.metadata for k, v in root.members()}
>>> print(nodes)
>>> from zarr.storage import MemoryStore
>>> new_nodes = dict(zarr.create_hierarchy(store=MemoryStore(), nodes=nodes))
>>> new_root = new_nodes['']
>>> assert new_root.attrs == root.attrs

Note that :func:`zarr.create_hierarchy` will only initialize arrays and groups -- copying array data must
be done in a separate step.

Persistent Storage
------------------

39 changes: 39 additions & 0 deletions docs/release-notes.rst
Expand Up @@ -3,6 +3,45 @@ Release notes

.. towncrier release notes start
3.0.3 (2025-02-14)
------------------

Features
~~~~~~~~

- Improves performance of FsspecStore.delete_dir for remote filesystems supporting concurrent/batched deletes, e.g., s3fs. (:issue:`2661`)
- Added :meth:`zarr.config.enable_gpu` to update Zarr's configuration to use GPUs. (:issue:`2751`)
- Avoid reading chunks during writes where possible. :issue:`757` (:issue:`2784`)
- :py:class:`LocalStore` learned to ``delete_dir``. This makes array and group deletes more efficient. (:issue:`2804`)
- Add `zarr.testing.strategies.array_metadata` to generate ArrayV2Metadata and ArrayV3Metadata instances. (:issue:`2813`)
- Add arbitrary `shards` to Hypothesis strategy for generating arrays. (:issue:`2822`)


Bugfixes
~~~~~~~~

- Fixed bug with Zarr using device memory, instead of host memory, for storing metadata when using GPUs. (:issue:`2751`)
- The array returned by ``zarr.empty`` and an empty ``zarr.core.buffer.cpu.NDBuffer`` will now be filled with the
specified fill value, or with zeros if no fill value is provided.
This fixes a bug where Zarr format 2 data with no fill value was written with unpredictable chunk sizes. (:issue:`2755`)
- Fix zip-store path checking for stores with directories listed as files. (:issue:`2758`)
- Use removeprefix rather than replace when removing filename prefixes in `FsspecStore.list` (:issue:`2778`)
- Enable automatic removal of `needs release notes` with labeler action (:issue:`2781`)
- Use the proper label config (:issue:`2785`)
- Alters the behavior of ``create_array`` to ensure that any groups implied by the array's name are created if they do not already exist. Also simplifies the type signature for any function that takes an ArrayConfig-like object. (:issue:`2795`)
- Initialise empty chunks to the default fill value during writing and add default fill values for datetime, timedelta, structured, and other (void* fixed size) data types (:issue:`2799`)
- Ensure utf8 compliant strings are used to construct numpy arrays in property-based tests (:issue:`2801`)
- Fix pickling for ZipStore (:issue:`2807`)
- Update numcodecs to not overwrite codec configuration ever. Closes :issue:`2800`. (:issue:`2811`)
- Fix fancy indexing (e.g. arr[5, [0, 1]]) with the sharding codec (:issue:`2817`)
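
To illustrate why the ``FsspecStore.list`` fix above prefers ``removeprefix`` over ``replace``, here is a small standalone sketch. The prefix and key are made-up values, and this is not the store's actual code; it only shows the difference between the two string methods.

```python
# Made-up prefix and key, chosen so the prefix text also appears mid-path.
prefix = "data/"
key = "data/group/data/array"

# str.replace removes every occurrence, including the one inside the key.
with_replace = key.replace(prefix, "")

# str.removeprefix (Python 3.9+) strips only a leading occurrence.
with_removeprefix = key.removeprefix(prefix)

print(with_replace)
print(with_removeprefix)
```

With ``replace``, the inner ``data/`` segment is silently dropped, which would corrupt any key that happens to repeat the prefix.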


Improved Documentation
~~~~~~~~~~~~~~~~~~~~~~

- Added new user guide on :ref:`user-guide-gpu`. (:issue:`2751`)


3.0.2 (2025-01-31)
------------------

1 change: 1 addition & 0 deletions docs/user-guide/config.rst
Expand Up @@ -32,6 +32,7 @@ Configuration options include the following:
- Whether empty chunks are written to storage ``array.write_empty_chunks``
- Async and threading options, e.g. ``async.concurrency`` and ``threading.max_workers``
- Selections of implementations of codecs, codec pipelines and buffers
- Enabling GPU support with ``zarr.config.enable_gpu()``. See :ref:`user-guide-gpu` for more.

For selecting custom implementations of codecs, pipelines, buffers and ndbuffers,
first register the implementations in the registry and then select them in the config.
37 changes: 37 additions & 0 deletions docs/user-guide/gpu.rst
@@ -0,0 +1,37 @@
.. _user-guide-gpu:

Using GPUs with Zarr
====================

Zarr can use GPUs to accelerate your workload. To enable GPU support, call
:meth:`zarr.config.enable_gpu`.

.. note::

`zarr-python` currently supports reading the ndarray data into device (GPU)
memory as the final stage of the codec pipeline. Data will still be read into
or copied to host (CPU) memory for encoding and decoding.

In the future, codecs will be available for compressing and decompressing data on
the GPU, avoiding the need to move data between the host and device for
compression and decompression.

Reading data into device memory
-------------------------------

:meth:`zarr.config.enable_gpu` configures Zarr to use GPU memory for the data
buffers used internally by Zarr.

.. code-block:: python

   >>> import zarr
   >>> import cupy as cp  # doctest: +SKIP
   >>> zarr.config.enable_gpu()  # doctest: +SKIP
   >>> store = zarr.storage.MemoryStore()  # doctest: +SKIP
   >>> z = zarr.create_array(  # doctest: +SKIP
   ...     store=store, shape=(100, 100), chunks=(10, 10), dtype="float32",
   ... )
   >>> type(z[:10, :10])  # doctest: +SKIP
   cupy.ndarray

Note that the output type is a ``cupy.ndarray`` rather than a NumPy array.
25 changes: 25 additions & 0 deletions docs/user-guide/groups.rst
Expand Up @@ -75,6 +75,31 @@ For more information on groups see the :class:`zarr.Group` API docs.

.. _user-guide-diagnostics:

Batch Group Creation
--------------------

You can also create multiple groups concurrently with a single function call. :func:`zarr.create_hierarchy` takes
a :class:`zarr.storage.Store` instance and a dict of ``key : metadata`` pairs, parses that dict, and
writes metadata documents to storage:

>>> from zarr import create_hierarchy
>>> from zarr.core.group import GroupMetadata
>>> from zarr.storage import LocalStore
>>> node_spec = {'a/b/c': GroupMetadata()}
>>> nodes_created = dict(create_hierarchy(store=LocalStore(root='data'), nodes=node_spec))
>>> print(sorted(nodes_created.items(), key=lambda kv: len(kv[0])))
[('', <Group file://data>), ('a', <Group file://data/a>), ('a/b', <Group file://data/a/b>), ('a/b/c', <Group file://data/a/b/c>)]

Note that we only specified a single group named ``a/b/c``, but four groups were created. These additional groups
were created to ensure that the desired node ``a/b/c`` is connected to the root group ``''`` by a sequence
of intermediate groups. :func:`zarr.create_hierarchy` normalizes the ``nodes`` keyword argument to
ensure that the resulting hierarchy is complete, i.e. all groups or arrays are connected to the root
of the hierarchy via intermediate groups.
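
The idea of completing a hierarchy can be sketched in plain Python. The function ``implied_group_paths`` below is a hypothetical illustration of which intermediate paths a set of keys implies; it is not how :func:`zarr.create_hierarchy` is implemented and it does not touch any store.

```python
def implied_group_paths(keys):
    """Return the ancestor paths (including the root '') implied by ``keys``
    that are not already present in ``keys``. Hypothetical helper."""
    keys = set(keys)
    implied = set()
    for key in keys:
        parts = key.split("/") if key else []
        # Every proper prefix of the key is an intermediate group;
        # the empty prefix '' stands for the root group.
        for depth in range(len(parts)):
            implied.add("/".join(parts[:depth]))
    # Shortest paths first, so parents come before their children.
    return sorted(implied - keys, key=lambda p: (len(p), p))


missing = implied_group_paths({"a/b/c"})
print(missing)
```

For the single key ``a/b/c`` this yields the three extra paths ``''``, ``a`` and ``a/b``, matching the four groups shown in the output above once ``a/b/c`` itself is counted.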

Because :func:`zarr.create_hierarchy` concurrently creates metadata documents, it's more efficient
than repeated calls to :func:`create_group` or :func:`create_array`, provided you can statically define
the metadata for the groups and arrays you want to create.

Array and group diagnostics
---------------------------

1 change: 1 addition & 0 deletions docs/user-guide/index.rst
Expand Up @@ -23,6 +23,7 @@ Advanced Topics
performance
consolidated_metadata
extending
gpu


.. Coming soon
19 changes: 8 additions & 11 deletions pyproject.toml
Expand Up @@ -214,11 +214,7 @@ dependencies = [
'donfig @ git+https://github.com/pytroll/donfig',
'obstore @ git+https://github.com/developmentseed/obstore@main#subdirectory=obstore',
# test deps
'hypothesis',
'pytest',
'pytest-cov',
'pytest-asyncio',
'moto[s3]',
'zarr[test]',
]

[tool.hatch.envs.upstream.env-vars]
Expand All @@ -230,6 +226,9 @@ PIP_PRE = "1"
run = "pytest --verbose"
run-mypy = "mypy src"
run-hypothesis = "pytest --hypothesis-profile ci tests/test_properties.py tests/test_store/test_stateful*"
run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov-report xml --cov=src --junitxml=junit.xml -o junit_family=legacy"
run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov-report xml --cov=src --junitxml=junit.xml -o junit_family=legacy"
run-coverage-html = "pytest --cov-config=pyproject.toml --cov=pkg --cov-report html --cov=src"
list-env = "pip list"

[tool.hatch.envs.min_deps]
Expand All @@ -249,18 +248,16 @@ dependencies = [
'typing_extensions==4.9.*',
'donfig==0.8.*',
# test deps
'hypothesis',
'pytest',
'pytest-cov',
'pytest-asyncio',
'moto[s3]',
'zarr[test]',
]

[tool.hatch.envs.min_deps.scripts]
run = "pytest --verbose"
run-hypothesis = "pytest --hypothesis-profile ci tests/test_properties.py tests/test_store/test_stateful*"
list-env = "pip list"

run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov-report xml --cov=src --junitxml=junit.xml -o junit_family=legacy"
run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov-report xml --cov=src --junitxml=junit.xml -o junit_family=legacy"
run-coverage-html = "pytest --cov-config=pyproject.toml --cov=pkg --cov-report html --cov=src"

[tool.ruff]
line-length = 100
2 changes: 2 additions & 0 deletions src/zarr/__init__.py
Expand Up @@ -8,6 +8,7 @@
create,
create_array,
create_group,
create_hierarchy,
empty,
empty_like,
full,
Expand Down Expand Up @@ -50,6 +51,7 @@
"create",
"create_array",
"create_group",
"create_hierarchy",
"empty",
"empty_like",
"full",
14 changes: 10 additions & 4 deletions src/zarr/api/asynchronous.py
Expand Up @@ -10,7 +10,7 @@
from typing_extensions import deprecated

from zarr.core.array import Array, AsyncArray, create_array, get_array_metadata
from zarr.core.array_spec import ArrayConfig, ArrayConfigLike
from zarr.core.array_spec import ArrayConfig, ArrayConfigLike, ArrayConfigParams
from zarr.core.buffer import NDArrayLike
from zarr.core.common import (
JSON,
Expand All @@ -23,7 +23,12 @@
_warn_write_empty_chunks_kwarg,
parse_dtype,
)
from zarr.core.group import AsyncGroup, ConsolidatedMetadata, GroupMetadata
from zarr.core.group import (
AsyncGroup,
ConsolidatedMetadata,
GroupMetadata,
create_hierarchy,
)
from zarr.core.metadata import ArrayMetadataDict, ArrayV2Metadata, ArrayV3Metadata
from zarr.core.metadata.v2 import _default_compressor, _default_filters
from zarr.errors import NodeTypeValidationError
Expand All @@ -48,6 +53,7 @@
"copy_store",
"create",
"create_array",
"create_hierarchy",
"empty",
"empty_like",
"full",
Expand Down Expand Up @@ -856,7 +862,7 @@ async def create(
codecs: Iterable[Codec | dict[str, JSON]] | None = None,
dimension_names: Iterable[str] | None = None,
storage_options: dict[str, Any] | None = None,
config: ArrayConfig | ArrayConfigLike | None = None,
config: ArrayConfigLike | None = None,
**kwargs: Any,
) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
"""Create an array.
Expand Down Expand Up @@ -1018,7 +1024,7 @@ async def create(
mode = "a"
store_path = await make_store_path(store, path=path, mode=mode, storage_options=storage_options)

config_dict: ArrayConfigLike = {}
config_dict: ArrayConfigParams = {}

if write_empty_chunks is not None:
if config is not None: