feat/batch creation #2665

Merged
merged 84 commits into from
Feb 23, 2025
84 commits
8faf994
sketch out batch creation routine
d-v-b Dec 11, 2024
8952911
scratch state of easy batch creation
d-v-b Dec 18, 2024
de3c594
Merge branch 'main' of https://github.com/d-v-b/zarr-python into feat…
d-v-b Jan 1, 2025
c700e39
rename tupleize keys
d-v-b Jan 3, 2025
986d68b
Merge branch 'main' of https://github.com/zarr-developers/zarr-python…
d-v-b Jan 3, 2025
97b768f
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 7, 2025
b6bf2dd
Merge branch 'feat/batch-creation' of github.com:d-v-b/zarr-python in…
d-v-b Jan 7, 2025
57ceb64
tests and proper implementation for create_nodes and create_hierarchy
d-v-b Jan 7, 2025
181d3d0
privatize
d-v-b Jan 7, 2025
e8e6107
use Posixpath instead of Path in tests; avoid redundant cast
d-v-b Jan 7, 2025
4f2c954
restore cast
d-v-b Jan 7, 2025
dd4174c
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 7, 2025
cf72834
pureposixpath instead of posixpath
d-v-b Jan 7, 2025
e2cff8c
group-level create_hierarchy
d-v-b Jan 7, 2025
0912ecb
docstring
d-v-b Jan 7, 2025
04f7922
Merge branch 'main' of https://github.com/zarr-developers/zarr-python…
d-v-b Jan 8, 2025
089feef
sketch out from_flat for groups
d-v-b Jan 8, 2025
116ab87
better concurrency for v2
d-v-b Jan 9, 2025
246f862
Merge branch 'main' of https://github.com/zarr-developers/zarr-python…
d-v-b Jan 9, 2025
e38c1ca
revert change to default concurrency
d-v-b Jan 9, 2025
2fb9083
create root correctly
d-v-b Jan 9, 2025
b099fba
working _from_flat
d-v-b Jan 10, 2025
64b54bf
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 10, 2025
4562e86
working dict serialization for _ImplicitGroupMetadata
d-v-b Jan 10, 2025
cdfd5de
remove implicit group metadata, and add some key name normalization
d-v-b Jan 15, 2025
036fd2a
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 15, 2025
787d6bf
add path normalization routines
d-v-b Jan 22, 2025
d07435b
use _join_paths for safer path concatenation
d-v-b Jan 22, 2025
29ecce7
Merge branch 'feat/batch-creation' of github.com:d-v-b/zarr-python in…
d-v-b Jan 22, 2025
63dd07f
handle overwrite
d-v-b Jan 22, 2025
15c4a7e
rename _from_flat to _create_rooted_hierarchy, add sync version
d-v-b Jan 22, 2025
645a447
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 22, 2025
bd9afd1
add test for _create_rooted_hierarchy when the output should be an ar…
d-v-b Jan 22, 2025
8be3876
increase coverage, one way or another
d-v-b Jan 22, 2025
06e5482
remove replace kwarg for _set_return_key
d-v-b Jan 22, 2025
37186d6
shield lines from coverage
d-v-b Jan 22, 2025
ed4e846
add some tests
d-v-b Jan 22, 2025
02ac91d
lint
d-v-b Jan 22, 2025
f6a08a0
improve coverage with more tests
d-v-b Jan 22, 2025
9d2f642
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 22, 2025
ed0d52a
Merge branch 'main' into feat/batch-creation
d-v-b Jan 25, 2025
661678f
use store + path instead of StorePath for hierarchy api
d-v-b Jan 28, 2025
7a718d5
docstrings
d-v-b Jan 28, 2025
23bfef5
docstrings
d-v-b Jan 28, 2025
619eeb5
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Jan 28, 2025
5282534
release notes
d-v-b Jan 28, 2025
6507e43
refactor sync / async functions, and make tests more compact accordingly
d-v-b Jan 28, 2025
6b56342
keyerror -> filenotfounderror
d-v-b Jan 28, 2025
3be878d
keyerror -> filenotfounderror, fixup
d-v-b Jan 28, 2025
774eeda
Merge branch 'main' into feat/batch-creation
d-v-b Jan 28, 2025
f3c506f
add top-level exports
d-v-b Jan 28, 2025
60379a7
Merge branch 'feat/batch-creation' of github.com:d-v-b/zarr-python in…
d-v-b Jan 28, 2025
32e06fa
mildly refactor node input validation
d-v-b Jan 29, 2025
8bd0b57
simplify path normalization
d-v-b Jan 29, 2025
1bb6578
Merge branch 'main' of https://github.com/zarr-developers/zarr-python…
d-v-b Feb 2, 2025
d05a43c
refactor to separate sync and async routines
d-v-b Feb 2, 2025
29bab74
remove semaphore kwarg, and add test for concurrency limit sensitivity
d-v-b Feb 2, 2025
2f02c26
wire up semaphore correctly, thanks to a test
d-v-b Feb 2, 2025
6ab8339
export read_node
d-v-b Feb 2, 2025
9b97c95
docstrings
d-v-b Feb 2, 2025
e546519
docstrings
d-v-b Feb 2, 2025
24eab3a
read_node -> get_node
d-v-b Feb 2, 2025
2b02996
Merge branch 'main' into feat/batch-creation
d-v-b Feb 7, 2025
a1e75b9
Merge branch 'main' into feat/batch-creation
d-v-b Feb 10, 2025
fff280c
Merge branch 'main' into feat/batch-creation
d-v-b Feb 11, 2025
545cacb
Update src/zarr/api/synchronous.py
d-v-b Feb 12, 2025
39c8a68
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Feb 13, 2025
438780b
update docstring
d-v-b Feb 13, 2025
afe47cd
add function signature tests
d-v-b Feb 13, 2025
a2547b3
update exception name
d-v-b Feb 13, 2025
9f0ccfa
refactor: remove path kwarg, bring back ImplicitGroupMetadata
d-v-b Feb 14, 2025
42b9804
prune top-level synchronous API
d-v-b Feb 14, 2025
d7d0070
more api pruning
d-v-b Feb 14, 2025
afdc320
put sync wrappers in sync_group module, move utils to utils
d-v-b Feb 14, 2025
e74445b
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Feb 14, 2025
50b02b4
ensure we always have a root group
d-v-b Feb 17, 2025
fdc1c8f
Merge branch 'main' of github.com:zarr-developers/zarr-python into fe…
d-v-b Feb 17, 2025
7c56b87
docs
d-v-b Feb 18, 2025
8245e80
fix group.create_hierarchy to properly prefix keys with the name of t…
d-v-b Feb 18, 2025
df2bdc6
docstrings
d-v-b Feb 18, 2025
35afe7f
docstrings
d-v-b Feb 18, 2025
77264e4
docstring examples
d-v-b Feb 18, 2025
3bf83ad
Merge branch 'main' into feat/batch-creation
d-v-b Feb 21, 2025
11e3fa1
Merge branch 'main' into feat/batch-creation
d-v-b Feb 23, 2025
1 change: 1 addition & 0 deletions changes/2665.feature.rst
@@ -0,0 +1 @@
Adds functions for concurrently creating multiple arrays and groups.
22 changes: 22 additions & 0 deletions docs/quickstart.rst
@@ -119,6 +119,28 @@ Zarr allows you to create hierarchical groups, similar to directories::

This creates a group with two datasets: ``foo`` and ``bar``.

Batch Hierarchy Creation
~~~~~~~~~~~~~~~~~~~~~~~~

Zarr provides tools for creating a collection of arrays and groups with a single function call.
Suppose we want to copy existing groups and arrays into a new storage backend:

>>> # Create nested groups and add arrays
>>> root = zarr.group("data/example-3.zarr", attributes={'name': 'root'})
>>> foo = root.create_group(name="foo")
>>> bar = root.create_array(
... name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
... )
>>> nodes = {'': root.metadata} | {k: v.metadata for k,v in root.members()}
>>> print(nodes)
>>> from zarr.storage import MemoryStore
>>> new_nodes = dict(zarr.create_hierarchy(store=MemoryStore(), nodes=nodes))
>>> new_root = new_nodes['']
>>> assert new_root.attrs == root.attrs

Note that :func:`zarr.create_hierarchy` will only initialize arrays and groups -- copying array data must
be done in a separate step.

Persistent Storage
------------------

25 changes: 25 additions & 0 deletions docs/user-guide/groups.rst
@@ -75,6 +75,31 @@ For more information on groups see the :class:`zarr.Group` API docs.

.. _user-guide-diagnostics:

Batch Group Creation
--------------------

You can also create multiple groups concurrently with a single function call. :func:`zarr.create_hierarchy` takes
a :class:`zarr.storage.Store` instance and a dict of ``key : metadata`` pairs, parses that dict, and
writes metadata documents to storage:

>>> from zarr import create_hierarchy
>>> from zarr.core.group import GroupMetadata
>>> from zarr.storage import LocalStore
>>> node_spec = {'a/b/c': GroupMetadata()}
>>> nodes_created = dict(create_hierarchy(store=LocalStore(root='data'), nodes=node_spec))
>>> print(sorted(nodes_created.items(), key=lambda kv: len(kv[0])))
[('', <Group file://data>), ('a', <Group file://data/a>), ('a/b', <Group file://data/a/b>), ('a/b/c', <Group file://data/a/b/c>)]

Note that we specified only a single group, ``a/b/c``, but four groups were created. The additional groups
connect the desired node ``a/b/c`` to the root group ``''`` through a sequence of intermediate groups.
:func:`zarr.create_hierarchy` normalizes the ``nodes`` keyword argument to ensure that the resulting
hierarchy is complete, i.e. every group or array is connected to the root of the hierarchy via
intermediate groups.
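
The completion step described above can be illustrated with a minimal, standard-library sketch. It is not the code from this pull request: ``complete_hierarchy`` and ``IMPLICIT_GROUP`` are hypothetical names, and plain strings stand in for metadata objects. The sketch also skips validation, such as rejecting an array that appears as the parent of another node.

```python
from pathlib import PurePosixPath

# Hypothetical placeholder standing in for the metadata of an implicitly created group.
IMPLICIT_GROUP = "implicit-group"


def complete_hierarchy(nodes: dict[str, str]) -> dict[str, str]:
    """Return a copy of `nodes` in which every missing ancestor is an implicit group."""
    completed = dict(nodes)
    for key in nodes:
        # PurePosixPath("a/b/c").parents yields "a/b", "a" and "." (the root).
        for parent in PurePosixPath(key).parents:
            parent_key = "" if str(parent) == "." else str(parent)
            # Explicitly specified nodes are never overwritten.
            completed.setdefault(parent_key, IMPLICIT_GROUP)
    # A hierarchy always has a root node, keyed by the empty string.
    completed.setdefault("", IMPLICIT_GROUP)
    return completed


result = complete_hierarchy({"a/b/c": "group"})
print(sorted(result, key=len))  # ['', 'a', 'a/b', 'a/b/c']
```

As in the example above, one requested key expands to four nodes: the root, two intermediate groups, and the requested group itself.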

Because :func:`zarr.create_hierarchy` creates metadata documents concurrently, it is more efficient
than repeated calls to :func:`zarr.create_group` or :func:`zarr.create_array`, provided you can statically
define the metadata for the groups and arrays you want to create.
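
To see why concurrent creation helps, consider a rough sketch that simulates store latency with ``asyncio.sleep``. Everything here is an assumption for illustration: ``write_metadata`` is a hypothetical stand-in for a single metadata write, and the semaphore-based limit is only one plausible way to bound concurrency, not necessarily how Zarr does it.

```python
import asyncio
import time


async def write_metadata(key: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for writing one metadata document to a store."""
    await asyncio.sleep(latency)  # simulated I/O latency
    return key


async def create_sequentially(keys: list[str]) -> list[str]:
    # Each write waits for the previous one to finish.
    return [await write_metadata(key) for key in keys]


async def create_concurrently(keys: list[str], limit: int = 10) -> list[str]:
    # At most `limit` writes are in flight at any time.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(key: str) -> str:
        async with semaphore:
            return await write_metadata(key)

    # gather preserves the order of its inputs in the returned results.
    return list(await asyncio.gather(*(bounded(key) for key in keys)))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and report the elapsed wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


keys = [f"group_{i}" for i in range(10)]
seq_result, seq_time = timed(create_sequentially(keys))
con_result, con_time = timed(create_concurrently(keys))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

With ten simulated writes, the sequential version pays the latency ten times, while the concurrent version overlaps the waits. The gap grows with the number of nodes and with the latency of the storage backend.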

Array and group diagnostics
---------------------------

2 changes: 2 additions & 0 deletions src/zarr/__init__.py
@@ -8,6 +8,7 @@
create,
create_array,
create_group,
create_hierarchy,
empty,
empty_like,
full,
@@ -50,6 +51,7 @@
"create",
"create_array",
"create_group",
"create_hierarchy",
"empty",
"empty_like",
"full",
8 changes: 7 additions & 1 deletion src/zarr/api/asynchronous.py
@@ -23,7 +23,12 @@
_warn_write_empty_chunks_kwarg,
parse_dtype,
)
-from zarr.core.group import AsyncGroup, ConsolidatedMetadata, GroupMetadata
+from zarr.core.group import (
+    AsyncGroup,
+    ConsolidatedMetadata,
+    GroupMetadata,
+    create_hierarchy,
+)
from zarr.core.metadata import ArrayMetadataDict, ArrayV2Metadata, ArrayV3Metadata
from zarr.core.metadata.v2 import _default_compressor, _default_filters
from zarr.errors import NodeTypeValidationError
@@ -48,6 +53,7 @@
"copy_store",
"create",
"create_array",
"create_hierarchy",
"empty",
"empty_like",
"full",
2 changes: 2 additions & 0 deletions src/zarr/api/synchronous.py
@@ -10,6 +10,7 @@
from zarr.core.array import Array, AsyncArray
from zarr.core.group import Group
from zarr.core.sync import sync
from zarr.core.sync_group import create_hierarchy

if TYPE_CHECKING:
from collections.abc import Iterable
@@ -46,6 +47,7 @@
"copy_store",
"create",
"create_array",
"create_hierarchy",
"empty",
"empty_like",
"full",