Manifest arrays use arrayv3metadata #429

abarciauskas-bgse · 2025-02-06T03:30:02Z

This is still very much a WIP: many tests and implementations still need to be fixed.

A few notes:

It was suggested that we remove ZArray completely as part of this work, rather than using a function to convert ZArrays to ArrayV3Metadata, so we should be able to remove ZArray in this PR.
It was suggested not to use zarr's _parse_chunk_encoding_v3 function, since it is private and may change; this is why some of that logic is replicated in convert_to_codec_pipeline.
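For context, the ordering this replicated logic has to enforce comes from the Zarr v3 spec: any array-to-array codecs, then exactly one array-to-bytes codec, then any bytes-to-bytes codecs. The sketch below shows that ordering on plain JSON codec dicts rather than zarr-python codec objects. `assemble_codecs` is a hypothetical helper, not the actual signature of convert_to_codec_pipeline, and falling back to the little-endian bytes codec is an assumption based on the discussion in this PR.

```python
from __future__ import annotations

from typing import Any

# A v3 codec in JSON form, e.g. {"name": "gzip", "configuration": {"level": 5}}
CodecDict = dict[str, Any]


def default_bytes_codec() -> CodecDict:
    # Little-endian "bytes" serializer (assumed default, as used in this PR's tests).
    return {"name": "bytes", "configuration": {"endian": "little"}}


def assemble_codecs(
    filters: list[CodecDict] | None = None,
    compressors: list[CodecDict] | None = None,
    serializer: CodecDict | None = None,
) -> list[CodecDict]:
    """Hypothetical helper: order codecs the way a Zarr v3 pipeline requires."""
    return [
        *(filters or []),  # array -> array codecs come first
        serializer if serializer is not None else default_bytes_codec(),  # one array -> bytes codec
        *(compressors or []),  # bytes -> bytes codecs come last
    ]


codecs = assemble_codecs(
    filters=[{"name": "transpose", "configuration": {"order": [1, 0]}}],
    compressors=[{"name": "gzip", "configuration": {"level": 5}}],
)
```

A fresh dict is returned by `default_bytes_codec()` on each call so that callers mutating one pipeline cannot accidentally change another.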

Checklist

virtualizarr/manifests/array.py

virtualizarr/manifests/array_api.py

…not happy about this)

virtualizarr/writers/kerchunk.py

virtualizarr/zarr.py

abarciauskas-bgse · 2025-02-12T23:56:52Z

I'm going to continue reviewing this tomorrow, but the tests are passing and I've done an initial reorganization of the code that was in zarr.py. So if any of @TomNicholas @norlandrhagen @ayushnag @sharkinsspatial @jsignell @mpiannucci want to start reviewing, please go ahead 🙏🏽

I also changed the base branch from main to a new branch, zarr-python-3.0.

TomNicholas

This is absolutely great, @abarciauskas-bgse! Comments are really just minor.

ci/upstream.yml

TomNicholas · 2025-02-13T03:26:57Z

conftest.py

+@pytest.fixture
+def array_v3_metadata():


Could we reimplement this fixture to internally just call the array_v3_metadata_dict fixture below?

I did this in 0712979. I do like having just one method to create array v3 metadata for tests; however, it does mean tests have to be a bit more verbose, as every codecs argument must include an ArrayBytesCodec (which is always {"name": "bytes", "configuration": {"endian": "little"}}).

But I'll think about ways to streamline this further.

Have a look at the signature of this function, which has a lot of sane defaults and works for both v2 and v3 metadata: https://github.com/zarr-developers/zarr-python/blob/99621ecf0b81400e323828111363fe21cf0c7592/src/zarr/core/array.py#L4008-L4030. I think we could consider adding an ArrayV3Metadata.build method with a signature like this, which should make creating metadata documents a lot easier.

I revisited this while also building a utility function, create_v3_array_metadata, that readers can use. Now the fixture just calls that function, and I removed the array_v3_metadata_dict fixture altogether (tests that exercise construction from a dict call to_dict instead).
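As a rough illustration of the "sane defaults" idea, here is a minimal sketch that builds the plain Zarr v3 array metadata document (the JSON form defined by the v3 spec) rather than a zarr-python ArrayV3Metadata object. The name `build_v3_metadata_dict` and its defaults (float64, a single chunk spanning the whole array, fill value 0, a little-endian bytes codec, "/" chunk key separator) are hypothetical; this is not the signature of create_v3_array_metadata in this PR.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def build_v3_metadata_dict(
    shape: Sequence[int],
    data_type: str = "float64",
    chunks: Sequence[int] | None = None,
    fill_value: Any = 0,
    codecs: list[dict[str, Any]] | None = None,
    attributes: dict[str, Any] | None = None,
    dimension_names: Sequence[str] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: a Zarr v3 array metadata document with defaults."""
    if chunks is None:
        # Default to a single chunk spanning the whole array.
        chunks = shape
    if len(chunks) != len(shape):
        raise ValueError("chunks must have the same length as shape")
    if codecs is None:
        # Every v3 pipeline needs exactly one array -> bytes codec.
        codecs = [{"name": "bytes", "configuration": {"endian": "little"}}]

    metadata: dict[str, Any] = {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(shape),
        "data_type": data_type,
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": list(chunks)}},
        "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
        "fill_value": fill_value,
        "codecs": codecs,
        "attributes": attributes or {},
    }
    if dimension_names is not None:
        # dimension_names is optional in the v3 spec, so only emit it when given.
        metadata["dimension_names"] = list(dimension_names)
    return metadata


md = build_v3_metadata_dict((2920, 25, 53), fill_value=-327.67)
```

With defaults like these, a test only spells out the fields it actually cares about, which addresses the verbosity concern raised earlier in this thread.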

docs/usage.md

virtualizarr/codecs.py

TomNicholas · 2025-02-13T03:35:46Z

virtualizarr/codecs.py

+def extract_codecs(
+    codecs: CodecPipeline,
+) -> tuple[
+    tuple[ArrayArrayCodec, ...], ArrayBytesCodec | None, tuple[BytesBytesCodec, ...]


This is a pretty complicated type that I had to stare at to work out what it is. Use TypeAlias with an informative name?

Also, is it definitely the right type? It seems weird that this would be valid: ((), None, ())

I think it is correct. I have added a type alias, but essentially we just need a way to extract the various codec types so we can insert an ArrayBytesCodec between the ArrayArrayCodecs and the BytesBytesCodecs when there is none (which, in this codebase, is always the case).

virtualizarr/tests/test_codecs.py

virtualizarr/tests/test_manifests/test_array.py

virtualizarr/translators/kerchunk.py

TomNicholas · 2025-02-13T04:27:32Z

virtualizarr/writers/icechunk.py

-        codecs = zarray._v3_codecs()
-
-        # create array if it doesn't already exist
+        # TODO: Should codecs be an argument to zarr's AsyncGroup.create_array?


You're asking for an upstream change here, right?

TomNicholas · 2025-02-13T04:28:55Z

virtualizarr/writers/kerchunk.py

@@ -67,13 +90,65 @@ def remove_file_uri_prefix(path: str):
        return path


+def convert_v3_to_v2_metadata(


Should this (a) not live in the kerchunk-specific writer module, (b) actually live in zarr-python upstream? Or is there no use for it upstream?
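For reference, the core of such a conversion is splitting the v3 pipeline around its array-to-bytes codec: codecs before it become v2 filters, and the (at most one) codec after it becomes the v2 compressor. The sketch below works on plain JSON dicts and is not the PR's convert_v3_to_v2_metadata. The name `v3_to_v2_metadata_dict`, the dtype table, the numcodecs-style `{"id": ...}` translation, and stripping a `numcodecs.` prefix from codec names are all assumptions.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping from v3 data_type names to v2 dtype kind + itemsize.
_V2_DTYPE = {
    "bool": "b1", "int8": "i1", "uint8": "u1",
    "int16": "i2", "uint16": "u2", "int32": "i4", "uint32": "u4",
    "int64": "i8", "uint64": "u8", "float32": "f4", "float64": "f8",
}


def _to_numcodecs_config(codec: dict[str, Any]) -> dict[str, Any]:
    # Assumption: v3 names either equal a numcodecs id or carry a "numcodecs." prefix.
    # Native v3 array -> array codecs such as "transpose" have no numcodecs counterpart.
    codec_id = codec["name"].removeprefix("numcodecs.")
    return {"id": codec_id, **codec.get("configuration", {})}


def v3_to_v2_metadata_dict(v3: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: turn a v3 array metadata dict into a v2 .zarray dict."""
    codecs = v3["codecs"]
    positions = [i for i, c in enumerate(codecs) if c["name"] == "bytes"]
    if len(positions) != 1:
        raise ValueError("expected exactly one 'bytes' codec in the v3 pipeline")
    split = positions[0]

    # Codecs before the serializer act on arrays, codecs after it act on bytes.
    filters = [_to_numcodecs_config(c) for c in codecs[:split]]
    compressors = [_to_numcodecs_config(c) for c in codecs[split + 1:]]
    if len(compressors) > 1:
        raise ValueError("Zarr v2 metadata allows at most one compressor")

    # Single-byte types use "|" (no byte order); otherwise follow the bytes codec.
    kind_size = _V2_DTYPE[v3["data_type"]]
    endian = codecs[split].get("configuration", {}).get("endian", "little")
    if kind_size.endswith("1"):
        byteorder = "|"
    else:
        byteorder = "<" if endian == "little" else ">"

    # The "c/" prefix of the v3 "default" key encoding has no v2 equivalent;
    # only the separator is carried over here.
    separator = v3["chunk_key_encoding"].get("configuration", {}).get("separator", ".")

    return {
        "zarr_format": 2,
        "shape": list(v3["shape"]),
        "chunks": list(v3["chunk_grid"]["configuration"]["chunk_shape"]),
        "dtype": byteorder + kind_size,
        "compressor": compressors[0] if compressors else None,
        "filters": filters or None,
        "fill_value": v3.get("fill_value"),
        "order": "C",
        "dimension_separator": separator,
    }
```

Nothing in this logic is kerchunk-specific, which is one argument for the placement question raised above; whether it belongs upstream is still an open question in this thread.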

abarciauskas-bgse · 2025-02-13T17:01:13Z

docs/usage.md

+                chunk_grid=RegularChunkGrid(chunk_shape=(2920, 25, 53)),
+                chunk_key_encoding=DefaultChunkKeyEncoding(name='default',
+                                                           separator='/'),
+                fill_value=np.float64(-327.67),


Seeing the full representation, I looked into why the fill value was np.float64(-327.67): that value was not in the xarray encoding, but it was the fill value of the h5py.Dataset. Is it worth digging into why that float value wasn't in the xarray encoding? @sharkinsspatial

We're discussing this in a separate thread; I'll follow up once it's resolved.


abarciauskas-bgse added 10 commits February 4, 2025 14:38

Added zarray_to_v3metadata and test

2a01bfa

Working on manifest array tests

17fd547

Fix test_manifests/test_array#TestConcat tests

e5666ab

Passing TestStack tests and add fixture

5a8cc4c

All test_manifests/test_array tests passing

4c0b616

Compressors should be list

ac2f787

Passing dmrpp tests

5503c60

Merge branch 'main' into manifest-arrays-use-arrayv3metadata

1272051

Passing test_hdf.py tests

1f36755

Start to work on kerchunk tests

7098803

abarciauskas-bgse had a problem deploying to test-release February 6, 2025 03:30 — with GitHub Actions Failure

abarciauskas-bgse mentioned this pull request Feb 6, 2025

ManifestArray should use zarr-python's ArrayV3Metadata #424

Open

TomNicholas reviewed Feb 6, 2025

virtualizarr/manifests/array.py (outdated)

virtualizarr/manifests/array.py (outdated)

virtualizarr/manifests/array_api.py (outdated)

TomNicholas added zarr-python Relevant to zarr-python upstream internals labels Feb 6, 2025

Add method to convert array v3 metadata to v2 metadata for kerchunk (not happy about this)

ce2284c

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 01:11 — with GitHub Actions Failure

abarciauskas-bgse commented Feb 7, 2025

virtualizarr/writers/kerchunk.py

Fix fixtures and mark xfail netcdf3

c9853d5

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 01:20 — with GitHub Actions Failure

Test for convert_v3_to_v2_metadata

209dae3

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 15:19 — with GitHub Actions Failure

Deduplicate fixture for array v3 metadata

e7205ef

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 15:22 — with GitHub Actions Failure

Parse filters and compressors from v3 metadata for v2 metadata

d65e457

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 15:45 — with GitHub Actions Failure

Rewrite extract_codecs

190c20f

abarciauskas-bgse had a problem deploying to test-release February 7, 2025 20:16 — with GitHub Actions Failure

abarciauskas-bgse commented Feb 7, 2025

virtualizarr/zarr.py (outdated)

Refactor convert_to_codec_pipeline

47f5ddd

abarciauskas-bgse temporarily deployed to test-release February 12, 2025 23:03 — with GitHub Actions Inactive

Move some imports and make update_metadata a private method

bcd68a0

abarciauskas-bgse temporarily deployed to test-release February 12, 2025 23:15 — with GitHub Actions Inactive

Remove zarr.py

f0ce778

abarciauskas-bgse temporarily deployed to test-release February 12, 2025 23:50 — with GitHub Actions Inactive

abarciauskas-bgse marked this pull request as ready for review February 12, 2025 23:54

abarciauskas-bgse changed the base branch from main to zarr-python-3.0 February 12, 2025 23:58

TomNicholas reviewed Feb 13, 2025


abarciauskas-bgse added 2 commits February 13, 2025 08:09

Add zarr to other ci env files

0518488

Fixture array_v3_metadata uses array_v3_metadata_dict

0712979

abarciauskas-bgse commented Feb 13, 2025


abarciauskas-bgse and others added 15 commits February 13, 2025 09:03

No need for union type for CodecPipeline

c40915d

Use type alias

cdaca53

Add comment

2415e07

Update virtualizarr/manifests/array_api.py

9366d69

Co-authored-by: Tom Nicholas <tom@cworthy.org>

[pre-commit.ci] auto fixes from pre-commit.com hooks

d590cfc

for more information, see https://pre-commit.ci

Revised copy_and_replace_metadata to be in utils and called correctly

6394207

Update virtualizarr/translators/kerchunk.py

ea9fd56

Co-authored-by: Tom Nicholas <tom@cworthy.org>

[pre-commit.ci] auto fixes from pre-commit.com hooks

0ee2b48

for more information, see https://pre-commit.ci

Refactor create v3 array metadata

86d1de5

Rename to create_v3_array_metadata

fe8305f

Fix some codecs fixtures

0f5b32d

Use global vars and simple fixture for creating codec pipelines

97bc7cd

Remove redundant create_codec_pipeline fixture

b5a1dc6

Fix docstring

12c6260

Use create_v3_array_metadata in from_kerchunk_refs

4b555b6

TomNicholas mentioned this pull request Feb 14, 2025

TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__ #433

Open

abarciauskas-bgse added 2 commits February 14, 2025 10:12

Add links to zarr-python 3.0 issues for big endian, datetime and timedelta data types

c245b0a

Reorganize conftest

23ac776


TomNicholas left a comment

Review-thread participants, in order: TomNicholas (Feb 13, 2025), abarciauskas-bgse (Feb 13, 2025), d-v-b (Feb 13, 2025), abarciauskas-bgse (Feb 13, 2025), TomNicholas (Feb 13, 2025), abarciauskas-bgse (Feb 13, 2025), TomNicholas (Feb 13, 2025), TomNicholas (Feb 13, 2025), abarciauskas-bgse (Feb 13, 2025), abarciauskas-bgse (Feb 14, 2025)
