-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating fill_value logic for HDFVirtualBackend #414
Comments
I'm forgetting most of the details, but IIRC the biggest problem for xarray was it layering on additional meaning to LMK if there's any specifics I can help out with. |
This seems to be what is going on with the
I think we do want to specify custom logic in
My understanding is that virtualizarr is intended to be able to write Zarr v2 indefinitely. It will just be using zarr-python v3 to do so.
I have some additional failures for you in #422 :) |
I'm working through trying to update the "fill value" (quoted to denote the massive ambiguity around this loaded term as described by @rabernat in pydata/xarray#5475 (comment)) for the
HDFVirtualBackend
but it seems this will be a multi-level issue. My hope is our logic can follow the interpretation that @rabernat set out in pydata/xarray#5475 (comment) so that the reader will use the HDF datasetfillvalue
property for the the Zarrfill_value
and if the HDF dataset has a_FillValue
attr
that will be preserved as an arrayattr
for use by xarray's decoding logic to represent masked values. This is a transition from the the oldkerchunk
logic that was discussed and updated in fsspec/kerchunk#177.This presents a bit of a chicken and egg problem for our transition to Zarr V3 #392 and specifically our transition to
ArrayV3Metadata
#411 as an internal representation. As outlined by @rabernat, Zarr v2 uses different semantics forfill_value
and the spec is less rigorous about its encoding. The Zarr V3 specification has adopted @rabernat's new semantics and has more detailed guidelines for fill_value encoding.IIUC as we transition the
HDFVirtualBackend
to use the new Zarr v3fill_value
semantics and encoding ourkerchunk
based roundtrip tests will be broken unless we add specific logic to ourto_kerchunk
serialization that transform v3 -> v2 semantics. I have not searched the new kerchunk v3 compatibility PR fsspec/kerchunk#516 (comment) exhaustively but it appears that the semantic fill_value logic used by the HDF reader has not changed and is still using the v2 logic so even serializing to v3 compatible kerchunk dictionaries would fail equivalence assertions.I'll be working through a PR to update the HDFVirtualBackend semantics in our current codebase and I'll assess the extent of test failures and how they can be addressed.
Looking through all of the Zarr v3 PRs it looks like @TomAugspurger has been tackling a lot of the
fill_value
updates. Maybe he can weigh in here with some advice on the best path forward to adhere to the correct v3 behavior while still maintaining v2 semantic compatibility during our transition period.The text was updated successfully, but these errors were encountered: