-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
to_icechunk
doesn't have a group
option
#341
Comments
Right. I think this basically only requires passing the VirtualiZarr/virtualizarr/writers/icechunk.py Line 274 in fcdd5e4
cc @mpiannucci |
I'd like to take this one, if nobody else has claimed it. |
How would you like to see the def dataset_to_icechunk(
ds: Dataset,
store: IcechunkStore,
append_dim: Optional[str] = None,
last_updated_at: Optional[datetime] = None,
) -> None: Offhand it seems like making |
Does this mean that we simply want to define the parameter to |
Yep! Generally we want to make this as similar as possible to xarray.Dataset.to_zarr(). I agree about kw-only args. We're not too worried about backwards-compatibility at this point as the library is still new and evolving. |
Do we want to create the group if it does not already exist, or raise an error in that case? |
Ideally do whatever xarray.to_zarr does if the group does not already exist. |
for reference, if dataset_to_icechunk(ds, store, group=...) (so you don't have to make it a keyword-only parameter). That said, keyword-only parameters allow making API use more uniform, which especially for optional parameters really improves the readability of any function calls (in my opinion, at least). |
Thank you. I realized that later (🤦), especially after I made Given that, and given that @TomNicholas indicated that this function should be "as similar as possible to xarray.Dataset.to_zarr()," here is the signature for that function: Dataset.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None,
group=None, encoding=None, *, compute=True, consolidated=None, append_dim=None,
region=None, safe_chunks=True, storage_options=None, zarr_version=None,
zarr_format=None, write_empty_chunks=None, chunkmanager_store_kwargs=None) So shall I make dataset_to_icechunk(
ds: Dataset,
store: IcechunkStore,
group: Group = None,
*,
append_dim: Optional[str] = None,
last_updated_at: Optional[datetime] = None,
) -> None: This would make the existing optional parameters keyword-only, but make the new, optional Or shall I make |
I personally would also make dataset_to_icechunk(ds, store, group=x) is much harder to misunderstand than dataset_to_icechunk(ds, store, x) (I'm intentionally choosing a bad variable name to make my point) |
Thank you. I agree. I'll take that route unless someone strongly advises otherwise. |
@chuckwondo would you be interested in opening another PR that adds the VirtualiZarr/virtualizarr/accessor.py Lines 42 to 82 in e3787c5
This is a convenience method that allows users to use code like: vds.virtualize.to_icechunk(store, group=group)) rather than needing to call I think the best way to test this new code would be to modify the test you previously parameterized to use the VirtualiZarr/virtualizarr/tests/test_writers/test_icechunk.py Lines 42 to 80 in e3787c5
|
Sure! |
Oops, thank you for catching that @maxrjones! Yes that function isn't meant to be public API, only the accessors. |
@TomNicholas, do you mean that the |
Yes - the fact it's not public API is supposed to be clear from the fact that it isn't listed on the API docs page https://virtualizarr.readthedocs.io/en/stable/api.html But the accessor method is mentioned. But I welcome suggestions of how to make this clearer. |
I think changing it to |
Will do. I'll make this change in the context of the accessor update. @maxrjones, do you want me to create a new issue for the accessor work? |
@TomNicholas, I was just chatting with @maxrjones about the usefulness (or lack thereof) of a Dataset accessor in this case. I generally avoid "accessor registration" because of what I find to be degraded dev productivity due to the lack of "discoverability" of available accessors on an object (i.e., no code-completion help to discover registered accessors), and further lack of code-completion on a registered accessor (thus no ability to discover properties and methods of an accessor, forcing the need to dig up docs). Further, static type checking goes out the window since registered accessors are simply typed as Max pointed out that such accessors can be useful for adding state without the need to subclass, but since in this case, we're not adding state, do we want to "force" users to use an accessor instead of allowing them to use the |
Thanks for the thoughtful Qs @chuckwondo. As this is orthogonal to the addition of the |
Excellent. Thanks for starting a separate discussion. I'll add my 2 cents over there. In the meantime, I've plumbed things through as requested, in PR #391 |
This signature
VirtualiZarr/virtualizarr/writers/icechunk.py
Lines 25 to 27 in fcdd5e4
should include a
group
option to allow you to put the dataset somewhere other than at the root of the store hierarchy.The text was updated successfully, but these errors were encountered: