Interoperability with xarray/dask #479
Thanks for opening this @bekozi. Is a field a single variable (e.g. temperature)? Or is it a collection of variables? For a single variable, we'd be looking to map the ocgis structure to an xarray.DataArray(data, coords=None, dims=None, name=None, attrs=None)
The |
It's a bit of both. It was easier to have fields also be collections so that is what they are - essentially collections associated with coordinate variables via a dimension map. They are also dict-like containers similar to the
Cool, thanks. It seems pretty straightforward, but I do have some questions (turned into more than I expected...):
Sorry for the barrage here. Take your time. 😁 |
There hasn't been much (any?) direct integration with geopandas. We can store arrays of objects in xarray but there won't be the geo-hooks that geopandas has. I think that is okay for a first cut.
If these are scalars (or tuples), they can either be stored as coordinates or as attributes. If they are attributes, they won't be used by xarray's operations but they will stay with the data.
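As a rough illustration of the two options (a minimal sketch, not ocgis code: the variable name `temperature`, the `level` coordinate, and the `realization` attribute are made up for the example), a scalar can ride along either as a dimensionless coordinate or as an entry in `attrs`:

```python
import numpy as np
import xarray as xr

# Hypothetical single-variable field: temperature on a (time, x) grid.
temperature = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("time", "x"),
    coords={
        "time": [0, 1],
        "x": [10.0, 20.0, 30.0],
        "level": 500.0,  # scalar stored as a dimensionless coordinate
    },
    name="temperature",
    attrs={"realization": 1},  # scalar stored as a plain attribute
)

# Both forms stay attached to the data after positional indexing.
subset = temperature.isel(time=0)
print(subset.coords["level"].item())
print(subset.attrs["realization"])
```

The coordinate form is visible to xarray as part of the labelled structure, while the attribute form is just metadata carried alongside the array.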
Yes, usually. If they have different dimensions/shapes, they need a bit of special treatment but for now, let's treat them as coords.
If you pass in a masked array, xarray will convert it to a standard numpy array and use NaNs as fill values. This is mostly for performance. I'm not 100% sure, but I don't think we store the fill value. We could probably come up with something though.
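A small sketch of that masked-array behavior, assuming a float array. Stashing the fill value in `attrs` under the key `ocgis_fill_value` is a hypothetical workaround invented for this example, not something xarray or ocgis defines:

```python
import numpy as np
import xarray as xr

fill_value = -999.0
masked = np.ma.masked_array(
    [1.0, 2.0, 3.0], mask=[False, True, False], fill_value=fill_value
)

# The mask is replaced by NaN; the result wraps a plain ndarray.
da = xr.DataArray(masked, dims=("x",))
print(type(da.values))
print(da.isnull().values)

# Hypothetical workaround: keep the fill value in attrs so it can be
# re-attached when converting back to a masked array.
da.attrs["ocgis_fill_value"] = fill_value
restored = np.ma.masked_invalid(da.values)
restored.fill_value = da.attrs["ocgis_fill_value"]
```

Note that this round trip cannot distinguish values that were masked from values that were already NaN in the unmasked data.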
No. There are conversations in the works for supporting sparse arrays but this is not a feature yet.
Do you mean groups in the hierarchical sense (like HDF)? This is not supported by xarray.
I think this should work (as long as the array shape has the same number of dimensions as the specified dimensions). Undimensioned variables okay in xarray? Yes. Xarray will add dummy dimension names (e.g. |
Thanks for the quick response @jhamman. All sounds pretty good!
Got it. I too think that'll work okay at first.
Not that big of a deal; I'm more curious how masking is treated. It only matters in very specific cases.
I think the object arrays will suffice now that I think about it.
I do. And oops, I thought I saw this as a feature. This will be a problem with spatial collections, where a subset geometry stores its associated subsetted field. It should be possible to get around this by "melting", but that will take a bit of legwork. It's not as big a deal with unioning and spatial averaging. |
@jhamman I've made some progress on this: https://github.com/NCPP/ocgis/tree/i479-xarray. I need to add |
Cool! Glad to see this is moving forward.
Yes. I think this issue is the closest open one to what you're running into: pydata/xarray#1194 I suspect we'll get there eventually but not particularly soon. |
Thanks! Next time I'll do the legwork on the issue search... |
@jhamman I hit an issue in Anyway, I think the error is related to time bounds (it doesn't happen when I strip the time bounds variable from the dataset prior to decoding). Error traceback:
This is the dataset to decode:
This time variable decodes okay:
Here is the bounds variable causing an issue:
|
I'm not able to reproduce this working off xarray@master. What happens if you try to decode the bounds variable on its own with xr.decode_cf(ds['time_bounds']), or if you first set it as a coordinate with ds = ds.set_coords('time_bounds') and then call xr.decode_cf(ds)? For some reason, the attrs variable in |
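For reference, the second suggestion can be tried on a small synthetic dataset. This is only a sketch under assumptions: the dataset below is made up (it is not the one from the traceback), and the bounds variable is given its own `units` and `calendar` attributes so that decoding does not depend on how a particular xarray version handles bounds.

```python
import numpy as np
import xarray as xr

time_attrs = {"units": "days since 2000-01-01", "calendar": "standard"}

# Synthetic stand-in for a dataset with a time axis and its bounds.
ds = xr.Dataset(
    {
        "time_bounds": (
            ("time", "bnds"),
            np.array([[0.0, 1.0], [1.0, 2.0]]),
            dict(time_attrs),
        ),
    },
    coords={
        "time": (
            "time",
            np.array([0.0, 1.0]),
            dict(time_attrs, bounds="time_bounds"),
        ),
    },
)

# Promote the bounds variable to a coordinate, then decode CF metadata.
ds = ds.set_coords("time_bounds")
decoded = xr.decode_cf(ds)
print(decoded["time"].values)
print("time_bounds" in decoded.coords)
```

If this runs cleanly while the real dataset fails, comparing the attributes on the two bounds variables is a reasonable next step.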
..sorry for the slow response...caught a human bug... Thanks for trying to reproduce the issue. Looking at this again, this is something I know we talked briefly about bounds variables and coordinates in
The coordinate system variable is also just transferred over with no special attributes. It would be possible to insert a PROJ or WKT description into its attributes. The I am thinking about cleaning this up a bit and pushing. Can you think of anything else that should be added in this first cut? |
I think this is a good start. I'm quite busy over the next few weeks and don't know if I'll have any real time to put it into action. I'm fine with your plan of providing a first cut as an experimental feature. |
Added first-cut "to_xarray" methods on dimensions, variables, collections, and fields. Metadata handling is done by xarray using its "decode_cf" capability. Limitations are listed in the documentation for the "to_xarray" method on fields.
Initial capability is in |
Reopening to capture internal changes related to Major remaining issues are related to coordinate systems and spatial masking. Branch: https://github.com/NCPP/ocgis/tree/i479-xarray-interop |
@huard, a good example to work from for xclim is this masking test for the xarray driver: You can also use operations to subset the xarray field. I expect a number of operations will fail as this is not exhaustively tested. There are numerous specialization points in the code that should be on the driver eventually. It's just a matter of collecting all the edge cases... Time bounds calculations are performed when a grouping is computed to create a new time axis. The Let me know if you have any questions! |
Add to_xarray(...) on ocgis fields and potentially a to_ocgis(...) somewhere on the other end. Ping @jhamman