Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] GeoTransformIndex prototype #10055

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Feb 17, 2025

yet another attempt to build CRS-aware indexing on top of #9543.

Here is an example notebook using @mdsumner's example data from #9543

I'm opening this here for visibility, and because there's clearly much to do inside Xarray before this can be fully built out someplace else.


Some features:

  1. Built on top of the GeoTransform attribute, so we track both CRS and extents.
  2. It allows isel with slices and creates a new GeoTransform for the result. This means the index needs to track the spatial_ref or "grid mapping" variable. Not sure how this will interface with xproj
  3. It (hackily) allows .sel with crs specified and transforms to the data's CRS for indexing.

Major comments:

  • .sel broke after I added .isel! I don't fully understand this but we'll need to figure out some way to skip applying isel?
  • Slicing with newds.isel(x=slice(None, None, 2), y=slice(None, None, 2)) creates 3 new GeoTransformIndex objects. I don't understand this.
  • We will need to support custom index options in .sel to allow the user to pass a CRS for their query down to the index. The second comment does this is a hacky manner.

Other comments:

  • the first commit is a bugfix, that can be easily moved to its own PR

cc @maxrjones @benbovy

dcherian and others added 3 commits February 14, 2025 17:05
@mdsumner
Copy link

mdsumner commented Feb 17, 2025

I don't understand this, the (implicit) coords are in metres, but this idiomatically seems to imply a change of crs to longlat:

Screenshot_20250217-191426

there's no sense to setting options to longlat here afaics?? surely sel() is about the native coordinates?

Obviously we can change coordinate system for the data, but why? Change the query. What am I missing

@benbovy
Copy link
Member

benbovy commented Feb 17, 2025

Great to see you experimenting with coordinate transforms @dcherian!

A few comments (I've had a quick look at the code and notebook but didn't play with it yet):

.sel broke after I added .isel! I don't fully understand this but we'll need to figure out some way to skip applying isel?

Your GeoTransformIndex.sel() implementation relies on CoordinateTransformIndex.sel(), which only supports vectorized (fancy) indexing. GeoTransformIndex.isel() fails because its implementation assumes basic / orthogonal indexing, i.e., the assert "indexers are slices" statements fail. If you want GeoTransformIndex to support label-based orthogonal indexing, GeoTransformIndex.sel() has to somehow return an IndexSelResult object with slice indexers for dimensions "x" and/or "y".

Not sure exactly what you mean by "skip applying isel", but if you mean do not create a new GeoTransformIndex then GeoTransformIndex.isel() can simply return None so Xarray will fallback to Variable.isel() for indexing the "x" and "y" coordinates. This is the default behavior of CoordinateTransformIndex.isel() (point-wise indexing).

It allows isel with slices and creates a new GeoTransform for the result. This means the index needs to track the spatial_ref or "grid mapping" variable. Not sure how this will interface with xproj

The main idea of xproj is to keep the spatial reference coordinate (with its CRSIndex) separate from other crs-aware indexed coordinates (here "x" and "y"), so we can reuse it with different kinds for geo-referenced data (raster, vector, irregular mesh, point cloud, etc.).

For this reason I think it is better not to associate a spatial reference coordinate with GeoTransformIndex. What we need here essentially is to export the affine parameters as a GeoTransform attribute of the spatial ref coordinate prior to writing the dataset for using it with GDAL. We could do that via a utility function like rioxarray's write_transform (maybe just use the latter?). If that makes senses, we could also extend xproj's API and index interface for convenience.

We will need to support custom index options in .sel to allow the user to pass a CRS for their query down to the index.

I don't understand this, the (implicit) coords are in metres, but this idiomatically seems to imply a change of crs to longlat

I think here the crs passed as an option to sel corresponds to the CRS of the input values passed to newds.sel(xc=[...], yc=[...]), which are then converted to match the CRS of newds.xc and newds.yc prior to applying the selection.

See discussion in #7099 for passing arbitrary options to sel.

@benbovy
Copy link
Member

benbovy commented Feb 17, 2025

Slicing with newds.isel(x=slice(None, None, 2), y=slice(None, None, 2)) creates 3 new GeoTransformIndex objects. I don't understand this.

I'm not sure to see the 3 new GeoTransformIndex objects in the example notebook? What I see through the repr of the dataset is the spatial_ref coordinate that keeps a GeoTransformIndex while the xc and yc coordinates do not have an index anymore. The latter makes sense if GeoTransformIndex.isel returns None (i.e., the default behavior if it is not re-implemented). I suspect that the spatial_ref coordinate keeps the original GeoTransformIndex object because that scalar variable is not updated / indexed (we should probably ensure that the index is dropped in that case as well).

@benbovy
Copy link
Member

benbovy commented Feb 20, 2025

Slicing with newds.isel(x=slice(None, None, 2), y=slice(None, None, 2)) creates 3 new GeoTransformIndex objects. I don't understand this.

I could reproduce this. It is a regression introduced in #9002. Bypassing the fastpath isel indexes method brings back the expected behavior (a unique index for each of its associated coordinates).

EDIT: opened #10063

@dcherian
Copy link
Contributor Author

For this reason I think it is better not to associate a spatial reference coordinate with GeoTransformIndex. What we need here essentially is to export the affine parameters as a GeoTransform attribute of the spatial ref coordinate prior to writing the dataset for using it with GDAL.

The GeoTransform attribute is stored on spatial_ref which xproj relies on. I don't think it would be good to silently remove this attribute, track it internally, and then expect the user to remember to add it back at write-time. We can't leave it on the spatial_ref variable without updating it when necessary either (that's basically the encoding mess all over again).

So I don't think this is a good idea. Seems like xproj needs to handle every piece of metadata on spatial_ref or "decode" it to a complex in-memory object that will not be confused with spatial_ref.

@benbovy
Copy link
Member

benbovy commented Feb 24, 2025

I don't think either that we need or should remove attributes on the spatial reference coordinate. Associating it with an xproj.CRSIndex and somehow ensuring the geotransform metadata is kept in-sync doesn't seem incompatible to me (although probably not ideal admittedly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants