Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for coordinate inputs in polyfit. #9369

Merged
merged 15 commits into from
Oct 8, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ Bug fixes
date "0001-01-01". (:issue:`9108`, :pull:`9116`) By `Spencer Clark
<https://github.com/spencerkclark>`_ and `Deepak Cherian
<https://github.com/dcherian>`_.
- Fix issue where polyfit wouldn't handle non-dimension coordinates. (:issue:`4375`, :pull:`9369`)
By `Karl Krauth <https://github.com/Karl-Krauth>`_.

Documentation
~~~~~~~~~~~~~
Expand Down
5 changes: 4 additions & 1 deletion xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -9023,7 +9023,10 @@ def polyfit(
variables = {}
skipna_da = skipna

x = get_clean_interp_index(self, dim, strict=False)
x = get_clean_interp_index(self, dim, use_coordinate=dim, strict=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is a bit strange. I think the get_clean_interp_index evolved in a way that makes it not match its signature and doc.

Argument "dim" of get_clean_interp_index is documented as "Name of the dimension", yet here the name of the coordinate is passed (which might not be a dimension, that's the point). However, the dim argument has no impact whatsoever on the the result of the function if use_coordinate is a string.

Also, the docstring of use_coordinate does not mention what happens when a string is passed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it is strange, in an ideal world I think the dim argument could be either a coordinate or a dimension, and use_coordinate just wouldn't be necessary. But I didn't want to change the get_clean_interp_index signature too much since it seemed to match the public interpolate_na API quite closely, which I'm not familiar with.

I also didn't set the dim argument to None because there seemed to be some downstream use of that variable in the case where the resulting index is a DatetimeIndex.

Maybe as a half measure I could update the documentation of get_clean_interp_index in this PR to match more closely what the code is actually doing?

Alternatively, I could just inline the behaviour of get_clean_interp_index that we're actually using since that function seemed to be designed for a different purpose anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just inline the behaviour of get_clean_interp_index that we're actually using since that function seemed to be designed for a different purpose anyway.

This would be great if it's a simple add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :) although mypy complained at me initially for reassigning x to different types. I'm not sure how precise this project wants all its types to be so for now I've just declared x as Any for the sake of brevity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I am currently working on improvements of interpolate_na and discovered similar issues with get_clean_interp_index. I tried to come up with an improved (and type safe) version of it (see here). However, good to know that polyfit is now independent, in this case get_clean_interp_index is only used within missing.py. I will try to write a better docstring for the function before merging.

# If we have a coordinate convert it to its underlying dimension.
dim = self.coords[dim].dims[0]

xname = f"{self[dim].name}_"
order = int(deg) + 1
lhs = np.vander(x, order)
Expand Down
2 changes: 1 addition & 1 deletion xarray/core/missing.py
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ def _apply_over_vars_with_dim(func, self, dim=None, **kwargs):


def get_clean_interp_index(
arr, dim: Hashable, use_coordinate: str | bool = True, strict: bool = True
arr, dim: Hashable, use_coordinate: Hashable | bool = True, strict: bool = True
):
"""Return index to use for x values in interpolation or curve fitting.

Expand Down
10 changes: 10 additions & 0 deletions xarray/tests/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -6689,6 +6689,16 @@ def test_polyfit_weighted(self) -> None:
ds.polyfit("dim2", 2, w=np.arange(ds.sizes["dim2"]))
xr.testing.assert_identical(ds, ds_copy)

def test_polyfit_coord(self) -> None:
# Make sure polyfit works when given a non-dimension coordinate.
ds = create_test_data(seed=1)

out = ds.polyfit("numbers", 2, full=False)
assert "var3_polyfit_coefficients" in out
assert "dim1" in out
assert "dim2" not in out
assert "dim3" not in out

def test_polyfit_warnings(self) -> None:
ds = create_test_data(seed=1)

Expand Down
Loading