Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable select_resample_op for unitless data #2060

Closed
wants to merge 3 commits into from

Conversation

saschahofmann
Copy link
Contributor

@saschahofmann saschahofmann commented Jan 27, 2025

Pull Request Checklist:

  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGELOG.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

  • to_agg_units now returns the resampled data without units if the original data was unitless
    I could imagine that maybe you don't ever want unitless data in which this can be closed but if not then this is an easy fix.

Does this PR introduce a breaking change?

I don't think so.

Reproducible example

Before this change this example was breaking with a KeyError

from xclim.indices.generic import select_resample_op
import xarray as xr
import numpy as np

time = xr.cftime_range("2000-01-01", "2000-12-31", freq="D", calendar="noleap")
da = xr.DataArray(np.random.random_sample((time.size,)), coords={"time": time})

select_resample_op(da, op='max', freq='MS')

Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @saschahofmann! I agree that we should handle this case rather than error here.

Comment on lines +669 to +670
if "units" not in orig.attrs:
return out
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aulemahal Would it make sense to emit a UserWarning if the data doesn't have units, so that users know that nothing will happen here?

Copy link
Collaborator

@aulemahal aulemahal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure I agree here!

To me, select_resample_op and to_agg_units are part of the "unit-aware" section of xclim, passing unitless data is an error. "unitless" doesn't mean "dimensionless", it means "insufficient information".

This condition ensures, select_resample_op can be used as a "generic compute function", directly wrapped in an Indicator.

What was your use case that required unitless-support ? Could da.assign_attrs(units="1") solve the issue ?

@saschahofmann
Copy link
Contributor Author

Yes, I can fix this issue by providing an empty unit! If this is expected I can close this no problem. I wasn't sure whether you want to support unitless or not.

Just FYI, this is happening because we still have a lot of indicators that arent computed via xclim (but I am planning to migrate them) and we were less smart about units.

Feel free to close 👍

@coxipi
Copy link
Contributor

coxipi commented Jan 30, 2025

@aulemahal sorry I'm late to the dance. My reasoning was the opposite, I thought these functions outside of indices agro, etc. could be more general. But I was thinking more in terms in the various indices that we have that already put a constraint on variables having units, so it would be all right to call this more general function because the decorators put a constraint on the input. But, in fact, even though select_resample_op is often called by indices functions, it can also itself be considered at the same level as functions in indices, correct? In that case I agree, it makes sense.

Although, a counterpoint I see to this: The indicator already does some sanity checks on the inputs, right? Why isn't the job of the indicator class to filter bad/good input, at least ask for inputs with units if that's the philosophy of the indicator class? Then, supplementary constraints can be put on lower level for specific applications (e.g. tas=[temperature] in indices functions, etc)

@aulemahal
Copy link
Collaborator

it can also itself be considered at the same level as functions in indices

This is my reasoning indeed! Even though it is used a lot within unit-aware functions, I'd prefer to keep it unit-aware itself, so simple indicators can be created from it.

If you all agree, I suggest we close this. Thanks @saschahofmann , sorry for thwart your plans!

@aulemahal aulemahal closed this Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants