New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add `range` validation #473

Draft

dc-almeida wants to merge 1 commit into IAMconsortium:main from dc-almeida:feature/add-range-variable-attr-for-validation

Collaborator

dc-almeida commented Feb 7, 2025

#471
Adds a range attribute to set lower/upper bounds in validation criteria.
Adds a simple test showing it works equivalently to bounds.
Missing test(s) to raise validation error.


          Add range validation

3a3c650

dc-almeida added the enhancement label

dc-almeida assigned danielhuppmann and dc-almeida

phackstock reviewed

View reviewed changes

Contributor

phackstock left a comment •

edited

Loading

Thanks for adding this quickly @dc-almeida 👍
A few comments and suggested changes in line.
One more general comment is that I'm not sure we need DataValidationCriteriaRange as a new class. In terms of functionality it's entirely identical to using upper and lower bound, just in a different notation in the yaml file.
Thinking about it, I'm not sure we should even offer two ways to do the same thing. I'd suggest to only support either range or upper_bound and lower_bound. I'd suggest only supporting the more compact range.
On a related note, the classes IamcDataFilter and DataValidationCriteria can and should be combined in my opinion. IamcDataFilter is never used standalone and only DataValidationCriteria inherits from it. That's probably better addressed though when tackling #469.

nomenclature/processor/data_validator.py

               class DataValidationCriteriaMultiple(IamcDataFilter):
                   validation: (
-                      list[DataValidationCriteriaValue | DataValidationCriteriaBounds] | None
+                      list[
+                          DataValidationCriteriaValue

Contributor

phackstock Feb 10, 2025

I would change this to:

Suggested change

      
                        DataValidationCriteriaValue
          
                        DataValidationCriteria

nomenclature/processor/data_validator.py

-                      list[DataValidationCriteriaValue | DataValidationCriteriaBounds] | None
+                      list[
+                          DataValidationCriteriaValue
+                          | DataValidationCriteriaBounds

Contributor

phackstock Feb 10, 2025

Suggested change

| DataValidationCriteriaBounds

nomenclature/processor/data_validator.py

+                      list[
+                          DataValidationCriteriaValue
+                          | DataValidationCriteriaBounds
+                          | DataValidationCriteriaRange

Contributor

phackstock Feb 10, 2025

Suggested change

| DataValidationCriteriaRange

nomenclature/processor/data_validator.py

+                  range: list[float] = Field(..., min_length=2, max_length=2)
+                  @model_validator(mode="after")
+                  def check_range_is_valid(self):

Contributor

phackstock Feb 10, 2025

This validator represents a design choice in my opinion. We for sure want the range to be ordered and the way I see it, is we have two options to ensure this:

(implemented here) we check and throw and error if the range is not ordered
We take the input value for range, sort it and only then save it into the attribute

The same design question presents itself when talking about the ordering of the severity levels. Currently we through an error if they are not sorted by the most severe first. Alternatively, we could also take the input, sort it and then save it.

The question does not (and probably should not) be answered, I just wanted to raise the point to be discussed at some stage. My (slight) personal preference would be to take whatever the user provides us, sort it and then work with it. This way we could potentially reduce the amount of frustrated nomenclature users.

Member

danielhuppmann Feb 11, 2025

The design-choice on checking the order of severity levels was my suggestion - I think it's better to force users to follow an easy-to-read convention. And it also avoids the ordering as one extra step in the code, improving readability.

Contributor

phackstock Feb 12, 2025

The design-choice on checking the order of severity levels was my suggestion - I think it's better to force users to follow an easy-to-read convention. And it also avoids the ordering as one extra step in the code, improving readability.

Fair point in making that design choice and sticking to it here as well, works for me. You don't really save any code though since in one case you'd be sorting the inputs, in the other case you're checking that they arrive sorted, six and one half dozen.

nomenclature/processor/data_validator.py

+                  @model_validator(mode="after")
+                  def check_range_is_valid(self):
+                      if self.range[0] > self.range[1]:
+                          raise ValueError("Validation range is invalid: " + str(self.criteria))

Contributor

phackstock Feb 10, 2025

This error message does not contain enough information for the user to fix the issue:

Suggested change

      
                        raise ValueError("Validation range is invalid: " + str(self.criteria))
          
                        raise ValueError("Validation range must be ordered with the smaller value first: " + str(self.criteria))

nomenclature/processor/data_validator.py

+                          raise ValueError("Validation range is invalid: " + str(self.criteria))
+                      return self
+                  @computed_field

Contributor

phackstock Feb 10, 2025

Nice use of the computed_field decorator 👍 To make things more explicit and follow the pydantic docs (https://docs.pydantic.dev/2.10/concepts/fields/#the-computed_field-decorator), I'd adding the property decorator as well for this and lower_bound.

Member

danielhuppmann commented Feb 11, 2025

One more general comment is that I'm not sure we need DataValidationCriteriaRange as a new class. In terms of functionality it's entirely identical to using upper and lower bound, just in a different notation in the yaml file.
Thinking about it, I'm not sure we should even offer two ways to do the same thing.

Disagree, there are actually two distinct use cases: have both upper and lower bounds (i.e., a range), vs. having only one of those.

Contributor

phackstock commented Feb 11, 2025

Disagree, there are actually two distinct use cases: have both upper and lower bounds (i.e., a range), vs. having only one of those.

Fair point, still, having upper and lower bound entirely covers the use of range. If you want to provide users the opportunity to use both option then I think having two classes is the cleanest way to implement that.
I'd still prefer to give users one, and only one way (upper and lower bound in this case), but I'm fine merging the PR. We are after all in the dangerous realm of personal preference :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels