Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise validation error when unhashable items added to a set #1619

Merged
merged 9 commits into from
Feb 6, 2025

Conversation

mikeedjones
Copy link
Contributor

@mikeedjones mikeedjones commented Feb 3, 2025

Change Summary

Adds the validate_hashable function to check that items being added to a set by validate_iter_to_set have an implemented __hash__ function. Adds the associated SetItemNotHashable error and error message.

I'm not sure this is the best place to check whether or not items being added to a set are hashable, and I'm not sure binding the object is strictly necessary to check its hashability. Maybe this validation should happen in SetValidator instead and only capture the case where the user is specifically trying to add unhashable objects to a set?

Related issue number

pydantic/pydantic#11360 in the pydantic repo - I can open a mirror issue in pydantic-core?

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

Copy link

codspeed-hq bot commented Feb 3, 2025

CodSpeed Performance Report

Merging #1619 will not alter performance

Comparing mikeedjones:mj/unhashable-items-in-set (3d924d5) with main (164b9ff)

Summary

✅ 157 untouched benchmarks

Copy link

codecov bot commented Feb 3, 2025

Codecov Report

Attention: Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.60%. Comparing base (ab503cb) to head (3d924d5).
Report is 282 commits behind head on main.

Files with missing lines Patch % Lines
src/input/return_enums.rs 88.23% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1619      +/-   ##
==========================================
- Coverage   90.21%   89.60%   -0.61%     
==========================================
  Files         106      115       +9     
  Lines       16339    17929    +1590     
  Branches       36       25      -11     
==========================================
+ Hits        14740    16066    +1326     
- Misses       1592     1844     +252     
- Partials        7       19      +12     
Files with missing lines Coverage Δ
python/pydantic_core/core_schema.py 95.08% <ø> (+0.31%) ⬆️
src/errors/types.rs 99.72% <100.00%> (+0.27%) ⬆️
src/input/return_enums.rs 87.32% <88.23%> (+1.61%) ⬆️

... and 63 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 164b9ff...3d924d5. Read the comment docs.

@mikeedjones
Copy link
Contributor Author

please review :)

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall this looks in the right direction, I'd like to suggest an efficiency gain before merging.

Comment on lines 235 to 237
match validate_hashable(py, item.borrow_input(), state, validator) {
Ok(item) => {
set.build_add(item)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the change like this will have the cost of running hash twice for every element added to the set. I suggest instead the code is adjusted to only try calling item.hash() if set.build_add(item) fails, that way we're only adding overhead on the unhappy path.

Copy link
Contributor Author

@mikeedjones mikeedjones Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sinceset.build_set takes ownership of item, instead of cloning item to calculate the hash if set.build_set fails, I thought it would be better to instead inspect the PyErr and return the SetItemNotHashable if it's a TypeError. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with that 👍

@mikeedjones mikeedjones force-pushed the mj/unhashable-items-in-set branch from ee71da1 to 4eecd19 Compare February 5, 2025 13:58
@davidhewitt
Copy link
Contributor

@pydantic/oss - any opinion on the new error type introduced here? Seems ok to me, though we'll need to handle downstream in pydantic.

@Viicos
Copy link
Member

Viicos commented Feb 5, 2025

Thanks @mikeedjones for looking into this. Can this be made more generic in some way? Sets are not the only type requiring hashable members. Dict keys comes to mind, although I don't think there's currently a way to reproduce the same error, as we only allow mappings to validate against dicts (meaning if you already have a valid mapping, keys are guaranteed to be hashable).

@mikeedjones
Copy link
Contributor Author

mikeedjones commented Feb 5, 2025

I agree that would be better @Viicos - but as you point out in most other instances the hashability of the relevant item is checked much earlier in the validation.

A generic version might be a validator for python's Hashable? (unless there is one which i've missed sorry) - But adding that validator by default to validation of a dict's keys or a set's members would duplicate the checks done in pyo3 and other libraries, as @davidhewitt pointed out.

Comment on lines 111 to 112
def __hash__(self):
raise TypeError('unhashable type')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the canonical way to create an unhashable type is to set __hash__ = None

@Viicos
Copy link
Member

Viicos commented Feb 5, 2025

Yeah I think it's fine to continue with this approach scoped to sets, and we can extend/change this in the future.

@Viicos
Copy link
Member

Viicos commented Feb 5, 2025

@pydantic/oss - any opinion on the new error type introduced here? Seems ok to me, though we'll need to handle downstream in pydantic.

LGTM!

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mikeedjones for adding the downstream PR too 👍

@davidhewitt davidhewitt merged commit f7fb50b into pydantic:main Feb 6, 2025
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants