Should PruningPredicate coerce? #14944
Comments
I think the conclusion is that PruningPredicate expects an expression that has already been properly coerced. You can use the approach in datafusion/datafusion-examples/examples/expr_api.rs (lines 540 to 543 in 5e49094).
There are a bunch of other examples there about how to apply coercion.
@alamb this is actually already done like that, but it still doesn't prune properly.
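🤔 What are the coerced expressions? If the cast ends up on the column, as in cast(column) = literal, that is not going to prune, because the cast is happening on the column rather than on the literal. The predicate needs to be in the form column = literal, with the literal already converted to the column's type.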
This is done in this analyzer pass: https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/unwrap_cast_in_comparison.rs Maybe we should do the same thing in
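To make the idea of moving the cast off the column concrete, here is a minimal sketch using a toy expression tree. The `Expr` enum and `unwrap_cast_in_eq` function below are hypothetical names made up for illustration; they are not DataFusion's `Expr` type or its actual rewrite rule. The sketch only handles one shape, `CAST(col AS utf8) = '<digits>'`, and assumes the column is an integer column.

```rust
/// Toy expression tree for illustration only (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to an integer column by name.
    Column(String),
    /// Integer literal.
    Int(i64),
    /// String literal.
    Utf8(String),
    /// Cast of the inner expression to a string.
    CastToUtf8(Box<Expr>),
    /// Equality comparison between two expressions.
    Eq(Box<Expr>, Box<Expr>),
}

/// Rewrites `CAST(col AS utf8) = '<digits>'` into `col = <int>`.
/// Every other expression is returned unchanged.
fn unwrap_cast_in_eq(expr: Expr) -> Expr {
    if let Expr::Eq(left, right) = &expr {
        if let (Expr::CastToUtf8(inner), Expr::Utf8(text)) = (&**left, &**right) {
            if matches!(**inner, Expr::Column(_)) {
                if let Ok(value) = text.parse::<i64>() {
                    // Rewrite only when the string is exactly how the integer
                    // would print. Strings such as "0202502" or "+202502"
                    // parse as integers but never equal the cast column as
                    // text, so rewriting them would change the result.
                    if value.to_string() == *text {
                        return Expr::Eq(inner.clone(), Box::new(Expr::Int(value)));
                    }
                }
            }
        }
    }
    expr
}
```

Under these assumptions, calling `unwrap_cast_in_eq` on the toy equivalent of cast(month_id, 'utf8') = '202502' yields a comparison between the bare `month_id` column and the integer 202502, which is the shape that min/max statistics on an integer column can be checked against.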
It's like cast(month_id, 'utf8') = '202502'.
So it seems like it would be a valuable thing to apply the type coercion rewriter in the expr simplifier then (see datafusion/datafusion/optimizer/src/unwrap_cast_in_comparison.rs, lines 115 to 125 in 57a1221).
I'll file a ticket.
@alamb thanks! 🙏
I found that a predicate like month_id = '202502' is actually coerced so that it does a string comparison rather than an integer comparison.
So in other words, I think #15012 is necessary, but not sufficient, to fix this issue. Update: there is a PR to add some tests.
Also filed a ticket for unwrapping that particular comparison expression.
Describe the bug
Currently, when you pass a pruning predicate whose literal has a different type from the targeted column, nothing gets pruned, even though in theory the value is castable to the target column type.
The predicate looks like this: "month_id = '202502' AND date_id = '20250226'"
However, the columns are int columns, not utf8. So in theory these string values could be cast to int, but I don't believe this is happening. Is this something that should be added? This is our Rust code, btw.
To Reproduce
Related delta-rs issue: delta-io/delta-rs#3278 (comment)
Expected behavior
Allow literal value coercion during pruning
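As a rough sketch of what literal coercion during pruning could mean, the toy model below coerces a string literal to the integer column's type before checking it against per-file min/max statistics. `IntStats`, `coerce_literal`, and `can_prune_eq` are hypothetical names invented for this illustration; none of them is part of DataFusion's `PruningPredicate` API, and the real statistics handling is more involved.

```rust
/// Min/max statistics for one integer column in one file (toy model).
#[derive(Debug, Clone, Copy)]
struct IntStats {
    min: i64,
    max: i64,
}

/// Tries to coerce a string literal to the integer column's type.
/// Returns `None` unless the text is exactly how the integer would print,
/// so inputs like "abc" or "0202502" are rejected.
fn coerce_literal(text: &str) -> Option<i64> {
    let value = text.parse::<i64>().ok()?;
    (value.to_string() == text).then_some(value)
}

/// Returns true when `column = literal` cannot match any row in the file,
/// meaning the file can be skipped. When the literal cannot be coerced,
/// the file is conservatively kept.
fn can_prune_eq(stats: IntStats, literal: &str) -> bool {
    match coerce_literal(literal) {
        // The value lies outside [min, max], so no row can equal it.
        Some(value) => value < stats.min || value > stats.max,
        // Unknown comparison semantics: never prune.
        None => false,
    }
}
```

With this model, a file whose month_id statistics span only 202501 would be skipped for the literal '202502', while a file spanning 202501 to 202503 would be kept. Returning false for literals that cannot be coerced is a deliberate choice: pruning has to be conservative, so when in doubt the file is read.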
Additional context
No response