Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move UnwrapCastInComparison into Simplifier #15012

Merged
merged 13 commits into from
Mar 6, 2025

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Mar 5, 2025
@jayzhan211 jayzhan211 marked this pull request as draft March 5, 2025 01:53
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 5, 2025
@jayzhan211 jayzhan211 marked this pull request as ready for review March 5, 2025 03:59
@@ -243,7 +242,6 @@ impl Optimizer {
let rules: Vec<Arc<dyn OptimizerRule + Sync + Send>> = vec![
Arc::new(EliminateNestedUnion::new()),
Arc::new(SimplifyExpressions::new()),
Arc::new(UnwrapCastInComparison::new()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great -- it will also reduce the number of times the entire plan tree gets walked/massaged

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @jayzhan211 -- this looks awesome

I don't think we should delete the tests, but otherwise this PR could be merged in my opinion

// no additional rewrites possible
expr => Transformed::no(expr),
})
}
}

fn unwrap_cast_in_comparison_for_binary<S: SimplifyInfo>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the expr_simplifer.rs module is getting big, we could consider putting the code that supports the casting into a different module (for example unwrap_cast.rs for example)

use datafusion_expr::utils::merge_schema;
use datafusion_expr::{lit, Expr, ExprSchemable, LogicalPlan};

/// [`UnwrapCastInComparison`] attempts to remove casts from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to keep these comments? I think the context / content is important

Maybe the could be put as module level comments in unwrap_cast.rs if you move that code into the simplifier

use datafusion_expr::{cast, col, in_list, try_cast};

#[test]
fn test_not_unwrap_cast_comparison() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests probably should be ported too. Would you like some help with that ? I bet we could find people to do so

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can file issue to increase more tests. Btw, I didn't add any new tests because I think the existing test is enough, it catches my bug during my implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that these tests likely cover a bunch of corner cases like decimal precision and different timestamp types that may not be fully covered in .slt

I'll try and help either port them or file a ticket to do so

Copy link
Contributor

@berkaysynnada berkaysynnada Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @alamb. We need to somehow preserve these tests within this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR seems to improve performance non trivially, I will work on porting the tests now. I'll see how far I get

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a proposed change to this PR to port the tests (turns out to be much easier than I thought):

@alamb
Copy link
Contributor

alamb commented Mar 5, 2025

Screenshot 2025-03-05 at 9 16 30 AM

BTW this PR seems to significantly IMPROVE planning performance. Perhaps as it has reduced a plan copy 🤔

I will rerun the benchmarks to make sure, but if I can reproduce the results I am that much more incentivized to help port the tests / get this in.

Thanks you @jayzhan211 🚀

@alamb
Copy link
Contributor

alamb commented Mar 5, 2025

Yeah I consistently see signfiicantly faster planing with this PR 🚀

group                                         main                                   mv-unwrap-cast-to-sim-expr
-----                                         ----                                   --------------------------
...
physical_plan_tpcds_all                       1.16  1469.6±27.59ms        ? ?/sec    1.00   1270.0±9.86ms        ? ?/sec
physical_plan_tpch_all                        1.15    100.8±2.54ms        ? ?/sec    1.00     87.3±1.65ms        ? ?/sec
physical_plan_tpch_q1                         1.07      3.4±0.06ms        ? ?/sec    1.00      3.2±0.05ms        ? ?/sec
physical_plan_tpch_q10                        1.07      4.6±0.04ms        ? ?/sec    1.00      4.3±0.10ms        ? ?/sec
physical_plan_tpch_q11                        1.08      4.2±0.05ms        ? ?/sec    1.00      3.9±0.04ms        ? ?/sec
physical_plan_tpch_q12                        1.07      3.2±0.03ms        ? ?/sec    1.00      3.0±0.03ms        ? ?/sec
physical_plan_tpch_q13                        1.05      2.6±0.03ms        ? ?/sec    1.00      2.4±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.07      3.0±0.03ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
physical_plan_tpch_q16                        1.07      4.0±0.06ms        ? ?/sec    1.00      3.7±0.06ms        ? ?/sec
physical_plan_tpch_q17                        1.09      3.9±0.04ms        ? ?/sec    1.00      3.6±0.07ms        ? ?/sec
physical_plan_tpch_q18                        1.09      4.3±0.05ms        ? ?/sec    1.00      3.9±0.08ms        ? ?/sec
physical_plan_tpch_q19                        1.09      6.1±0.05ms        ? ?/sec    1.00      5.6±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.13      8.0±0.23ms        ? ?/sec    1.00      7.0±0.07ms        ? ?/sec
physical_plan_tpch_q20                        1.07      4.9±0.03ms        ? ?/sec    1.00      4.6±0.15ms        ? ?/sec
physical_plan_tpch_q21                        1.13      6.3±0.14ms        ? ?/sec    1.00      5.6±0.06ms        ? ?/sec
physical_plan_tpch_q22                        1.09      3.9±0.16ms        ? ?/sec    1.00      3.5±0.03ms        ? ?/sec
physical_plan_tpch_q3                         1.09      3.4±0.06ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
physical_plan_tpch_q4                         1.07      2.7±0.02ms        ? ?/sec    1.00      2.5±0.02ms        ? ?/sec
physical_plan_tpch_q5                         1.12      4.7±0.09ms        ? ?/sec    1.00      4.2±0.04ms        ? ?/sec
physical_plan_tpch_q6                         1.06  1933.1±21.80µs        ? ?/sec    1.00  1823.7±19.00µs        ? ?/sec
physical_plan_tpch_q7                         1.13      6.2±0.11ms        ? ?/sec    1.00      5.4±0.05ms        ? ?/sec
physical_plan_tpch_q8                         1.13      7.2±0.21ms        ? ?/sec    1.00      6.4±0.08ms        ? ?/sec
physical_plan_tpch_q9                         1.10      5.6±0.08ms        ? ?/sec    1.00      5.1±0.08ms        ? ?/sec
physical_select_aggregates_from_200           1.01     31.3±0.23ms        ? ?/sec    1.00     30.9±0.32ms        ? ?/sec
physical_select_all_from_1000                 1.04     43.3±0.18ms        ? ?/sec    1.00     41.7±0.21ms        ? ?/sec
physical_select_one_from_700                  1.53      3.4±0.05ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
physical_sorted_union_orderby                 1.09     78.1±0.81ms        ? ?/sec    1.00     71.9±0.67ms        ? ?/sec
physical_theta_join_consider_sort             1.33      3.8±0.13ms        ? ?/sec    1.00      2.8±0.05ms        ? ?/sec
physical_unnest_to_join                       1.40      3.4±0.08ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
with_param_values_many_columns                1.01    161.8±0.98µs        ? ?/sec    1.00    159.9±1.13µs        ? ?/sec

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also working on a test that shows that this PR fixes the issue in the ticket

use datafusion_expr::{cast, col, in_list, try_cast};

#[test]
fn test_not_unwrap_cast_comparison() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking that these tests likely cover a bunch of corner cases like decimal precision and different timestamp types that may not be fully covered in .slt

I'll try and help either port them or file a ticket to do so

Port `unwrap_cast` tests to `ExprSimplifier`
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better now, thank you @jayzhan211 and @alamb

@alamb
Copy link
Contributor

alamb commented Mar 6, 2025

I also filed a ticket to explore making planning fster by more consolidation

@alamb alamb merged commit eb4ee62 into apache:main Mar 6, 2025
24 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 6, 2025

Thank you @jayzhan211 and @berkaysynnada

danila-b pushed a commit to danila-b/datafusion that referenced this pull request Mar 8, 2025
* add unwrap in simplify expr

* rm unwrap cast

* return err

* rename

* fix

* fmt

* add unwrap_cast module to simplify expressions

* tweak comment

* Move tests

* Rewrite to use simplifier schema

* Update tests for simplify logic

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SessionContext::create_physical_expr does not unwrap casts (and thus is not always optimal)
3 participants