New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs. #61

Closed

wiedld wants to merge 2 commits into main from 15003/remove-macro

+769 −390

Collaborator

wiedld commented Mar 5, 2025 •

edited

Loading

Which issue does this PR close?

Part of apache#15003

Is the next PR, after this previous PR merges: apache#15010

Rationale for this change

Enable us to write test cases using:

the same TestConfig
then apply the optimizer runs in any order chosen

The benefits of this approach are:

explicitly demonstrate when there is an ordering dependency in which optimizer has to run first (sorting or distribution)
quickly enable us to write test cases addressing idempotency of an optimizer run
(hopefully) helps identify strengths/gaps in the current test coverage. 🙏🏼

What changes are included in this PR?

The above, as we as a bit of an increase in test coverage to demonstrate exactly when our test cases (the plan results we see) are ordering dependent.

Are these changes tested?

Yes

Are there any user-facing changes?

No

github-actions bot added the core label

wiedld changed the title ~~15003/remove macro~~ Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs.

wiedld force-pushed the 15003/remove-macro branch from bf5a9a1 to 498983f Compare

March 5, 2025 22:29

wiedld commented

View reviewed changes

datafusion/core/tests/physical_optimizer/enforce_distribution.rs

Comment on lines +449 to +458

    
                  /// Perform a series of runs using the current [`TestConfig`],

                  /// assert the expected plan result,

                  /// and return the result plan (for potentional subsequent runs).

                  fn run(

                      &self,

                      expected_lines: &[&str],

                      plan: Arc<dyn ExecutionPlan>,

                      optimizers_to_run: Vec<Run>,

                  ) -> Result<Arc<dyn ExecutionPlan>> {

                      let expected_lines: Vec<&str> = expected_lines.to_vec();

Collaborator Author

wiedld Mar 5, 2025

The reason this returns the plan, is so that we can easily write test cases for idempotency.

let plan_after_first_run = test_config.run(
    expected,
    plan,
    vec![Run::Distribution],
)?;
let plan_after_second_run = test_config.run(
    expected,
    plan_after_first_run.clone(),
    vec![Run::Distribution], // exact same run again
)?;
assert_eq!(
    get_plan_string(&plan_after_first_run),
    get_plan_string(&plan_after_second_run),
);

wiedld commented

View reviewed changes

datafusion/core/tests/physical_optimizer/enforce_distribution.rs

Comment on lines +404 to +405

		const DISTRIB_DISTRIB_SORT: [Run; 3] =
		[Run::Distribution, Run::Distribution, Run::Sorting];

Collaborator Author

wiedld Mar 5, 2025

Equivalent to main: https://github.com/apache/datafusion/blob/9a4c9d598ca93ef66ba02067cc1497d53c28cddc/datafusion/core/tests/physical_optimizer/enforce_distribution.rs#L459-L469

wiedld commented

View reviewed changes

datafusion/core/tests/physical_optimizer/enforce_distribution.rs

Comment on lines +406 to +407

		const SORT_DISTRIB_DISTRIB: [Run; 3] =
		[Run::Sorting, Run::Distribution, Run::Distribution];

Collaborator Author

wiedld Mar 5, 2025

Equivalent to main: https://github.com/apache/datafusion/blob/9a4c9d598ca93ef66ba02067cc1497d53c28cddc/datafusion/core/tests/physical_optimizer/enforce_distribution.rs#L471-L481

wiedld commented

View reviewed changes

datafusion/core/tests/physical_optimizer/enforce_distribution.rs

Comment on lines +1565 to +1570

    
                      // TODO(wiedld): show different test result if enforce distribution first.

                      test_config.run(

                          &expected_first_sort_enforcement,

                          top_join,

                          &TestConfig::new(DoFirst::Sorting).with_prefer_existing_sort()

                      );

                          SORT_DISTRIB_DISTRIB.into(),

                      )?;

Collaborator Author

wiedld Mar 5, 2025

I left a few TODO(wiedld) in the test multi_smj_joins.

In some cases the test case is showing the outcome for the sort occurring first BEFORE the distribution run, which does not match the ordering of the optimizer in the default planner -- and I thought that should be highlighted in the test cases.

wiedld commented

View reviewed changes

datafusion/core/tests/physical_optimizer/enforce_distribution.rs


		// Test: result IS DIFFERENT, if EnforceSorting is run first:

Collaborator Author

wiedld Mar 5, 2025

Highlighting the dependency of run ordering in the existing test cases.

This was referenced Mar 5, 2025

Refactor test suite in EnforceDistribution, to use standard test config. apache/datafusion#15010

Merged

Improve EnforceDistribution testings. apache/datafusion#15003

Open

wiedld added 2 commits

March 7, 2025 13:04


          refactor(15003): permit any combination of runs desired

e4a2dc1


          refactor(15003): convert macro to a function call on the TestConfig, …

…and highlight when the same testing setup, but different ordering of optimizer runs, effect the outcome.

wiedld force-pushed the 15003/remove-macro branch from 498983f to 9411528 Compare

March 7, 2025 21:09

github-actions bot added documentation logical-expr optimizer sqllogictest physical-expr common functions development-process substrait datasource labels

wiedld changed the base branch from 15003/switch-to-builder to main

March 7, 2025 21:09

Collaborator Author

wiedld commented Mar 7, 2025

Closing this draft, since we opened a new one on the actual apache project.

wiedld closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common core datasource development-process documentation functions logical-expr optimizer physical-expr sqllogictest substrait