Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs. #61

Closed
wants to merge 2 commits into from

Conversation

wiedld
Copy link
Collaborator

@wiedld wiedld commented Mar 5, 2025

Which issue does this PR close?

Part of apache#15003

Is the next PR, after this previous PR merges: apache#15010

Rationale for this change

Enable us to write test cases using:

  • the same TestConfig
  • then apply the optimizer runs in any order chosen

The benefits of this approach are:

  • explicitly demonstrate when there is an ordering dependency in which optimizer has to run first (sorting or distribution)
  • quickly enable us to write test cases addressing idempotency of an optimizer run
  • (hopefully) helps identify strengths/gaps in the current test coverage. 🙏🏼

What changes are included in this PR?

The above, as we as a bit of an increase in test coverage to demonstrate exactly when our test cases (the plan results we see) are ordering dependent.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the core label Mar 5, 2025
@wiedld wiedld changed the title 15003/remove macro Refactor EnforceDistribution test cases to demonstrate dependencies across optimizer runs. Mar 5, 2025
@wiedld wiedld force-pushed the 15003/remove-macro branch from bf5a9a1 to 498983f Compare March 5, 2025 22:29
Comment on lines +449 to +458
/// Perform a series of runs using the current [`TestConfig`],
/// assert the expected plan result,
/// and return the result plan (for potentional subsequent runs).
fn run(
&self,
expected_lines: &[&str],
plan: Arc<dyn ExecutionPlan>,
optimizers_to_run: Vec<Run>,
) -> Result<Arc<dyn ExecutionPlan>> {
let expected_lines: Vec<&str> = expected_lines.to_vec();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this returns the plan, is so that we can easily write test cases for idempotency.

let plan_after_first_run = test_config.run(
    expected,
    plan,
    vec![Run::Distribution],
)?;
let plan_after_second_run = test_config.run(
    expected,
    plan_after_first_run.clone(),
    vec![Run::Distribution], // exact same run again
)?;
assert_eq!(
    get_plan_string(&plan_after_first_run),
    get_plan_string(&plan_after_second_run),
);

Comment on lines +404 to +405
const DISTRIB_DISTRIB_SORT: [Run; 3] =
[Run::Distribution, Run::Distribution, Run::Sorting];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +406 to +407
const SORT_DISTRIB_DISTRIB: [Run; 3] =
[Run::Sorting, Run::Distribution, Run::Distribution];
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1565 to +1570
// TODO(wiedld): show different test result if enforce distribution first.
test_config.run(
&expected_first_sort_enforcement,
top_join,
&TestConfig::new(DoFirst::Sorting).with_prefer_existing_sort()
);
SORT_DISTRIB_DISTRIB.into(),
)?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few TODO(wiedld) in the test multi_smj_joins.

In some cases the test case is showing the outcome for the sort occurring first BEFORE the distribution run, which does not match the ordering of the optimizer in the default planner -- and I thought that should be highlighted in the test cases.


// Test: result IS DIFFERENT, if EnforceSorting is run first:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting the dependency of run ordering in the existing test cases.

wiedld added 2 commits March 7, 2025 13:04
…and highlight when the same testing setup, but different ordering of optimizer runs, effect the outcome.
@wiedld
Copy link
Collaborator Author

wiedld commented Mar 7, 2025

Closing this draft, since we opened a new one on the actual apache project.

@wiedld wiedld closed this Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant