-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor test suite in EnforceDistribution, to use standard test config. #15010
Conversation
// By default, will not repartition / resort data if it is already sorted. | ||
config.optimizer.prefer_existing_sort = false; | ||
|
||
// By default, will attempt to convert Union to Interleave. | ||
config.optimizer.prefer_existing_union = false; | ||
|
||
// By default, will not repartition file scans. | ||
config.optimizer.repartition_file_scans = false; | ||
config.optimizer.repartition_file_min_size = 1024; | ||
|
||
// By default, set query execution concurrency to 10. | ||
config.execution.target_partitions = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These match the defaults on main:
assert_optimized!($EXPECTED_LINES, $PLAN, $FIRST_ENFORCE_DIST, false, 10, false, 1024, false); |
// Use a small batch size, to trigger RoundRobin in tests | ||
config.execution.batch_size = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches main:
datafusion/datafusion/core/tests/physical_optimizer/enforce_distribution.rs
Lines 412 to 413 in c61e7e5
// Use a small batch size, to trigger RoundRobin in tests | |
config.execution.batch_size = 1; |
…est config builder
9af23e0
to
82b0342
Compare
/// * `FIRST_ENFORCE_DIST` - | ||
/// true: (EnforceDistribution, EnforceDistribution, EnforceSorting) | ||
/// false: else runs (EnforceSorting, EnforceDistribution, EnforceDistribution) | ||
/// * `PREFER_EXISTING_SORT` (optional) - if true, will not repartition / resort data if it is already sorted | ||
/// * `TARGET_PARTITIONS` (optional) - number of partitions to repartition to | ||
/// * `REPARTITION_FILE_SCANS` (optional) - if true, will repartition file scans | ||
/// * `REPARTITION_FILE_MIN_SIZE` (optional) - minimum file size to repartition | ||
/// * `PREFER_EXISTING_UNION` (optional) - if true, will not attempt to convert Union to Interleave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are now replaced by the required DoFirst
(for one-of 2 possible optimizer run patterns),
then all the optional settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like how it is much more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @wiedld -- I think this is a major step forward in test readability as all the relevant options are now explicit in the tests, rather than being true
/ false
parameters where interpreting them requires looking at a macro definition.
cc @berkaysynnada and @mustafasrepo / @akurmustafa
/// * `FIRST_ENFORCE_DIST` - | ||
/// true: (EnforceDistribution, EnforceDistribution, EnforceSorting) | ||
/// false: else runs (EnforceSorting, EnforceDistribution, EnforceDistribution) | ||
/// * `PREFER_EXISTING_SORT` (optional) - if true, will not repartition / resort data if it is already sorted | ||
/// * `TARGET_PARTITIONS` (optional) - number of partitions to repartition to | ||
/// * `REPARTITION_FILE_SCANS` (optional) - if true, will repartition file scans | ||
/// * `REPARTITION_FILE_MIN_SIZE` (optional) - minimum file size to repartition | ||
/// * `PREFER_EXISTING_UNION` (optional) - if true, will not attempt to convert Union to Interleave |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much like how it is much more explicit
assert_optimized!( | ||
expected, | ||
top_join.clone(), | ||
&TestConfig::new(DoFirst::Distribution) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking this a step more, what do you think of using a function on TestConfig
so this looks like:
assert_optimized!( | |
expected, | |
top_join.clone(), | |
&TestConfig::new(DoFirst::Distribution) | |
); | |
TestConfig::new(DoFirst::Distribution) | |
.assert_optimized) | |
&expected, | |
&top_join) | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. I was already planning for the next followup PR to change the macro to a functional call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the draft of the followup PR: influxdata#61
Just to get an idea of where this is heading.
I'll review this tomorrow as well |
BTW @blaginin is proposing to add If we merge that some of these tests may become much eaiser to maintain / change 🤔 |
Let's plan to merge this PR tomorrow / after @berkaysynnada has had a chance to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea. Interpreting these tests becomes much clearer. Thank you @wiedld. I just have two questions, but perhaps I'm missing something as I'm a bit tired today 😅
true, | ||
repartition_size, | ||
false | ||
&test_config.clone().with_query_execution_partitions(3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing prefer_existing_sort and repartition_file_scans here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the common config a few lines up here: https://github.com/apache/datafusion/pull/15010/files/5de55b3a8bf1858206dcecc0d3b1907228c970a4#r1984443646. Then only modified the unique config (with_query_execution_partitions
) per test case.
true, | ||
repartition_size, | ||
false | ||
&test_config.with_query_execution_partitions(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here?
let test_config = TestConfig::new(DoFirst::Distribution) | ||
.with_prefer_existing_sort() | ||
.with_prefer_repartition_file_scans(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the common config.
That's my everyday. 😆 |
Thank you @berkaysynnada and @wiedld |
…est config builder (apache#15010)
Which issue does this PR close?
Part of #15003
Rationale for this change
Make it easier to determine the differences in test configuration, for each test case.
Also clearly defines what are the defaults used for the test suite.
What changes are included in this PR?
Define
TestConfig
, which requires one-ofDoFirst
optimizer run configs. Most test cases uses this test config with the default settings.Provide a set of interfaces to define which configurations get tweaked.
Are these changes tested?
Yes
Are there any user-facing changes?
No