-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Fix multi-lines printing issue for datafusion-cli #14954
base: main
Are you sure you want to change the base?
Conversation
…and print, make it totally streaming printing without memory overhead (apache#14948)" This reverts commit 382e232.
Sorry for the inconvenience! We can review and merge this PR after the release: cc @alamb @xudong963
Meanwhile, i think more forks can also test it, i am not sure if we will have more bugs come since it's a totally new way for datafusion-cli, and i have added more testing cases, thanks! |
Updated testing result: DataFusion CLI v45.0.0
> create table foo(x int, y int) as values (1,2), (3,4);
0 row(s) fetched.
Elapsed 0.026 seconds.
> explain select * from foo where x = 4;
+---------------+-------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------+
| logical_plan | Filter: foo.x = Int32(4) |
| | TableScan: foo projection=[x, y] |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: x@0 = 4 |
| | DataSourceExec: partitions=1, partition_sizes=[1] |
+---------------+-------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.022 seconds.
> explain verbose select * from foo where x = 4;
+------------------------------------------------------------+-------------------------------------------+
| plan_type | plan |
+------------------------------------------------------------+-------------------------------------------+
| initial_logical_plan | Projection: * |
| | Filter: foo.x = Int64(4) |
| | TableScan: foo |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
| logical_plan after expand_wildcard_rule | Projection: foo.x, foo.y |
| | Filter: foo.x = Int64(4) |
| | TableScan: foo |
| logical_plan after resolve_grouping_function | SAME TEXT AS ABOVE |
| logical_plan after type_coercion | Projection: foo.x, foo.y |
| | Filter: CAST(foo.x AS Int64) = Int64(4) |
| | TableScan: foo |
| analyzed_logical_plan | SAME TEXT AS ABOVE |
| logical_plan after eliminate_nested_union | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | SAME TEXT AS ABOVE |
| logical_plan after unwrap_cast_in_comparison | Projection: foo.x, foo.y |
| | Filter: foo.x = Int32(4) |
| | TableScan: foo |
| logical_plan after replace_distinct_aggregate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_join | SAME TEXT AS ABOVE |
| logical_plan after decorrelate_predicate_subquery | SAME TEXT AS ABOVE |
| logical_plan after scalar_subquery_to_join | SAME TEXT AS ABOVE |
| logical_plan after extract_equijoin_predicate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_duplicated_expr | SAME TEXT AS ABOVE |
| logical_plan after eliminate_filter | SAME TEXT AS ABOVE |
| logical_plan after eliminate_cross_join | SAME TEXT AS ABOVE |
| logical_plan after common_sub_expression_eliminate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_limit | SAME TEXT AS ABOVE |
| logical_plan after propagate_empty_relation | SAME TEXT AS ABOVE |
| logical_plan after eliminate_one_union | SAME TEXT AS ABOVE |
| logical_plan after filter_null_join_keys | SAME TEXT AS ABOVE |
| logical_plan after eliminate_outer_join | SAME TEXT AS ABOVE |
| logical_plan after push_down_limit | SAME TEXT AS ABOVE |
| logical_plan after push_down_filter | SAME TEXT AS ABOVE |
| logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | SAME TEXT AS ABOVE |
| logical_plan after unwrap_cast_in_comparison | SAME TEXT AS ABOVE |
| logical_plan after common_sub_expression_eliminate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_group_by_constant | SAME TEXT AS ABOVE |
| logical_plan after optimize_projections | Filter: foo.x = Int32(4) |
| | TableScan: foo projection=[x, y] |
| logical_plan after eliminate_nested_union | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | SAME TEXT AS ABOVE |
| logical_plan after unwrap_cast_in_comparison | SAME TEXT AS ABOVE |
| logical_plan after replace_distinct_aggregate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_join | SAME TEXT AS ABOVE |
| logical_plan after decorrelate_predicate_subquery | SAME TEXT AS ABOVE |
| logical_plan after scalar_subquery_to_join | SAME TEXT AS ABOVE |
| logical_plan after extract_equijoin_predicate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_duplicated_expr | SAME TEXT AS ABOVE |
| . | . |
| . | . |
| . | . |
+------------------------------------------------------------+-------------------------------------------+
78 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.005 seconds. |
Thanks @zhuqi-lucas -- I'll try and check this out soon (likely tomorrow) |
datafusion-cli/src/print_format.rs
Outdated
@@ -209,14 +211,175 @@ impl PrintFormat { | |||
} | |||
Ok(()) | |||
} | |||
|
|||
#[allow(clippy::too_many_arguments)] | |||
pub fn process_batch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we now have some more time, I wonder if you could potentially consider refactoring this code now.
I found it somewhat hard to understand when I was trying to debug why the explain plans were not showing up reasonably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion-cli/src/print_options.rs
Outdated
@@ -127,9 +107,125 @@ impl PrintOptions { | |||
Ok(()) | |||
} | |||
|
|||
pub async fn print_table_batch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some comments here, as well as how it is different than print_batches
datafusion-cli/src/print_options.rs
Outdated
@@ -110,7 +90,7 @@ impl PrintOptions { | |||
self.format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I noticed when reviewing this code is that there are different passes for print_batches and print_table_batch
I wonder if there is some way to combine them (so we always use the same printing logic)
Thank you @alamb for review and good suggestion! I will try to address it today. |
I am continuing polish the code besides the #14886 which will add the streaming state struct. I believe the code is more readable now. |
/// | ||
/// The query_start_time is used to calculate the elapsed time for the query. | ||
/// The schema is used to print the header. | ||
pub async fn print_stream( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unified to one API for all cases.
@@ -51,6 +51,163 @@ fn init() { | |||
["--command", "show datafusion.execution.batch_size", "--format", "json", "-q", "-b", "1"], | |||
"[{\"name\":\"datafusion.execution.batch_size\",\"value\":\"1\"}]\n" | |||
)] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add rich testing cases in cli_integration, it can avoid regression for datafusion-cli changes.
Which issue does this PR close?
Initial PR:
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
Fix the explain format