Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add tree / pretty explain mode #14677

Merged
merged 23 commits into from
Mar 4, 2025
Merged

feat: Add tree / pretty explain mode #14677

merged 23 commits into from
Mar 4, 2025

Conversation

irenjj
Copy link
Contributor

@irenjj irenjj commented Feb 15, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Prettify explain display format like DuckDB.

In this pr, implemented TreeRender structure, and DataSourceExec info:

> explain select * from t1  order by i;

+---------------+--------------------------------+
| plan_type     | plan                           |
+---------------+--------------------------------+
| logical_plan  | Sort: t1.i ASC NULLS LAST      |
|               |   TableScan: t1 projection=[i] |
| physical_plan | ┌───────────────────────────┐  |
|               | │          SortExec         │  |
|               | └─────────────┬─────────────┘  |
|               | ┌─────────────┴─────────────┐  |
|               | │       DataSourceExec      │  |
|               | │    --------------------   │  |
|               | │    partition_sizes: [0]   │  |
|               | │       partitions: 1       │  |
|               | └───────────────────────────┘  |
|               |                                |
+---------------+--------------------------------+
2 row(s) fetched. 
Elapsed 0.005 seconds.

> explain select * from t1 inner join t2 on t1.i=t2.i;

+---------------+------------------------------------------------------------+
| plan_type     | plan                                                       |
+---------------+------------------------------------------------------------+
| logical_plan  | Inner Join: t1.i = t2.i                                    |
|               |   TableScan: t1 projection=[i]                             |
|               |   TableScan: t2 projection=[i]                             |
| physical_plan | ┌───────────────────────────┐                              |
|               | │    CoalesceBatchesExec    │                              |
|               | └─────────────┬─────────────┘                              |
|               | ┌─────────────┴─────────────┐                              |
|               | │        HashJoinExec       ├──────────────┐               |
|               | └─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      ││       DataSourceExec      │ |
|               | │    --------------------   ││    --------------------   │ |
|               | │    partition_sizes: [0]   ││       partitions: 1       │ |
|               | │       partitions: 1       ││    partition_sizes: [0]   │ |
|               | └───────────────────────────┘└───────────────────────────┘ |
|               |                                                            |
+---------------+------------------------------------------------------------+
2 row(s) fetched.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate common Related to common crate labels Feb 15, 2025
@alamb
Copy link
Contributor

alamb commented Feb 15, 2025

😍

I have been dreaming about this @irenjj -- can't wait to check it out

@xudong963
Copy link
Member

It's my lovest todo:
image

Thank you for making it happen!

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 23, 2025
@irenjj irenjj marked this pull request as ready for review February 23, 2025 13:30
@irenjj
Copy link
Contributor Author

irenjj commented Feb 23, 2025

PTAL @alamb @xudong963 👀

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this happen.

Maybe it's better to open a tracking issue to record next todos, such as rich other physical operator's information to display. Then we can make our new explain perfect in parallel after the PR is merged.

@irenjj
Copy link
Contributor Author

irenjj commented Feb 24, 2025

Happy to see this happen.

Maybe it's better to open a tracking issue to record next todos, such as rich other physical operator's information to display. Then we can make our new explain perfect in parallel after the PR is merged.

I will do it later today.❤️

@alamb
Copy link
Contributor

alamb commented Feb 24, 2025

Looks awesome -- I have put this on my review queue. So exciting

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool 💯

I suggest to add more tests to make sure the tree rendering is correct for complex plans:

  1. Very long metric entry like partition_sizes: [20, 20, 20, 30, 20, 20, 20...]
  2. Complex query: perhaps pick some from TPCH and make sure the tree structure is rendered correct.

Then we can write instructions and open tickets to support pretty explain for other executor nodes. I think now the default explain info is very detailed debug-style messages, perhaps we can make this pretty explain more concise (by looking at what metrics DuckDB includes for a certain plan node 😄 )

@@ -43,6 +44,7 @@ pub trait DataSource: Send + Sync {
) -> datafusion_common::Result<SendableRecordBatchStream>;
fn as_any(&self) -> &dyn Any;
fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter) -> fmt::Result;
fn collect_info(&self) -> HashMap<String, String>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new interface to collect info for pretty explain? Looks like it's possible to implement inside fmt_as().
DuckDB explains also have many non key-value patterns like the following example, we will want the formater to be more flexible.

┌─────────────┴─────────────┐
│       READ_PARQUET        │
│    ────────────────────   │
│         Function:         │
│        READ_PARQUET       │
│                           │
│        Projections:       │
│         l_orderkey        │
│         l_partkey         │
│         l_suppkey         │
│        l_linenumber       │
│         l_quantity        │
│      l_extendedprice      │
│                           │
│       ~6046496 Rows       │
└───────────────────────────┘

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this -- it would be much better to avoid a new interface and instead update the existing fmt_as API

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 25, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @irenjj

I took the liberty of merging up from main to resolve a conflict and fixing the CI failure

I am still going through this PR but so far I think it looks great

@@ -704,6 +704,10 @@ config_namespace! {

/// When set to true, the explain statement will print schema information
pub show_schema: bool, default = false

/// Display format of explain. Default is "indent".
/// When set to "pretty", it will print the plan in a tree-rendered format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than "pretty" that prints out a tree I suggest we change the option name to "tree"


use crate::ExecutionPlan;

#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still deadcode? it doesn't look like it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will warn while cargo build:

warning: fields `x` and `y` are never read
  --> datafusion/physical-plan/src/render_tree.rs:29:9
   |
28 | pub struct Coordinate {
   |            ---------- fields in this struct
29 |     pub x: usize,
   |         ^
30 |     pub y: usize,
   |         ^
   |
   = note: `#[warn(dead_code)]` on by default

the Coordinate is inspired from duckdb's implementation, I'll figure out if it's necessary to introduce such a structure.

@github-actions github-actions bot added the optimizer Optimizer rules label Feb 26, 2025
@alamb
Copy link
Contributor

alamb commented Feb 27, 2025

I hope to look at this later today

Looks like the follow on ticket is filed.

@alamb alamb changed the title feat: tree / pretty explain feat: Add tree / pretty explain mode Feb 28, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great @irenjj

Thanks so much 🙏

I merged up from main and will touch up the ticket you filed

@github-actions github-actions bot added datasource Changes to the datasource crate ffi Changes to the ffi crate and removed physical-expr Changes to the physical-expr crates labels Mar 3, 2025
@alamb
Copy link
Contributor

alamb commented Mar 3, 2025

This is next on my list of things to merge

What i would like ot do is merge this PR in and then file follow on subtasks in

I am sorry I have been slammed trying to get the DataFusion 46 release out

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged up from main to resolve some conflicts in this PR

@alamb
Copy link
Contributor

alamb commented Mar 4, 2025

🚀 thank you again so much @irenjj -- this is so cool. A great step towards better explain plans.

I am working on enhancing the follow on ticket you filed in #14914 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate documentation Improvements or additions to documentation ffi Changes to the ffi crate optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Nicer / DuckDB style explain plans
5 participants