perf: Improve join performance for new-streaming engine #21620

Merged: 25 commits merged into pola-rs:main from efficient-join on Mar 7, 2025

Conversation

@orlp (Collaborator) commented Mar 6, 2025

This changes the build-side representation from many small pre-partitioned dataframes to unpartitioned morsels plus partition indexes. Once all morsels are collected, we do a precise reserve for the payload size and build one contiguous payload chunk per partition. This takes a lot of pressure off the allocator, and it is a much better fit for view types, which can keep references into the build-side payload buffer without endless deduping or memcpy'ing.
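The approach described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code from this PR: `Morsel`, `partition_of`, `partition_indexes`, and `build_partitions` are hypothetical names, the payload is a single `i64` column, and a plain modulo stands in for a real hash of the join key.

```rust
/// Hypothetical build-side morsel: a small unpartitioned batch of rows,
/// reduced here to one key column and one payload column.
pub struct Morsel {
    pub keys: Vec<u64>,
    pub payload: Vec<i64>,
}

/// Assumption: a plain modulo stands in for hashing the join key.
fn partition_of(key: u64, num_partitions: usize) -> usize {
    (key % num_partitions as u64) as usize
}

/// For one morsel, record which rows belong to which partition.
/// The morsel itself is left untouched; only row indexes are stored.
pub fn partition_indexes(morsel: &Morsel, num_partitions: usize) -> Vec<Vec<u32>> {
    let mut idxs: Vec<Vec<u32>> = vec![Vec::new(); num_partitions];
    for (row, &key) in morsel.keys.iter().enumerate() {
        idxs[partition_of(key, num_partitions)].push(row as u32);
    }
    idxs
}

/// Build one contiguous payload chunk per partition.
pub fn build_partitions(morsels: &[Morsel], num_partitions: usize) -> Vec<Vec<i64>> {
    // Step 1: compute partition indexes for every unpartitioned morsel.
    let indexes: Vec<Vec<Vec<u32>>> = morsels
        .iter()
        .map(|m| partition_indexes(m, num_partitions))
        .collect();

    (0..num_partitions)
        .map(|p| {
            // Step 2: the exact row count for this partition is known,
            // so a single precise reserve is enough.
            let total: usize = indexes.iter().map(|idx| idx[p].len()).sum();
            let mut chunk = Vec::with_capacity(total);

            // Step 3: gather the rows from every morsel into one chunk.
            for (morsel, idx) in morsels.iter().zip(&indexes) {
                chunk.extend(idx[p].iter().map(|&row| morsel.payload[row as usize]));
            }
            chunk
        })
        .collect()
}
```

With two partitions and morsels holding keys `[0, 1, 2, 3]` and `[4, 5]`, each partition ends up with a single three-element chunk allocated once, instead of one small allocation per morsel per partition.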

@orlp requested review from ritchie46 and c-peters as code owners March 6, 2025 08:53
@orlp marked this pull request as draft March 6, 2025 08:53
@github-actions bot added the performance, python, and rust labels Mar 6, 2025

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 36.17300% with 487 lines in your changes missing coverage. Please review.

Project coverage is 80.49%. Comparing base (40c7019) to head (0abac7b).
Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-stream/src/nodes/joins/equi_join.rs | 42.12% | 272 Missing ⚠️ |
| ...es/polars-core/src/chunked_array/object/builder.rs | 0.00% | 75 Missing ⚠️ |
| crates/polars-expr/src/hash_keys.rs | 9.09% | 40 Missing ⚠️ |
| crates/polars-arrow/src/bitmap/builder.rs | 0.00% | 39 Missing ⚠️ |
| crates/polars-expr/src/idx_table/row_encoded.rs | 44.61% | 36 Missing ⚠️ |
| crates/polars-utils/src/sparse_init_vec.rs | 74.07% | 14 Missing ⚠️ |
| crates/polars-arrow/src/array/primitive/builder.rs | 0.00% | 6 Missing ⚠️ |
| ...s/polars-core/src/chunked_array/object/registry.rs | 0.00% | 3 Missing ⚠️ |
| crates/polars-core/src/series/builder.rs | 60.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #21620      +/-   ##
==========================================
- Coverage   80.56%   80.49%   -0.08%     
==========================================
  Files        1604     1604              
  Lines      231642   231678      +36     
  Branches     2650     2650              
==========================================
- Hits       186615   186481     -134     
- Misses      44410    44580     +170     
  Partials      617      617              

@orlp force-pushed the efficient-join branch from 31b6c22 to 530965e March 6, 2025 12:27
@orlp force-pushed the efficient-join branch from 5e7b839 to dc9da50 March 7, 2025 09:18
@orlp marked this pull request as ready for review March 7, 2025 16:17
@ritchie46 merged commit 95b02fe into pola-rs:main Mar 7, 2025
22 checks passed