Skip to content

docs: improve documentation of prudence argument #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

maelle
Copy link
Collaborator

@maelle maelle commented Feb 6, 2025

Fix #508
And some other suggested changes for readability.

@maelle maelle requested a review from krlmlr February 6, 2025 09:55
@@ -30,22 +30,19 @@ knitr::opts_chunk$set(
Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 0)
```

Unlike traditional data frames, duckplyr defers computation until absolutely necessary, allowing DuckDB to optimize execution.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is explained better in one of the sections.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but is this sentence harmful here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be discouraging to have a first sentence that's "complicated".

Thrifty duckplyr frames are a compromise between lavish and frugal, discussed further below.
Thrifty duckplyr frames are a compromise between lavish and frugal, discussed below.

### Thrift
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the previous location of this section a bit jarring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to give the readers time to digest "lavish" and "frugal".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to me when reading was that I saw it would be discussed later and thought, ok then. But when the "Thrift" section appeared, it felt misplaced.

@@ -198,7 +233,7 @@ flights_frugal[[1]]
```


### Enforcing DuckDB operation
### Side effect: Enforcing DuckDB operation
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an important section but also does not go with the flow of the rest of the vignette.

Copy link
Member

@krlmlr krlmlr Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moving to "fallback".

Later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR where I edit the fallback vignette, I added a section about this but that refers to this vignette.

Copy link
Contributor

github-actions bot commented Feb 6, 2025

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 00b4546 is merged into main:

  • ✔️001_tpch_01: 24.2ms -> 24.3ms [-3.94%, +5.03%]
  • ✔️001_tpch_02: 66ms -> 64.9ms [-4.92%, +1.72%]
  • ✔️001_tpch_03: 39ms -> 38.9ms [-1.3%, +0.92%]
  • ✔️001_tpch_04: 27.6ms -> 27.7ms [-1.03%, +1.79%]
  • ✔️001_tpch_05: 57.8ms -> 57.5ms [-2.09%, +1.05%]
  • ✔️001_tpch_06: 11.6ms -> 11.5ms [-3.46%, +3.17%]
  • ❗🐌001_tpch_07: 70.5ms -> 71.3ms [+0.06%, +2.31%]
  • ✔️001_tpch_08: 98.1ms -> 97.8ms [-1.66%, +0.95%]
  • ✔️001_tpch_09: 70.4ms -> 70.9ms [-1.24%, +2.6%]
  • ✔️001_tpch_10: 47.2ms -> 47.4ms [-0.54%, +1.17%]
  • ✔️001_tpch_11: 37.7ms -> 37.6ms [-1.89%, +1.42%]
  • ✔️001_tpch_12: 21.1ms -> 21ms [-2.12%, +1.67%]
  • ✔️001_tpch_13: 18.2ms -> 18.5ms [-1.14%, +4.89%]
  • ✔️001_tpch_14: 16.7ms -> 16.7ms [-1.52%, +1.67%]
  • ✔️001_tpch_15: 36.1ms -> 36.2ms [-1.35%, +2.29%]
  • ✔️001_tpch_16: 34.2ms -> 34.4ms [-0.43%, +1.7%]
  • ✔️001_tpch_17: 30.8ms -> 30.6ms [-2.18%, +0.95%]
  • ✔️001_tpch_18: 27.8ms -> 27.7ms [-2.02%, +1.68%]
  • ✔️001_tpch_19: 38ms -> 38ms [-0.54%, +0.95%]
  • ✔️001_tpch_20: 49.8ms -> 49.9ms [-0.63%, +1.35%]
  • ✔️001_tpch_21: 78.7ms -> 78.7ms [-1.46%, +1.51%]
  • ✔️001_tpch_22: 45.9ms -> 45.9ms [-0.65%, +0.75%]
  • ✔️010_tpch_01: 84.4ms -> 85.3ms [-2.31%, +4.37%]
  • ✔️010_tpch_02: 70.9ms -> 70.4ms [-2.49%, +0.85%]
  • ✔️010_tpch_03: 57.4ms -> 58.7ms [-2.05%, +6.49%]
  • ✔️010_tpch_04: 42.3ms -> 43.4ms [-2.71%, +7.55%]
  • ✔️010_tpch_05: 89.1ms -> 87.8ms [-3.6%, +0.61%]
  • ✔️010_tpch_06: 29.5ms -> 29.2ms [-9.24%, +7.28%]
  • ✔️010_tpch_07: 101ms -> 101ms [-2.41%, +1.11%]
  • ✔️010_tpch_08: 126ms -> 127ms [-2.34%, +2.88%]
  • ✔️010_tpch_09: 117ms -> 115ms [-3.73%, +0.2%]
  • ✔️010_tpch_10: 75.3ms -> 74.7ms [-2.21%, +0.58%]
  • ✔️010_tpch_11: 39.4ms -> 39.7ms [-2.71%, +4.44%]
  • ✔️010_tpch_12: 50.7ms -> 50.8ms [-1.02%, +1.45%]
  • ✔️010_tpch_13: 50.8ms -> 51.4ms [-1.75%, +4.3%]
  • ✔️010_tpch_14: 35ms -> 36.1ms [-5.35%, +12.01%]
  • ✔️010_tpch_15: 54.1ms -> 54.2ms [-5.98%, +6.18%]
  • ✔️010_tpch_16: 41.6ms -> 40.1ms [-8.73%, +1.63%]
  • ✔️010_tpch_17: 56.9ms -> 54.9ms [-10.65%, +3.75%]
  • ✔️010_tpch_18: 56.2ms -> 55.3ms [-7.75%, +4.57%]
  • ✔️010_tpch_19: 87.1ms -> 85.7ms [-3.46%, +0.29%]
  • ✔️010_tpch_20: 65.2ms -> 65.4ms [-2.38%, +2.9%]
  • ✔️010_tpch_21: 253ms -> 252ms [-4.82%, +3.6%]
  • ✔️010_tpch_22: 54.4ms -> 55.2ms [-1.22%, +4.26%]
  • ✔️100_tpch_01: 351ms -> 342ms [-19.66%, +14.69%]
  • ✔️100_tpch_02: 123ms -> 128ms [-3.47%, +11.63%]
  • ✔️100_tpch_03: 174ms -> 169ms [-8.62%, +3.02%]
  • 🚀100_tpch_04: 150ms -> 146ms [-4.2%, -1.17%]
  • ✔️100_tpch_05: 268ms -> 267ms [-10.73%, +9.83%]
  • ✔️100_tpch_06: 99.3ms -> 93.3ms [-23.4%, +11.32%]
  • ✔️100_tpch_07: 224ms -> 231ms [-9.51%, +15.79%]
  • ✔️100_tpch_08: 262ms -> 253ms [-16.63%, +9.17%]
  • ✔️100_tpch_09: 329ms -> 329ms [-18.19%, +17.68%]
  • ✔️100_tpch_10: 218ms -> 210ms [-9.5%, +1.89%]
  • ✔️100_tpch_11: 84ms -> 93.9ms [-32.49%, +56.16%]
  • ✔️100_tpch_12: 173ms -> 178ms [-7.56%, +13.88%]
  • ✔️100_tpch_13: 313ms -> 327ms [-0.04%, +9.06%]
  • ✔️100_tpch_14: 110ms -> 111ms [-11.12%, +13.29%]
  • ✔️100_tpch_15: 207ms -> 203ms [-10.16%, +6.08%]
  • ✔️100_tpch_16: 120ms -> 121ms [-6.39%, +9.05%]
  • ✔️100_tpch_17: 176ms -> 176ms [-10.92%, +10.95%]
  • ✔️100_tpch_18: 194ms -> 189ms [-14.23%, +8.99%]
  • ✔️100_tpch_19: 269ms -> 270ms [-17%, +17.6%]
  • ✔️100_tpch_20: 175ms -> 182ms [-6.73%, +14.48%]
  • ✔️100_tpch_21: 1.37s -> 1.33s [-5.17%, +0.55%]
  • ✔️100_tpch_22: 156ms -> 151ms [-13.54%, +7.83%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@krlmlr
Copy link
Member

krlmlr commented Feb 7, 2025

This would be easier to review in two PRs. Will split and review individually.

@krlmlr
Copy link
Member

krlmlr commented Feb 7, 2025

Other commit now in #593.

@krlmlr krlmlr marked this pull request as draft February 7, 2025 03:23
@krlmlr krlmlr mentioned this pull request Feb 7, 2025
Copy link
Contributor

github-actions bot commented Feb 7, 2025

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 3385047 is merged into main:

  • ✔️001_tpch_01: 23.9ms -> 23.8ms [-5.94%, +5.23%]
  • ✔️001_tpch_02: 62.9ms -> 63.1ms [-0.45%, +1.1%]
  • ✔️001_tpch_03: 38.8ms -> 39.2ms [-0.69%, +2.85%]
  • ✔️001_tpch_04: 28.3ms -> 28.1ms [-4.24%, +2.75%]
  • ✔️001_tpch_05: 56.2ms -> 56.2ms [-0.74%, +0.74%]
  • ✔️001_tpch_06: 11.4ms -> 11.3ms [-4.78%, +1.75%]
  • ✔️001_tpch_07: 70.4ms -> 70.1ms [-1.28%, +0.37%]
  • ✔️001_tpch_08: 96.9ms -> 96.4ms [-1.9%, +0.87%]
  • ✔️001_tpch_09: 67.1ms -> 67.4ms [-0.53%, +1.55%]
  • ✔️001_tpch_10: 46.8ms -> 47ms [-0.8%, +1.48%]
  • ✔️001_tpch_11: 37.8ms -> 37.7ms [-1.35%, +0.54%]
  • ✔️001_tpch_12: 20.9ms -> 20.9ms [-3.64%, +3.32%]
  • ✔️001_tpch_13: 18.3ms -> 17.8ms [-5.64%, +0.77%]
  • ✔️001_tpch_14: 16.5ms -> 16.5ms [-2.29%, +2.44%]
  • ✔️001_tpch_15: 36.2ms -> 36.1ms [-2.11%, +1.48%]
  • ✔️001_tpch_16: 34.2ms -> 34.3ms [-1.22%, +2.04%]
  • ✔️001_tpch_17: 30.4ms -> 30.8ms [-0.47%, +3.29%]
  • ✔️001_tpch_18: 28.2ms -> 27.9ms [-2.85%, +0.65%]
  • ✔️001_tpch_19: 37.5ms -> 37.8ms [-0.18%, +2.23%]
  • ✔️001_tpch_20: 49.4ms -> 49.2ms [-1.53%, +0.9%]
  • ✔️001_tpch_21: 78.4ms -> 78.5ms [-0.92%, +1.18%]
  • ✔️001_tpch_22: 45.5ms -> 45.5ms [-0.88%, +1.1%]
  • ✔️010_tpch_01: 86.4ms -> 83.4ms [-9.63%, +2.8%]
  • ✔️010_tpch_02: 69.9ms -> 69.9ms [-1.36%, +1.39%]
  • ✔️010_tpch_03: 58.2ms -> 56.4ms [-7.09%, +0.87%]
  • ✔️010_tpch_04: 41.6ms -> 42.9ms [-1.45%, +7.5%]
  • ✔️010_tpch_05: 87.8ms -> 87.7ms [-1.31%, +1.06%]
  • ✔️010_tpch_06: 29.2ms -> 28.4ms [-12.99%, +7.23%]
  • ✔️010_tpch_07: 101ms -> 101ms [-5.01%, +3.63%]
  • ✔️010_tpch_08: 124ms -> 127ms [-1.02%, +4.87%]
  • ✔️010_tpch_09: 114ms -> 116ms [-0.52%, +2.61%]
  • ✔️010_tpch_10: 73.4ms -> 73.3ms [-1.66%, +1.39%]
  • ✔️010_tpch_11: 37.8ms -> 38.5ms [-2.27%, +5.96%]
  • ✔️010_tpch_12: 49.1ms -> 50.5ms [-0.42%, +6.11%]
  • ✔️010_tpch_13: 49.3ms -> 49.3ms [-1.43%, +1.82%]
  • ✔️010_tpch_14: 33.2ms -> 33.4ms [-2.15%, +3.04%]
  • ❗🐌010_tpch_15: 49.9ms -> 54.1ms [+1.98%, +15.17%]
  • ✔️010_tpch_16: 40.4ms -> 38.9ms [-8.55%, +1.17%]
  • ✔️010_tpch_17: 53.4ms -> 53.1ms [-5.23%, +4.2%]
  • ✔️010_tpch_18: 56ms -> 52.6ms [-12.56%, +0.34%]
  • ✔️010_tpch_19: 82.9ms -> 82.5ms [-1.23%, +0.34%]
  • ✔️010_tpch_20: 62.9ms -> 62.9ms [-2.7%, +2.64%]
  • ✔️010_tpch_21: 242ms -> 243ms [-1.24%, +2.13%]
  • ✔️010_tpch_22: 52.4ms -> 53.1ms [-0.9%, +3.58%]
  • ✔️100_tpch_01: 352ms -> 350ms [-20.1%, +19.22%]
  • ✔️100_tpch_02: 124ms -> 125ms [-2.06%, +3.31%]
  • ✔️100_tpch_03: 165ms -> 166ms [-8.99%, +10.19%]
  • ✔️100_tpch_04: 146ms -> 143ms [-12.28%, +8.46%]
  • ✔️100_tpch_05: 242ms -> 260ms [-4.52%, +19.43%]
  • ✔️100_tpch_06: 97ms -> 88.8ms [-18.76%, +1.83%]
  • ✔️100_tpch_07: 220ms -> 221ms [-7.56%, +8.83%]
  • ✔️100_tpch_08: 257ms -> 260ms [-5.82%, +7.71%]
  • ✔️100_tpch_09: 343ms -> 337ms [-15.8%, +12.64%]
  • ✔️100_tpch_10: 208ms -> 208ms [-5.73%, +5.27%]
  • ✔️100_tpch_11: 81.7ms -> 84.7ms [-4.21%, +11.33%]
  • ✔️100_tpch_12: 170ms -> 177ms [-5.09%, +13.27%]
  • ✔️100_tpch_13: 301ms -> 295ms [-8.79%, +4.75%]
  • ✔️100_tpch_14: 117ms -> 119ms [-11.6%, +14.32%]
  • ✔️100_tpch_15: 204ms -> 206ms [-18.03%, +19.56%]
  • ✔️100_tpch_16: 117ms -> 119ms [-8.67%, +11.46%]
  • ✔️100_tpch_17: 175ms -> 177ms [-16.06%, +18.55%]
  • ✔️100_tpch_18: 205ms -> 216ms [-7.19%, +17.45%]
  • ✔️100_tpch_19: 267ms -> 265ms [-10.6%, +8.76%]
  • ✔️100_tpch_20: 166ms -> 168ms [-8.92%, +11.86%]
  • ✔️100_tpch_21: 1.36s -> 1.34s [-11.14%, +8.7%]
  • ✔️100_tpch_22: 157ms -> 156ms [-16.42%, +15.37%]

Further explanation regarding interpretation and methodology can be found in the documentation.

@krlmlr krlmlr force-pushed the prudence-rmd branch 2 times, most recently from f404f17 to 3f29cba Compare February 7, 2025 04:11
Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping open for discussion.

@@ -30,22 +30,19 @@ knitr::opts_chunk$set(
Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 0)
```

Unlike traditional data frames, duckplyr defers computation until absolutely necessary, allowing DuckDB to optimize execution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but is this sentence harmful here?


- **lazy materialization**: Unlike traditional data frames, duckplyr defers computation until absolutely necessary, i.e. lazily, allowing DuckDB to optimize execution.
- **prudence**: Automatic materialization is controllable, as automatic materialization of large data might otherwise inadvertently lead to memory problems.

The term "prudence" is introduced here to set a clear distinction from the concept of "laziness", and because "control of automatic materialization" is a mouthful.

## Eager and lazy computation
## DuckDB optimization: lazy evaluation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"lazy evaluation" is easy to confuse with how R handles arguments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, this should be "lazy computation", sorry.

Thrifty duckplyr frames are a compromise between lavish and frugal, discussed further below.
Thrifty duckplyr frames are a compromise between lavish and frugal, discussed below.

### Thrift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to give the readers time to digest "lavish" and "frugal".

@@ -198,7 +233,7 @@ flights_frugal[[1]]
```


### Enforcing DuckDB operation
### Side effect: Enforcing DuckDB operation
Copy link
Member

@krlmlr krlmlr Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moving to "fallback".

Later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add "memory protection" somewhere near definition of funneling
2 participants