docs: improve documentation of `prudence` argument #589

maelle · 2025-02-06T09:55:06Z

Fix #508
And some other suggested changes for readability.

maelle · 2025-02-06T09:55:58Z

vignettes/prudence.Rmd

@@ -30,22 +30,19 @@ knitr::opts_chunk$set(
 Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 0)
 ```

-Unlike traditional data frames, duckplyr defers computation until absolutely necessary, allowing DuckDB to optimize execution.


this is explained better in one of the sections.

Yes, but is this sentence harmful here?

It could be discouraging to have a first sentence that's "complicated".

maelle · 2025-02-06T09:57:47Z

vignettes/prudence.Rmd

-Thrifty duckplyr frames are a compromise between lavish and frugal, discussed further below.
+Thrifty duckplyr frames are a compromise between lavish and frugal, discussed below.
+
+### Thrift


I found the previous location of this section a bit jarring.

I want to give the readers time to digest "lavish" and "frugal".

What happened to me when reading was that I saw it would be discussed later and thought, ok then. But when the "Thrift" section appeared, it felt misplaced.

maelle · 2025-02-06T09:59:14Z

vignettes/prudence.Rmd

@@ -198,7 +233,7 @@ flights_frugal[[1]]
 ```


-### Enforcing DuckDB operation
+### Side effect: Enforcing DuckDB operation


It is an important section but also does not go with the flow of the rest of the vignette.

Good point, moving to "fallback".

Later.

In the PR where I edit the fallback vignette, I added a section about this but that refers to this vignette.

vignettes/prudence.Rmd

github-actions · 2025-02-06T11:14:12Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 00b4546 is merged into main:

✔️001_tpch_01: 24.2ms -> 24.3ms [-3.94%, +5.03%]
✔️001_tpch_02: 66ms -> 64.9ms [-4.92%, +1.72%]
✔️001_tpch_03: 39ms -> 38.9ms [-1.3%, +0.92%]
✔️001_tpch_04: 27.6ms -> 27.7ms [-1.03%, +1.79%]
✔️001_tpch_05: 57.8ms -> 57.5ms [-2.09%, +1.05%]
✔️001_tpch_06: 11.6ms -> 11.5ms [-3.46%, +3.17%]
❗🐌001_tpch_07: 70.5ms -> 71.3ms [+0.06%, +2.31%]
✔️001_tpch_08: 98.1ms -> 97.8ms [-1.66%, +0.95%]
✔️001_tpch_09: 70.4ms -> 70.9ms [-1.24%, +2.6%]
✔️001_tpch_10: 47.2ms -> 47.4ms [-0.54%, +1.17%]
✔️001_tpch_11: 37.7ms -> 37.6ms [-1.89%, +1.42%]
✔️001_tpch_12: 21.1ms -> 21ms [-2.12%, +1.67%]
✔️001_tpch_13: 18.2ms -> 18.5ms [-1.14%, +4.89%]
✔️001_tpch_14: 16.7ms -> 16.7ms [-1.52%, +1.67%]
✔️001_tpch_15: 36.1ms -> 36.2ms [-1.35%, +2.29%]
✔️001_tpch_16: 34.2ms -> 34.4ms [-0.43%, +1.7%]
✔️001_tpch_17: 30.8ms -> 30.6ms [-2.18%, +0.95%]
✔️001_tpch_18: 27.8ms -> 27.7ms [-2.02%, +1.68%]
✔️001_tpch_19: 38ms -> 38ms [-0.54%, +0.95%]
✔️001_tpch_20: 49.8ms -> 49.9ms [-0.63%, +1.35%]
✔️001_tpch_21: 78.7ms -> 78.7ms [-1.46%, +1.51%]
✔️001_tpch_22: 45.9ms -> 45.9ms [-0.65%, +0.75%]
✔️010_tpch_01: 84.4ms -> 85.3ms [-2.31%, +4.37%]
✔️010_tpch_02: 70.9ms -> 70.4ms [-2.49%, +0.85%]
✔️010_tpch_03: 57.4ms -> 58.7ms [-2.05%, +6.49%]
✔️010_tpch_04: 42.3ms -> 43.4ms [-2.71%, +7.55%]
✔️010_tpch_05: 89.1ms -> 87.8ms [-3.6%, +0.61%]
✔️010_tpch_06: 29.5ms -> 29.2ms [-9.24%, +7.28%]
✔️010_tpch_07: 101ms -> 101ms [-2.41%, +1.11%]
✔️010_tpch_08: 126ms -> 127ms [-2.34%, +2.88%]
✔️010_tpch_09: 117ms -> 115ms [-3.73%, +0.2%]
✔️010_tpch_10: 75.3ms -> 74.7ms [-2.21%, +0.58%]
✔️010_tpch_11: 39.4ms -> 39.7ms [-2.71%, +4.44%]
✔️010_tpch_12: 50.7ms -> 50.8ms [-1.02%, +1.45%]
✔️010_tpch_13: 50.8ms -> 51.4ms [-1.75%, +4.3%]
✔️010_tpch_14: 35ms -> 36.1ms [-5.35%, +12.01%]
✔️010_tpch_15: 54.1ms -> 54.2ms [-5.98%, +6.18%]
✔️010_tpch_16: 41.6ms -> 40.1ms [-8.73%, +1.63%]
✔️010_tpch_17: 56.9ms -> 54.9ms [-10.65%, +3.75%]
✔️010_tpch_18: 56.2ms -> 55.3ms [-7.75%, +4.57%]
✔️010_tpch_19: 87.1ms -> 85.7ms [-3.46%, +0.29%]
✔️010_tpch_20: 65.2ms -> 65.4ms [-2.38%, +2.9%]
✔️010_tpch_21: 253ms -> 252ms [-4.82%, +3.6%]
✔️010_tpch_22: 54.4ms -> 55.2ms [-1.22%, +4.26%]
✔️100_tpch_01: 351ms -> 342ms [-19.66%, +14.69%]
✔️100_tpch_02: 123ms -> 128ms [-3.47%, +11.63%]
✔️100_tpch_03: 174ms -> 169ms [-8.62%, +3.02%]
🚀100_tpch_04: 150ms -> 146ms [-4.2%, -1.17%]
✔️100_tpch_05: 268ms -> 267ms [-10.73%, +9.83%]
✔️100_tpch_06: 99.3ms -> 93.3ms [-23.4%, +11.32%]
✔️100_tpch_07: 224ms -> 231ms [-9.51%, +15.79%]
✔️100_tpch_08: 262ms -> 253ms [-16.63%, +9.17%]
✔️100_tpch_09: 329ms -> 329ms [-18.19%, +17.68%]
✔️100_tpch_10: 218ms -> 210ms [-9.5%, +1.89%]
✔️100_tpch_11: 84ms -> 93.9ms [-32.49%, +56.16%]
✔️100_tpch_12: 173ms -> 178ms [-7.56%, +13.88%]
✔️100_tpch_13: 313ms -> 327ms [-0.04%, +9.06%]
✔️100_tpch_14: 110ms -> 111ms [-11.12%, +13.29%]
✔️100_tpch_15: 207ms -> 203ms [-10.16%, +6.08%]
✔️100_tpch_16: 120ms -> 121ms [-6.39%, +9.05%]
✔️100_tpch_17: 176ms -> 176ms [-10.92%, +10.95%]
✔️100_tpch_18: 194ms -> 189ms [-14.23%, +8.99%]
✔️100_tpch_19: 269ms -> 270ms [-17%, +17.6%]
✔️100_tpch_20: 175ms -> 182ms [-6.73%, +14.48%]
✔️100_tpch_21: 1.37s -> 1.33s [-5.17%, +0.55%]
✔️100_tpch_22: 156ms -> 151ms [-13.54%, +7.83%]

Further explanation regarding interpretation and methodology can be found in the documentation.

krlmlr · 2025-02-07T02:36:06Z

This would be easier to review in two PRs. Will split and review individually.

krlmlr · 2025-02-07T02:44:00Z

Other commit now in #593.

github-actions · 2025-02-07T03:58:52Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 3385047 is merged into main:

✔️001_tpch_01: 23.9ms -> 23.8ms [-5.94%, +5.23%]
✔️001_tpch_02: 62.9ms -> 63.1ms [-0.45%, +1.1%]
✔️001_tpch_03: 38.8ms -> 39.2ms [-0.69%, +2.85%]
✔️001_tpch_04: 28.3ms -> 28.1ms [-4.24%, +2.75%]
✔️001_tpch_05: 56.2ms -> 56.2ms [-0.74%, +0.74%]
✔️001_tpch_06: 11.4ms -> 11.3ms [-4.78%, +1.75%]
✔️001_tpch_07: 70.4ms -> 70.1ms [-1.28%, +0.37%]
✔️001_tpch_08: 96.9ms -> 96.4ms [-1.9%, +0.87%]
✔️001_tpch_09: 67.1ms -> 67.4ms [-0.53%, +1.55%]
✔️001_tpch_10: 46.8ms -> 47ms [-0.8%, +1.48%]
✔️001_tpch_11: 37.8ms -> 37.7ms [-1.35%, +0.54%]
✔️001_tpch_12: 20.9ms -> 20.9ms [-3.64%, +3.32%]
✔️001_tpch_13: 18.3ms -> 17.8ms [-5.64%, +0.77%]
✔️001_tpch_14: 16.5ms -> 16.5ms [-2.29%, +2.44%]
✔️001_tpch_15: 36.2ms -> 36.1ms [-2.11%, +1.48%]
✔️001_tpch_16: 34.2ms -> 34.3ms [-1.22%, +2.04%]
✔️001_tpch_17: 30.4ms -> 30.8ms [-0.47%, +3.29%]
✔️001_tpch_18: 28.2ms -> 27.9ms [-2.85%, +0.65%]
✔️001_tpch_19: 37.5ms -> 37.8ms [-0.18%, +2.23%]
✔️001_tpch_20: 49.4ms -> 49.2ms [-1.53%, +0.9%]
✔️001_tpch_21: 78.4ms -> 78.5ms [-0.92%, +1.18%]
✔️001_tpch_22: 45.5ms -> 45.5ms [-0.88%, +1.1%]
✔️010_tpch_01: 86.4ms -> 83.4ms [-9.63%, +2.8%]
✔️010_tpch_02: 69.9ms -> 69.9ms [-1.36%, +1.39%]
✔️010_tpch_03: 58.2ms -> 56.4ms [-7.09%, +0.87%]
✔️010_tpch_04: 41.6ms -> 42.9ms [-1.45%, +7.5%]
✔️010_tpch_05: 87.8ms -> 87.7ms [-1.31%, +1.06%]
✔️010_tpch_06: 29.2ms -> 28.4ms [-12.99%, +7.23%]
✔️010_tpch_07: 101ms -> 101ms [-5.01%, +3.63%]
✔️010_tpch_08: 124ms -> 127ms [-1.02%, +4.87%]
✔️010_tpch_09: 114ms -> 116ms [-0.52%, +2.61%]
✔️010_tpch_10: 73.4ms -> 73.3ms [-1.66%, +1.39%]
✔️010_tpch_11: 37.8ms -> 38.5ms [-2.27%, +5.96%]
✔️010_tpch_12: 49.1ms -> 50.5ms [-0.42%, +6.11%]
✔️010_tpch_13: 49.3ms -> 49.3ms [-1.43%, +1.82%]
✔️010_tpch_14: 33.2ms -> 33.4ms [-2.15%, +3.04%]
❗🐌010_tpch_15: 49.9ms -> 54.1ms [+1.98%, +15.17%]
✔️010_tpch_16: 40.4ms -> 38.9ms [-8.55%, +1.17%]
✔️010_tpch_17: 53.4ms -> 53.1ms [-5.23%, +4.2%]
✔️010_tpch_18: 56ms -> 52.6ms [-12.56%, +0.34%]
✔️010_tpch_19: 82.9ms -> 82.5ms [-1.23%, +0.34%]
✔️010_tpch_20: 62.9ms -> 62.9ms [-2.7%, +2.64%]
✔️010_tpch_21: 242ms -> 243ms [-1.24%, +2.13%]
✔️010_tpch_22: 52.4ms -> 53.1ms [-0.9%, +3.58%]
✔️100_tpch_01: 352ms -> 350ms [-20.1%, +19.22%]
✔️100_tpch_02: 124ms -> 125ms [-2.06%, +3.31%]
✔️100_tpch_03: 165ms -> 166ms [-8.99%, +10.19%]
✔️100_tpch_04: 146ms -> 143ms [-12.28%, +8.46%]
✔️100_tpch_05: 242ms -> 260ms [-4.52%, +19.43%]
✔️100_tpch_06: 97ms -> 88.8ms [-18.76%, +1.83%]
✔️100_tpch_07: 220ms -> 221ms [-7.56%, +8.83%]
✔️100_tpch_08: 257ms -> 260ms [-5.82%, +7.71%]
✔️100_tpch_09: 343ms -> 337ms [-15.8%, +12.64%]
✔️100_tpch_10: 208ms -> 208ms [-5.73%, +5.27%]
✔️100_tpch_11: 81.7ms -> 84.7ms [-4.21%, +11.33%]
✔️100_tpch_12: 170ms -> 177ms [-5.09%, +13.27%]
✔️100_tpch_13: 301ms -> 295ms [-8.79%, +4.75%]
✔️100_tpch_14: 117ms -> 119ms [-11.6%, +14.32%]
✔️100_tpch_15: 204ms -> 206ms [-18.03%, +19.56%]
✔️100_tpch_16: 117ms -> 119ms [-8.67%, +11.46%]
✔️100_tpch_17: 175ms -> 177ms [-16.06%, +18.55%]
✔️100_tpch_18: 205ms -> 216ms [-7.19%, +17.45%]
✔️100_tpch_19: 267ms -> 265ms [-10.6%, +8.76%]
✔️100_tpch_20: 166ms -> 168ms [-8.92%, +11.86%]
✔️100_tpch_21: 1.36s -> 1.34s [-11.14%, +8.7%]
✔️100_tpch_22: 157ms -> 156ms [-16.42%, +15.37%]

Further explanation regarding interpretation and methodology can be found in the documentation.

krlmlr

Keeping open for discussion.

krlmlr · 2025-02-07T04:25:04Z

vignettes/prudence.Rmd

@@ -30,22 +30,19 @@ knitr::opts_chunk$set(
 Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 0)
 ```

-Unlike traditional data frames, duckplyr defers computation until absolutely necessary, allowing DuckDB to optimize execution.


Yes, but is this sentence harmful here?

krlmlr · 2025-02-07T04:29:25Z

vignettes/prudence.Rmd


 - **lazy materialization**: Unlike traditional data frames, duckplyr defers computation until absolutely necessary, i.e. lazily, allowing DuckDB to optimize execution.
 - **prudence**: Automatic materialization is controllable, as automatic materialization of large data might otherwise inadvertently lead to memory problems.

 The term "prudence" is introduced here to set a clear distinction from the concept of "laziness", and because "control of automatic materialization" is a mouthful.

-## Eager and lazy computation
+## DuckDB optimization: lazy evaluation


"lazy evaluation" is easy to confuse with how R handles arguments.

oops, this should be "lazy computation", sorry.

krlmlr · 2025-02-07T04:31:52Z

vignettes/prudence.Rmd

-Thrifty duckplyr frames are a compromise between lavish and frugal, discussed further below.
+Thrifty duckplyr frames are a compromise between lavish and frugal, discussed below.
+
+### Thrift


I want to give the readers time to digest "lavish" and "frugal".

krlmlr · 2025-02-07T04:33:54Z

vignettes/prudence.Rmd

@@ -198,7 +233,7 @@ flights_frugal[[1]]
 ```


-### Enforcing DuckDB operation
+### Side effect: Enforcing DuckDB operation


Good point, moving to "fallback".

Later.

maelle requested a review from krlmlr February 6, 2025 09:55

maelle commented Feb 6, 2025

View reviewed changes

vignettes/prudence.Rmd Show resolved Hide resolved

maelle added the documentation label Feb 6, 2025

krlmlr force-pushed the prudence-rmd branch from 00b4546 to 3385047 Compare February 7, 2025 02:43

krlmlr force-pushed the prudence-rmd branch from 3385047 to 9fba250 Compare February 7, 2025 03:19

krlmlr marked this pull request as draft February 7, 2025 03:23

krlmlr mentioned this pull request Feb 7, 2025

docs: some vignette tweaks #595

Merged

krlmlr force-pushed the prudence-rmd branch 2 times, most recently from f404f17 to 3f29cba Compare February 7, 2025 04:11

krlmlr reviewed Feb 7, 2025

View reviewed changes

docs: further vignette tweaks

08be013

krlmlr force-pushed the prudence-rmd branch from 3f29cba to 08be013 Compare February 7, 2025 04:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: improve documentation of `prudence` argument #589

docs: improve documentation of `prudence` argument #589

maelle commented Feb 6, 2025

maelle Feb 6, 2025

krlmlr Feb 7, 2025

maelle Feb 7, 2025

maelle Feb 6, 2025

krlmlr Feb 7, 2025

maelle Feb 7, 2025

maelle Feb 6, 2025

krlmlr Feb 7, 2025 •

edited

Loading

maelle Feb 7, 2025

github-actions bot commented Feb 6, 2025

krlmlr commented Feb 7, 2025

krlmlr commented Feb 7, 2025

github-actions bot commented Feb 7, 2025

krlmlr left a comment

krlmlr Feb 7, 2025

krlmlr Feb 7, 2025

maelle Feb 7, 2025

krlmlr Feb 7, 2025

krlmlr Feb 7, 2025 •

edited

Loading

docs: improve documentation of prudence argument #589

Are you sure you want to change the base?

docs: improve documentation of prudence argument #589

Conversation

maelle commented Feb 6, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

krlmlr Feb 7, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Feb 6, 2025

krlmlr commented Feb 7, 2025

krlmlr commented Feb 7, 2025

github-actions bot commented Feb 7, 2025

krlmlr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

krlmlr Feb 7, 2025 • edited Loading

Choose a reason for hiding this comment

docs: improve documentation of `prudence` argument #589

docs: improve documentation of `prudence` argument #589

krlmlr Feb 7, 2025 •

edited

Loading

krlmlr Feb 7, 2025 •

edited

Loading