Parallel API #24
Parallel threshold
Parallel time rate (smaller is better): when handling large input, throughput may be limited by cache misses and page faults. In other words, the workload becomes memory-bound, so the theoretical parallel time rate is smaller than the practical one.
Benchmark command:
just bench static-experimental --bench base64 --plotting-backend disabled -- 'base64-encode/base64-simd'

base64-encode (GiB/s)
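To make the threshold idea concrete, here is a minimal sketch of gating a parallel path behind an input-size threshold. `PARALLEL_THRESHOLD`, `encode_dispatch`, and both closures are hypothetical, not part of this crate's API; the threshold value would have to be tuned with a benchmark like the one above.

```rust
/// Hypothetical size-gated dispatcher: inputs below the threshold stay on
/// the single-threaded SIMD path, because thread startup and scheduling
/// cost more than they save on small buffers.
const PARALLEL_THRESHOLD: usize = 1 << 20; // 1 MiB; placeholder, tune via benchmarks

fn encode_dispatch(
    data: &[u8],
    encode_seq: impl Fn(&[u8]) -> String,
    encode_par: impl Fn(&[u8]) -> String,
) -> String {
    if data.len() < PARALLEL_THRESHOLD {
        encode_seq(data) // small input: avoid parallel overhead
    } else {
        encode_par(data) // large input: likely memory-bound, may still gain from threads
    }
}
```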
Would it be useful to have an option to parse multiple items? Cache locality and instruction-level parallelism could probably help here. A use case would be something like using it in polars to read a column of UUIDs.
Could you explain this use case in more detail? How can we accelerate it with multithreading?
Not really multi-threading, but aligning on cache lines and processing multiple items at once could make it faster. When we need to process batches, maybe there is a better way to handle items in bulk than processing each item one by one?
Sounds interesting. We can discuss it in another issue: #45
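As a rough illustration of the batch-parsing idea discussed above, here is a minimal sketch of what a bulk API shape could look like for a column of UUID strings. It uses the standard `uuid` crate's `Uuid::parse_str` as a stand-in parser; the function name and loop are illustrative only, not an existing API in this project.

```rust
use uuid::Uuid;

/// Hypothetical bulk parser for a column of UUID strings (e.g. from polars).
/// Taking the whole slice in one call lets an implementation keep items hot
/// in cache and unroll the loop to exploit instruction-level parallelism,
/// instead of paying per-call overhead for each item.
fn parse_uuid_column(column: &[&str]) -> Result<Vec<Uuid>, uuid::Error> {
    // Plain loop shown for clarity; a tuned version could parse several
    // fixed-width items per iteration.
    column.iter().map(|s| Uuid::parse_str(s)).collect()
}
```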
Improve throughput by using multi-threading (rayon).
In most cases, the SIMD functions are fast enough that we may not benefit from multi-threading, and some functions are not divisible.
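For a divisible function such as base64 encoding, a minimal rayon sketch could look like the following. The `encode` closure is a placeholder for any single-buffer encoder (for example one from this project or the `base64` crate), and the chunk size is an arbitrary illustrative value. Splitting on 3-byte boundaries keeps every chunk independently encodable, with padding only possible in the final chunk.

```rust
use rayon::prelude::*;

/// Chunk-parallel base64 encoding sketch: each task encodes an independent
/// chunk whose length is a multiple of 3 bytes, so the per-chunk outputs
/// can simply be concatenated.
fn par_encode(data: &[u8], encode: impl Fn(&[u8]) -> String + Sync) -> String {
    const CHUNK: usize = 3 * 1024 * 1024; // multiple of 3; placeholder size
    data.par_chunks(CHUNK)
        .map(|chunk| encode(chunk))
        .collect::<Vec<String>>()
        .concat()
}
```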
Crates with a parallel API: