
Implements concurrent Smt::compute_mutations #365

Merged: 15 commits into 0xPolygonMiden:next on Feb 7, 2025

Conversation

krushimir

This PR introduces a concurrent implementation of Smt::compute_mutations, leveraging an approach similar to the existing parallel construction logic.

Benchmark results were collected on a 64-core (128-thread) AMD EPYC 7662 processor, with Rayon’s thread pool explicitly limited to the specified thread counts.

For context, construction benchmarks are also included for performance comparison.
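For readers reproducing these numbers: the benchmarks pin the worker count via Rayon's thread pool. As a dependency-free illustration of the same idea, here is a std-only sketch that splits a workload across a fixed number of threads; the function name and chunking strategy are illustrative, not the benchmark code, which uses Rayon:

```rust
use std::thread;

/// Apply `f` to every item, splitting the work across `num_threads` workers.
/// A std-only stand-in for what a fixed-size Rayon thread pool does.
fn parallel_map<T: Sync, R: Send>(
    items: &[T],
    num_threads: usize,
    f: impl Fn(&T) -> R + Sync,
) -> Vec<R> {
    // One contiguous chunk per worker thread (at least one item per chunk).
    let chunk = items.len().div_ceil(num_threads).max(1);
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Join in spawn order so output order matches input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Toy workload: a cheap transform over 10k items on 4 threads.
    let items: Vec<u64> = (0..10_000).collect();
    let out = parallel_map(&items, 4, |x| x * 3);
    assert_eq!(out.len(), 10_000);
    assert_eq!(out[10], 30);
}
```

Rayon dynamically work-steals between threads rather than pre-chunking like this, which is part of why its scaling behavior differs across thread counts.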

1. Construction Benchmark

10k key-value pairs

| Threads | Parallel Time (s) | Sequential Time (s) | Speedup |
|---------|-------------------|---------------------|---------|
| 16      | 0.5               | 5.7                 | 11.11x  |
| 32      | 0.4               | 5.7                 | 15.22x  |
| 64      | 0.3               | 5.7                 | 17.35x  |
| 128     | 0.4               | 5.7                 | 16.90x  |
  • Optimal performance was achieved with 64 threads.
  • Diminishing returns were observed with 128 threads.

2. Batched Insertion Benchmark

10k key-value pairs

| Threads | Parallel Time (ms) | Sequential Time (ms) | Speedup | Avg Insert Time (μs) |
|---------|--------------------|----------------------|---------|----------------------|
| 16      | 517.0              | 6308.7               | 12.20x  | 52                   |
| 32      | 395.8              | 6334.5               | 16.00x  | 40                   |
| 64      | 333.0              | 6321.6               | 18.98x  | 33                   |
| 128     | 383.7              | 6300.7               | 16.42x  | 38                   |
  • 64 threads offered the best performance, reducing average insertion time to 33 μs.
  • Scaling beyond 64 threads led to slight performance degradation.

3. Batched Update Benchmark

10k key-value pairs

| Threads | Parallel Time (ms) | Sequential Time (ms) | Speedup | Avg Update Time (μs) |
|---------|--------------------|----------------------|---------|----------------------|
| 16      | 482.7              | 6369.8               | 13.20x  | 48                   |
| 32      | 357.7              | 6351.5               | 17.76x  | 36                   |
| 64      | 304.7              | 6378.5               | 20.93x  | 30                   |
| 128     | 273.5              | 6418.8               | 23.47x  | 27                   |
  • Batched updates scaled better with increased threads.
  • 128 threads achieved the fastest update speed, reducing average time to 27 μs.

Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Looks great to me! I think the logic itself looks good. My comments are mostly about naming, docs and deduplication. I might have to take another look anyway, since I first had to understand how the Smt is implemented in sequential code 😅, so I'll just comment for now.

In general, I think adding comments to code parts that are not easy to understand would improve readability and understandability.

Regarding the approach, please correct me if I have misunderstandings, but my understanding of the approach is the following.

Assuming a tree of depth 64 with subtrees of depth 8 and mutations to just two leaves (for example's sake) at indices 0 and 65536, compute_mutations would do this, at a high level and making some simple assumptions about how rayon assigns threads:

  1. Compute subtrees that were modified. This happens in sorted_pairs_to_mutated_leaves. This would yield two subtrees, covering the column ranges 0..256 and 65536..65792.
  2. Then in build_subtree_mutations, the subtrees are updated in parallel.
    • 1st iteration:
      • Thread 0: Compute updates for leaves with indices 0..256 at depth 64. Then updates for leaves at depth 63 within this subtree, and so on, until it eventually results in new root at depth 56, column 0.
      • Thread 1: Compute updates for leaves with indices 65536..65792 at depth 64. Then updates for leaves at depth 63 within this subtree, and so on, until it eventually results in new root at depth 56, column 256 (= 65536 >> 8).
    • 2nd iteration:
      • Thread 0: Compute updates for leaves with indices 0..256 at depth 56 (only root 0 has changed). Eventually this results in a new root at depth 48, column 0.
      • Thread 1: Compute updates for leaves with indices 256..512 at depth 56 (only root 256 has changed). Eventually this results in a new root at depth 48, column 1.
    • 3rd iteration:
      • Thread 0: Compute updates for leaves with indices 0..256 at depth 48 (only root 0 has changed). Eventually this results in a new root at depth 40, column 0.
    • More iterations like the 3rd until the root at depth 0 has been reached.

Is this accurate? Would it make sense to add something like this as a doc comment to compute_mutations_subtree (with corrections if it's inaccurate)?
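If it helps, the (depth, column) bookkeeping in the walkthrough above can be sketched in plain Rust. The names and the constant are illustrative (mirroring the depth-8 subtrees assumed above), not the crate's API:

```rust
/// Depth of each subtree processed per iteration (illustrative constant,
/// matching the depth-8 subtrees assumed in the walkthrough above).
const SUBTREE_DEPTH: u8 = 8;

/// For a leaf at `column` on `start_depth`, list the (depth, column) of the
/// new subtree root produced by each iteration, down to the tree root.
fn root_path(mut column: u64, start_depth: u8) -> Vec<(u8, u64)> {
    let mut roots = Vec::new();
    let mut depth = start_depth;
    while depth > 0 {
        depth -= SUBTREE_DEPTH;
        column >>= SUBTREE_DEPTH;
        roots.push((depth, column));
    }
    roots
}

fn main() {
    // Leaf at index 65536, depth 64: the 1st iteration yields a new root at
    // depth 56, column 256 (= 65536 >> 8), the 2nd at depth 48, column 1, etc.
    let path = root_path(65536, 64);
    assert_eq!(path[0], (56, 256));
    assert_eq!(path[1], (48, 1));
    assert_eq!(path[2], (40, 0));
    // From depth 40 on, the changed root stays in column 0 until the tree root.
    assert_eq!(path.last(), Some(&(0, 0)));
}
```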

@krushimir
Author

10M-entry tree.

Batch insertions (10k inserts):
  • without smt_hashmaps: 383.3 ms (~38 μs per insert)
  • with smt_hashmaps: 281.9 ms (~28 μs per insert)
  • ~26% faster with smt_hashmaps
  • concurrent vs. sequential: 17.7x faster
  • concurrent with smt_hashmaps vs. sequential: 24.1x faster

Batch updates (10k updates):
  • without smt_hashmaps: 287.9 ms (~29 μs per update)
  • with smt_hashmaps: 265.5 ms (~27 μs per update)
  • ~8% faster with smt_hashmaps
  • concurrent vs. sequential: 23.6x faster
  • concurrent with smt_hashmaps vs. sequential: 25.6x faster

@PhilippGackstatter
Contributor

Hey @krushimir, quick question: Is this still Work-In-Progress or can it be marked as ready for review?

@krushimir
Author

Hi @PhilippGackstatter, I'll push some more changes today and then I'll mark it ready.

@krushimir krushimir marked this pull request as ready for review January 23, 2025 07:12
@krushimir krushimir changed the title [WIP] implements concurrent Smt::compute_mutations Implements concurrent Smt::compute_mutations Jan 23, 2025
Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Looks good to me!

krushimir and others added 2 commits January 23, 2025 17:32
Co-authored-by: Philipp Gackstatter <PhilippGackstatter@users.noreply.github.com>
@krushimir krushimir force-pushed the krushimir/subtree_mutations branch from 9242cff to e89daa9 on January 23, 2025 17:03
Contributor

@bobbinth bobbinth left a comment

Looks good! Thank you! I left a couple of comments inline. The main one is about code organization - i.e., potentially moving the parallel mutation functions to the Smt struct.


sonarqubecloud bot commented Feb 6, 2025

Contributor

@bobbinth bobbinth left a comment

Looks good! Thank you!

@bobbinth bobbinth merged commit 1b77fa8 into 0xPolygonMiden:next Feb 7, 2025
15 checks passed
@bobbinth
Contributor

bobbinth commented Feb 7, 2025

On my machine (M1 Pro), I see the following results:

Single-threaded execution, smt_hashmaps enabled

Running a construction benchmark:
Constructed an SMT with 1000000 key-value pairs in 351.2 seconds
Number of leaf nodes: 1000000

Running an insertion benchmark:
The average insertion time measured by 1000 inserts into an SMT with 1000000 leaves is 694 μs

Running a batched insertion benchmark:
The average insert-batch computation time measured by a 1000-batch into an SMT with 1000000 leaves over 424.6 ms is 425 μs
The average insert-batch application time measured by a 1000-batch into an SMT with 1000000 leaves over 199.3 ms is 199 μs
The average batch insertion time measured by a 1000-batch into an SMT with 1000000 leaves totals to 623.9 ms

Running a batched update benchmark:
The average update-batch computation time measured by a 1000-batch into an SMT with 1000000 leaves over 414.2 ms is 414 μs
The average update-batch application time measured by a 1000-batch into an SMT with 1000000 leaves over 4.6 ms is 5 μs
The average batch update time measured by a 1000-batch into an SMT with 1000000 leaves totals to 418.8 ms

Running a proof generation benchmark:
The average proving time measured by 100 value proofs in an SMT with 1000000 leaves in 0 μs

Multi-threaded execution, smt_hashmaps enabled

Running a construction benchmark:
Constructed an SMT with 1000000 key-value pairs in 37.2 seconds
Number of leaf nodes: 1000000

Running an insertion benchmark:
The average insertion time measured by 1000 inserts into an SMT with 1000000 leaves is 610 μs

Running a batched insertion benchmark:
The average insert-batch computation time measured by a 1000-batch into an SMT with 1000000 leaves over 50.1 ms is 50 μs
The average insert-batch application time measured by a 1000-batch into an SMT with 1000000 leaves over 36.4 ms is 36 μs
The average batch insertion time measured by a 1000-batch into an SMT with 1000000 leaves totals to 86.5 ms

Running a batched update benchmark:
The average update-batch computation time measured by a 1000-batch into an SMT with 1000000 leaves over 51.1 ms is 51 μs
The average update-batch application time measured by a 1000-batch into an SMT with 1000000 leaves over 5.1 ms is 5 μs
The average batch update time measured by a 1000-batch into an SMT with 1000000 leaves totals to 56.2 ms

Running a proof generation benchmark:
The average proving time measured by 100 value proofs in an SMT with 1000000 leaves in 0 μs

Comparison

| Benchmark                | Single-threaded | Multi-threaded | Improvement |
|--------------------------|-----------------|----------------|-------------|
| Construction (1M leaves) | 351 sec         | 37 sec         | 9.5x        |
| Insertion (1K batch)     | 623 ms          | 86 ms          | 7.2x        |
| Updates (1K batch)       | 419 ms          | 56 ms          | 7.5x        |

@bobbinth
Contributor

bobbinth commented Feb 7, 2025

And on M4 max, the results look like so:

| Benchmark                | Single-threaded | Multi-threaded | Improvement |
|--------------------------|-----------------|----------------|-------------|
| Construction (1M leaves) | 195 sec         | 15 sec         | 13x         |
| Insertion (1K batch)     | 212 ms          | 28 ms          | 7.6x        |
| Updates (1K batch)       | 218 ms          | 24 ms          | 9.1x        |
