Use pipelined execution in CheckpointExecutor #21538
base: main
Conversation
/// A collection of watches for each stage. These are the synchronization points
/// for the pipeline.
pub(super) struct PipelineStages {
    stages: [SequenceWatch; PipelineStage::End as usize],
possibly not worth mentioning, but since you're already using strum, would it make sense to use strum::EnumCount here?
hmm, but then it would be COUNT - 1
which seems possibly more confusing?
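For illustration, a minimal sketch of the trade-off being discussed (stage names other than `End` are made up; assumes strum with the derive feature): keeping the `End` sentinel lets the array length be written as `End as usize`, while `strum::EnumCount` counts `End` too, hence the `COUNT - 1` above.

```rust
use strum::EnumCount;

// Illustrative stage names; only `End` is taken from the diff above.
#[derive(Clone, Copy, EnumCount)]
enum PipelineStage {
    ExecuteTransactions,
    BuildDbBatch,
    CommitBatch,
    End, // sentinel / terminal marker
}

struct SequenceWatch; // placeholder for the real watch type

// As written in the diff: the sentinel's discriminant is the stage count.
struct PipelineStages {
    stages: [SequenceWatch; PipelineStage::End as usize],
}

// With strum::EnumCount, `COUNT` includes `End`, hence the `COUNT - 1`
// mentioned in the reply.
struct PipelineStagesViaEnumCount {
    stages: [SequenceWatch; PipelineStage::COUNT - 1],
}
```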
@@ -25,6 +26,9 @@ pub struct CheckpointExecutorMetrics {
     // TODO: delete once users are migrated to non-Mysten histogram.
     pub last_executed_checkpoint_age_ms: MystenHistogram,
     pub checkpoint_executor_validator_path: IntGauge,
+
+    pub stage_wait_duration_ns: IntCounterVec,
Would it make more sense for these to be histograms?
I'm not sure that would be useful? We can always add them later
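For reference, a sketch of how a counter like this could be registered and bumped with the prometheus crate; the metric name comes from the diff, but the label name, help text, and helper function are assumptions (the real code may go through mysten-metrics helpers instead).

```rust
use prometheus::{IntCounterVec, Opts, Registry};

// Hypothetical helper: register a per-stage wait counter (nanoseconds).
// A plain counter still lets you derive rates and averages in queries
// (e.g. with PromQL rate()), which is the usual argument for deferring
// a histogram until percentiles are actually needed.
fn register_stage_wait_counter(registry: &Registry) -> IntCounterVec {
    let counter = IntCounterVec::new(
        Opts::new(
            "stage_wait_duration_ns",
            "Total time spent waiting to enter each pipeline stage, in nanoseconds",
        ),
        &["stage"], // label name is an assumption
    )
    .expect("metric can be created");
    registry
        .register(Box::new(counter.clone()))
        .expect("metric can be registered");
    counter
}

// Usage sketch: attribute a measured wait to a stage.
// stage_wait_duration_ns
//     .with_label_values(&["execute_transactions"])
//     .inc_by(wait.as_nanos() as u64);
```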
@@ -917,7 +917,7 @@ impl ExpensiveSafetyCheckConfig {
 }

 fn default_checkpoint_execution_max_concurrency() -> usize {
-    40
+    4
What's the reason to lower this?
In the new system, you cannot have more concurrency than there are stages, since only one thread can be in each stage at a time, so the max possible value would be 8 or so. But in practice almost all the time is consumed by execution, building the db batch, and committing the batch, so 3, plus 1 more core to handle all the other small stages, seems sensible.
if we eventually need more throughput we will probably have to shard the object writes.
hmm, does that mean there can be at most 8 transactions executing at the same time?
no - this is checkpoint parallelism, not tx parallelism. The ExecuteTransaction stage schedules all transactions in the checkpoint concurrently.
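To make the "one thread per stage" bound concrete, here is a sketch of a per-stage gate (an assumption about how SequenceWatch behaves, not the actual implementation): each stage tracks the highest checkpoint sequence that has completed it, and checkpoint N may enter only after N-1 has finished that stage, so at most one checkpoint occupies any stage and total checkpoint concurrency is capped by the number of stages.

```rust
use tokio::sync::watch;

/// Sketch of a per-stage gate; the real SequenceWatch may differ.
/// It records the highest checkpoint sequence that has completed this stage.
pub struct StageGate {
    tx: watch::Sender<Option<u64>>,
}

impl StageGate {
    pub fn new() -> Self {
        Self { tx: watch::channel(None).0 }
    }

    /// Wait until checkpoint `seq - 1` has completed this stage. Since
    /// completions happen strictly in sequence order, at most one checkpoint
    /// can be between `enter` and `complete` at any time.
    pub async fn enter(&self, seq: u64) {
        if seq == 0 {
            return;
        }
        let mut rx = self.tx.subscribe();
        rx.wait_for(|done| matches!(done, Some(d) if *d + 1 >= seq))
            .await
            .expect("sender is owned by self and cannot be dropped");
    }

    /// Mark checkpoint `seq` as done with this stage, releasing `seq + 1`.
    pub fn complete(&self, seq: u64) {
        self.tx.send_replace(Some(seq));
    }
}
```

With roughly eight such stages, and most of the wall-clock time spent in execution, batch building, and batch committing, a default concurrency of 4 covers the three heavy stages plus one slot for the remaining cheap ones.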
Force-pushed from 58bc6f9 to 84b393c
This allows us to have parallelism throughout the entire CheckpointExecution process, rather than just during the transaction-execution phase.
Additionally, even though we impose a stricter-than-necessary ordering constraint on the execution phase (all transactions from checkpoint N are enqueued before those from N+1), we still achieve lower latency.
Throughput should also improve, since the batch-building and batch-writing steps run in different stages of the pipeline, which lets us start building the batch for seq N+1 while we are still committing the batch for seq N.
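As a self-contained illustration of that overlap (a toy model using the same gating idea sketched earlier, not the PR's code: stage names, timings, and the driver are made up for the demo), the sketch below drives several checkpoints through three ordered stages; the printed output interleaves, e.g. checkpoint N+1 builds its batch while checkpoint N is still committing.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::watch;

// Toy pipeline: one gate per stage, holding the highest checkpoint sequence
// that has completed that stage. A checkpoint enters a stage only after its
// predecessor has finished it, so checkpoints stay ordered within a stage
// while different checkpoints occupy different stages concurrently.
const STAGES: [&str; 3] = ["execute transactions", "build db batch", "commit batch"];

async fn run_checkpoint(seq: u64, gates: Arc<Vec<watch::Sender<Option<u64>>>>) {
    for (i, stage) in STAGES.iter().enumerate() {
        if seq > 0 {
            // Wait for checkpoint seq - 1 to finish this stage.
            let mut rx = gates[i].subscribe();
            rx.wait_for(|done| matches!(done, Some(d) if *d + 1 >= seq))
                .await
                .unwrap();
        }
        println!("checkpoint {seq}: {stage}");
        // Simulated stage work; while checkpoint seq commits its batch,
        // checkpoint seq + 1 can already be building its own batch.
        tokio::time::sleep(Duration::from_millis(50)).await;
        gates[i].send_replace(Some(seq));
    }
}

#[tokio::main]
async fn main() {
    let gates: Arc<Vec<watch::Sender<Option<u64>>>> =
        Arc::new(STAGES.iter().map(|_| watch::channel(None).0).collect());
    let tasks: Vec<_> = (0..4u64)
        .map(|seq| tokio::spawn(run_checkpoint(seq, gates.clone())))
        .collect();
    for t in tasks {
        t.await.unwrap();
    }
}
```

Run with a recent tokio (features "macros", "rt-multi-thread", "sync", "time"); the interleaved output shows consecutive checkpoints overlapping across stages while each individual stage still processes checkpoints strictly in sequence order.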