Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pipelined execution in CheckpointExecutor #21538

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mystenmark
Copy link
Contributor

This allows us to have parallelism throughout the entire CheckpointExecution process, rather than just during the transaction-execution phase.

Additionally, by imposing stricter-than-necessary ordering constraints on the execution phase (i.e. that we enqueue all transactions from checkpoint N before enqueueing those from N+1) we achieve lower latency.

Throughput should also be improved here, by running the batch-building and batch-writing steps in different stages of the pipeline, which allows us to start building the batch for seq N+1 while we are still committing the batch for seq N.

@mystenmark mystenmark requested review from aschran and andll March 19, 2025 16:35
Copy link

vercel bot commented Mar 19, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 21, 2025 6:38pm
2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Mar 21, 2025 6:38pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Mar 21, 2025 6:38pm

@mystenmark mystenmark temporarily deployed to sui-typescript-aws-kms-test-env March 19, 2025 16:35 — with GitHub Actions Inactive
/// A collection of watches for each stage. These are the synchronization points
/// for the pipeline.
pub(super) struct PipelineStages {
stages: [SequenceWatch; PipelineStage::End as usize],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly not worth mentioning, but since you're already using strum, would it make sense to use strum::EnumCount here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, but then it would be COUNT - 1 which seems possibly more confusing?

@@ -25,6 +26,9 @@ pub struct CheckpointExecutorMetrics {
// TODO: delete once users are migrated to non-Mysten histogram.
pub last_executed_checkpoint_age_ms: MystenHistogram,
pub checkpoint_executor_validator_path: IntGauge,

pub stage_wait_duration_ns: IntCounterVec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense for these to be histograms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that would useful? We can always add them later

@@ -917,7 +917,7 @@ impl ExpensiveSafetyCheckConfig {
}

fn default_checkpoint_execution_max_concurrency() -> usize {
40
4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason to lower this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the new system, you cannot have more concurrency than there are stages, since only one thread can be in each stage at a time. So the max possible value would be 8 or so. But in practice almost all the time is consumed by execution, building db batch, and committing batch. So 3 + 1 more core to handle all the other small stages seems sensible.

if we eventually need more throughput we will probably have to shard the object writes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm does that mean there can be at most 8 transaction executing at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - this is checkpoint parallelism not tx parallelism. the ExecuteTransaction stage schedules all transactions in the checkpoint concurrently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants