Use pipelined execution in CheckpointExecutor #21538
base: main
Conversation
/// A collection of watches for each stage. These are the synchronization points
/// for the pipeline.
pub(super) struct PipelineStages {
    stages: [SequenceWatch; PipelineStage::End as usize],
possibly not worth mentioning, but since you're already using strum, would it make sense to use strum::EnumCount here?
hmm, but then it would be COUNT - 1
which seems possibly more confusing?
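For illustration, a minimal sketch of the trade-off being discussed (stage names other than `End` are made up; assumes strum with the derive feature): keeping the `End` sentinel lets the array length be written as `End as usize`, while `strum::EnumCount` counts `End` too, hence the `COUNT - 1` above.

```rust
use strum::EnumCount;

// Illustrative stage names; only `End` is taken from the diff above.
#[derive(Clone, Copy, EnumCount)]
enum PipelineStage {
    ExecuteTransactions,
    BuildDbBatch,
    CommitBatch,
    End, // sentinel / terminal marker
}

struct SequenceWatch; // placeholder for the real watch type

// As written in the diff: the sentinel's discriminant is the stage count.
struct PipelineStages {
    stages: [SequenceWatch; PipelineStage::End as usize],
}

// With strum::EnumCount, `COUNT` includes `End`, hence the `COUNT - 1`
// mentioned in the reply.
struct PipelineStagesViaEnumCount {
    stages: [SequenceWatch; PipelineStage::COUNT - 1],
}
```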
@@ -25,6 +26,9 @@ pub struct CheckpointExecutorMetrics {
     // TODO: delete once users are migrated to non-Mysten histogram.
     pub last_executed_checkpoint_age_ms: MystenHistogram,
     pub checkpoint_executor_validator_path: IntGauge,
+
+    pub stage_wait_duration_ns: IntCounterVec,
Would it make more sense for these to be histograms?
I'm not sure that would be useful? We can always add them later
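For reference, a sketch of how a counter like this could be registered and bumped with the prometheus crate; the metric name comes from the diff, but the label name, help text, and helper function are assumptions (the real code may go through mysten-metrics helpers instead).

```rust
use prometheus::{IntCounterVec, Opts, Registry};

// Hypothetical helper: register a per-stage wait counter (nanoseconds).
// A plain counter still lets you derive rates and averages in queries
// (e.g. with PromQL rate()), which is the usual argument for deferring
// a histogram until percentiles are actually needed.
fn register_stage_wait_counter(registry: &Registry) -> IntCounterVec {
    let counter = IntCounterVec::new(
        Opts::new(
            "stage_wait_duration_ns",
            "Total time spent waiting to enter each pipeline stage, in nanoseconds",
        ),
        &["stage"], // label name is an assumption
    )
    .expect("metric can be created");
    registry
        .register(Box::new(counter.clone()))
        .expect("metric can be registered");
    counter
}

// Usage sketch: attribute a measured wait to a stage.
// stage_wait_duration_ns
//     .with_label_values(&["execute_transactions"])
//     .inc_by(wait.as_nanos() as u64);
```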
@@ -917,7 +917,7 @@ impl ExpensiveSafetyCheckConfig {
 }

 fn default_checkpoint_execution_max_concurrency() -> usize {
-    40
+    4
What's the reason to lower this?
In the new system, you cannot have more concurrency than there are stages, since only one thread can be in each stage at a time, so the max possible value would be 8 or so. But in practice almost all the time is consumed by execution, building the db batch, and committing the batch, so 3, plus 1 more core to handle all the other small stages, seems sensible.
if we eventually need more throughput we will probably have to shard the object writes.
hmm, does that mean there can be at most 8 transactions executing at the same time?
no - this is checkpoint parallelism, not tx parallelism. The ExecuteTransaction stage schedules all transactions in the checkpoint concurrently.
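To make the "one thread per stage" bound concrete, here is a sketch of a per-stage gate (an assumption about how SequenceWatch behaves, not the actual implementation): each stage tracks the highest checkpoint sequence that has completed it, and checkpoint N may enter only after N-1 has finished that stage, so at most one checkpoint occupies any stage and total checkpoint concurrency is capped by the number of stages.

```rust
use tokio::sync::watch;

/// Sketch of a per-stage gate; the real SequenceWatch may differ.
/// It records the highest checkpoint sequence that has completed this stage.
pub struct StageGate {
    tx: watch::Sender<Option<u64>>,
}

impl StageGate {
    pub fn new() -> Self {
        Self { tx: watch::channel(None).0 }
    }

    /// Wait until checkpoint `seq - 1` has completed this stage. Since
    /// completions happen strictly in sequence order, at most one checkpoint
    /// can be between `enter` and `complete` at any time.
    pub async fn enter(&self, seq: u64) {
        if seq == 0 {
            return;
        }
        let mut rx = self.tx.subscribe();
        rx.wait_for(|done| matches!(done, Some(d) if *d + 1 >= seq))
            .await
            .expect("sender is owned by self and cannot be dropped");
    }

    /// Mark checkpoint `seq` as done with this stage, releasing `seq + 1`.
    pub fn complete(&self, seq: u64) {
        self.tx.send_replace(Some(seq));
    }
}
```

With roughly eight such stages, and most of the wall-clock time spent in execution, batch building, and batch committing, a default concurrency of 4 covers the three heavy stages plus one slot for the remaining cheap ones.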
Force-pushed from 58bc6f9 to 84b393c
This allows us to have parallelism throughout the entire CheckpointExecution process, rather than just during the transaction-execution phase.
Additionally, even though we impose a stricter-than-necessary ordering constraint on the execution phase (all transactions from checkpoint N are enqueued before those from N+1), we still achieve lower latency.
Throughput should also improve, since the batch-building and batch-writing steps run in different stages of the pipeline, which lets us start building the batch for seq N+1 while we are still committing the batch for seq N.
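As a self-contained illustration of that overlap (a toy model using the same gating idea sketched earlier, not the PR's code: stage names, timings, and the driver are made up for the demo), the sketch below drives several checkpoints through three ordered stages; the printed output interleaves, e.g. checkpoint N+1 builds its batch while checkpoint N is still committing.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::watch;

// Toy pipeline: one gate per stage, holding the highest checkpoint sequence
// that has completed that stage. A checkpoint enters a stage only after its
// predecessor has finished it, so checkpoints stay ordered within a stage
// while different checkpoints occupy different stages concurrently.
const STAGES: [&str; 3] = ["execute transactions", "build db batch", "commit batch"];

async fn run_checkpoint(seq: u64, gates: Arc<Vec<watch::Sender<Option<u64>>>>) {
    for (i, stage) in STAGES.iter().enumerate() {
        if seq > 0 {
            // Wait for checkpoint seq - 1 to finish this stage.
            let mut rx = gates[i].subscribe();
            rx.wait_for(|done| matches!(done, Some(d) if *d + 1 >= seq))
                .await
                .unwrap();
        }
        println!("checkpoint {seq}: {stage}");
        // Simulated stage work; while checkpoint seq commits its batch,
        // checkpoint seq + 1 can already be building its own batch.
        tokio::time::sleep(Duration::from_millis(50)).await;
        gates[i].send_replace(Some(seq));
    }
}

#[tokio::main]
async fn main() {
    let gates: Arc<Vec<watch::Sender<Option<u64>>>> =
        Arc::new(STAGES.iter().map(|_| watch::channel(None).0).collect());
    let tasks: Vec<_> = (0..4u64)
        .map(|seq| tokio::spawn(run_checkpoint(seq, gates.clone())))
        .collect();
    for t in tasks {
        t.await.unwrap();
    }
}
```

Run with a recent tokio (features "macros", "rt-multi-thread", "sync", "time"); the interleaved output shows consecutive checkpoints overlapping across stages while each individual stage still processes checkpoints strictly in sequence order.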