Scheduler A/B-testing and metrics #9857

rjan90 · 2022-12-13T08:24:44Z

rjan90
Dec 13, 2022
Maintainer

Hey everyone 👋

One of the areas in the "Architectural/System-level problems for enterprise-level Storage Providers" discussion has been the inefficiency of sealing-task scheduling. This problem area has gathered a lot of good discussion and feedback. We are now at the point where we need actual metrics and numbers to move forward. If we can't measure it, we can't improve it!

We have added 4 new duration-distribution metrics about the scheduler to the Prometheus endpoints, and we are looking to do some A/B testing of the scheduler. Ideally, this A/B test should be run with as many workers as possible to get the best metrics possible.

Branches:

We have created two branches that we want to gather metrics from. Both branches are based off the final v1.19.0 release with some small nuances.

Branch sched-metrics/test-a (link) includes the final v1.19.0 and PR "feat: sched: Add metrics around sched cycle" #9738, which adds the scheduler metrics.
Branch sched-metrics/test-b (link) includes the final v1.19.0 and the PRs "feat: sched: Add metrics around sched cycle" #9738 and "feat: sched: Cache worker calls" #9737, which add the scheduler metrics and also cache scheduler worker calls.

Test-A:

Check out the test-a branch (git checkout sched-metrics/test-a), and upgrade your lotus-miner following normal upgrade procedures.

Start up the lotus-miner and your lotus-workers, and push as many sectors and deals as you can through your lotus-workers. Run on this branch for about a day or two while trying to load the scheduler. After that, go to http://127.0.0.1:2345/debug/metrics, or whatever you have set your [minerapi]/debug/metrics to.

Save the output to a .txt file and upload it here: https://drive.google.com/drive/u/0/folders/1cdTZM5TQwOpEWESfu0NvmcpjTD22TRU8.
Name the file <Your-Name-sched-metrics-A.txt>.
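
If you would rather script the save step than copy the page from a browser, here is a minimal sketch in Python. It assumes the default address mentioned above; the function names are made up for illustration and are not part of Lotus. It simply downloads the metrics page and writes it unchanged to a file with the requested name.

```python
import urllib.request
from pathlib import Path

# Assumption: the default address from the post. Change it if your
# miner API listens somewhere else.
DEFAULT_METRICS_URL = "http://127.0.0.1:2345/debug/metrics"


def metrics_filename(name: str, variant: str) -> str:
    """Build the requested file name, e.g. Alice-sched-metrics-A.txt."""
    variant = variant.upper()
    if variant not in ("A", "B"):
        raise ValueError("variant must be 'A' or 'B'")
    return f"{name}-sched-metrics-{variant}.txt"


def save_metrics(name: str, variant: str,
                 url: str = DEFAULT_METRICS_URL,
                 out_dir: str = ".") -> Path:
    """Download the metrics page and write it unchanged to a text file."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    path = Path(out_dir) / metrics_filename(name, variant)
    path.write_bytes(body)
    return path
```

Calling `save_metrics("Your-Name", "A")` at the end of the Test-A run (and with `"B"` after Test-B) should leave a file ready to upload.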

Test-B:

Check out the test-b branch (git checkout sched-metrics/test-b), and upgrade your lotus-miner following normal upgrade procedures.

Start up the lotus-miner and your lotus-workers, and push as many sectors and deals as you can through your lotus-workers. Run on this branch for about a day or two while trying to load the scheduler. After that, go to http://127.0.0.1:2345/debug/metrics, or whatever you have set your [minerapi]/debug/metrics to.

Save the output to a .txt file and upload it here: https://drive.google.com/drive/u/0/folders/1cdTZM5TQwOpEWESfu0NvmcpjTD22TRU8.
Name the file <Your-Name-sched-metrics-B.txt>.
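
If you want a quick look at your own numbers before uploading, a rough comparison of the two files could be sketched like this in Python. Some loud assumptions: the post does not list the names of the new metrics, so the sketch just picks up every series whose name contains "sched"; it also assumes the output is in the usual Prometheus text format, where a distribution is exposed as `_sum` and `_count` series. The function names and the metric name in the usage note are hypothetical.

```python
from collections import defaultdict


def mean_durations(text: str, name_filter: str = "sched") -> dict:
    """Return {metric: sum / count} for distribution metrics in a metrics dump.

    Assumption: series names containing `name_filter` are the scheduler ones.
    Values are summed across label sets before dividing.
    """
    sums = defaultdict(float)
    counts = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "{" in line:
            end = line.rfind("}")
            if end == -1:
                continue  # malformed line, ignore it
            name = line[: line.index("{")]
            rest = line[end + 1:].split()
        else:
            name, *rest = line.split()
        if not rest or name_filter not in name:
            continue
        try:
            value = float(rest[0])
        except ValueError:
            continue
        if name.endswith("_sum"):
            sums[name[: -len("_sum")]] += value
        elif name.endswith("_count"):
            counts[name[: -len("_count")]] += value
    return {m: sums[m] / counts[m] for m in sums if counts.get(m, 0) > 0}


def compare(text_a: str, text_b: str) -> dict:
    """Pair up the mean of each metric present in both dumps as (A, B)."""
    a, b = mean_durations(text_a), mean_durations(text_b)
    return {m: (a[m], b[m]) for m in sorted(a.keys() & b.keys())}
```

For example, if both files contained a made-up series `example_sched_cycle_ms`, `compare(open("Your-Name-sched-metrics-A.txt").read(), open("Your-Name-sched-metrics-B.txt").read())` would return its mean under A and under B. A mean is only a crude summary; the bucket series in the raw files carry the full distribution, so please still upload the unmodified output.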

A big 🙌 to everyone that participates

I want to be clear that we do need these metrics to move forward with a potential re-architecture of the scheduler, so we expect larger storage providers to help us run these tests. We are looking forward to getting a lot of metrics from all of you, so we can start to analyze them. 😄 Questions are welcome in this discussion. We will be monitoring it and checking in regularly.

SealStorage-Jacques · 2023-01-16T14:39:19Z

SealStorage-Jacques
Jan 16, 2023

We're busy building containers for the mentioned branches, and we'll run the test on Calibnet. Calibnet was recently reset, so we're setting up new daemons, wallets, SPs, etc. The first SP, t03782, has already been created.
