Scheduler A/B-testing and metrics #9857
Unanswered
rjan90 asked this question in Storage Provider
Replies: 1 comment
-
We're busy building containers for the mentioned branches and we'll run the test on Calibnet. Calibnet was recently reset, so we're setting up new daemons, wallets, SPs, etc. The first SP, t03782, is already created.
-
Hey everyone 👋
One of the areas in the "Architectural/System-level problems for enterprise-level Storage Providers" discussion has been the scheduling inefficiencies of sealing tasks. This problem area has gathered a lot of good discussion and feedback. We are now at the point where we need actual metrics and numbers to move forward. If we can't measure it, we can't improve it!
We have added four new duration-distribution metrics about the scheduler to the Prometheus endpoints, and we are looking to do some A/B testing of the scheduler. Ideally, this A/B test should be run with as many workers as possible to get the best metrics possible.
Branches:
We have created two branches that we want to gather metrics from. Both branches are based on the final v1.19.0 release, with some small differences.
Branch sched-metrics/test-a (link) includes the final v1.19.0 and PR "feat: sched: Add metrics around sched cycle" #9738, which adds the scheduler metrics.
Branch sched-metrics/test-b (link) includes the final v1.19.0 and PRs "feat: sched: Add metrics around sched cycle" #9738 and "feat: sched: Cache worker calls" #9737, which add the scheduler metrics and also cache scheduler worker calls.
Test-A:
Check out the test-a branch (git checkout sched-metrics/test-a) and upgrade your lotus-miner following normal upgrade procedures.
Start up the lotus-miner and your lotus-workers, and push as many sectors and deals as you can through your lotus-workers. Run on this branch for a day or two while trying to load the scheduler.
After that, go to http://127.0.0.1:2345/debug/metrics, or whatever address you have set your [minerapi]/debug/metrics to.
Save the output to a .txt file and upload it here: https://drive.google.com/drive/u/0/folders/1cdTZM5TQwOpEWESfu0NvmcpjTD22TRU8.
Name the file <Your-Name-sched-metrics-A.txt>.
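For anyone who prefers the command line to a browser, the save step can be sketched roughly as below. This is only an illustration, not an official procedure: the helper names metrics_filename and save_metrics are made up for this example, it assumes curl is installed, and it assumes the miner API listens on the default address mentioned in the post unless you pass your own URL.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lotus) for saving the metrics page.

# Build the requested upload file name: <Your-Name>-sched-metrics-<A|B>.txt
metrics_filename() {
  printf '%s-sched-metrics-%s.txt\n' "$1" "$2"
}

# Download the miner's metrics page into that file.
#   $1 = your name, $2 = A or B,
#   $3 = metrics URL (optional; defaults to the address given in the post)
save_metrics() {
  url="${3:-http://127.0.0.1:2345/debug/metrics}"
  out="$(metrics_filename "$1" "$2")"
  curl --fail --silent --show-error "$url" --output "$out" && echo "saved $out"
}

# Example, after the miner has run on the branch for a day or two:
#   git checkout sched-metrics/test-a
#   save_metrics Your-Name A
```

The same helper works for the second run by passing B instead of A. The resulting file still has to be uploaded to the shared folder by hand.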
Test-B:
Check out the test-b branch (git checkout sched-metrics/test-b) and upgrade your lotus-miner following normal upgrade procedures.
Start up the lotus-miner and your lotus-workers, and push as many sectors and deals as you can through your lotus-workers. Run on this branch for a day or two while trying to load the scheduler.
After that, go to http://127.0.0.1:2345/debug/metrics, or whatever address you have set your [minerapi]/debug/metrics to.
Save the output to a .txt file and upload it here: https://drive.google.com/drive/u/0/folders/1cdTZM5TQwOpEWESfu0NvmcpjTD22TRU8.
Name the file <Your-Name-sched-metrics-B.txt>.
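If you want a quick look at your own numbers before uploading, something like the sketch below could put the two saved files side by side. It rests on assumptions that are not confirmed in this post: that the new scheduler metrics have "sched" in their names, and that they are exported as Prometheus histograms with _sum and _count series, so that the mean is sum divided by count. The function names are invented for this example, and label values containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lotus) for comparing two metrics dumps.

# Print "<metric> <mean>" for every series whose name contains "sched"
# (assumed naming), where mean = <metric>_sum / <metric>_count.
sched_means() {
  awk '
    /^#/ { next }                                # skip HELP/TYPE comments
    $1 ~ /sched/ && $1 ~ /_sum([{]|$)/   { k = $1; sub(/_sum/, "", k);   sum[k] = $NF }
    $1 ~ /sched/ && $1 ~ /_count([{]|$)/ { k = $1; sub(/_count/, "", k); cnt[k] = $NF }
    END {
      for (k in cnt)
        if ((k in sum) && cnt[k] + 0 > 0)
          printf "%s %.3f\n", k, sum[k] / cnt[k]
    }
  ' "$1" | LC_ALL=C sort -k1,1
}

# Print "<metric> <mean in file A> <mean in file B>" for metrics in both files.
#   $1 = metrics file from Test-A, $2 = metrics file from Test-B
compare_sched_means() {
  a="$(mktemp)"; b="$(mktemp)"
  sched_means "$1" > "$a"
  sched_means "$2" > "$b"
  LC_ALL=C join "$a" "$b"
  rm -f "$a" "$b"
}

# Example:
#   compare_sched_means Your-Name-sched-metrics-A.txt Your-Name-sched-metrics-B.txt
```

A mean hides the shape of the distribution, so this is only a rough sanity check. The full files are still what should be uploaded for analysis.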
A big 🙌 to everyone that participates
I want to be clear that we do need these metrics to move forward with a potential re-architecture of the scheduler, so we expect larger storage providers to help us run these tests. We are looking forward to getting a lot of metrics from all of you, so we can start to analyze them. 😄 Questions are welcome in this discussion. We will be monitoring it and checking in regularly.