[specs] Using 10 schedules for random exploration can be insufficient for correctness checking/fuzzing #10
Comments
We also noticed that there are some invariants from the CRAQ paper that are not checked.
I see. This makes more sense. But still, I think we should either have a Timer primitive (given how frequently it is used to model faults) or a note about the proper way to handle timeouts.
The P model provides a clear view of the CRAQ variant implemented in 3FS for the dev team. Besides being a formal spec to verify the correctness of the system, it functions more like pseudocode for the chunk storage system. During development, we follow an iterative process: after making small changes, we run all tests for 10 schedules to catch easy bugs. If everything looks good, we start multiple P checkers in parallel, each running 1000 schedules. At the end of the day, we submit jobs that start multiple P checkers, each running 10000 schedules, and check the results the next morning.
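The tiered workflow above could be sketched roughly as follows. This is illustrative only: the test-case name `tcBasic` is hypothetical, the flag shapes follow the public P checker CLI (`p check -tc <testcase> -s <schedules>`), and the actual invocations live in the repo's RunTests.ps1.

```shell
# Sketch of the iterative checking workflow (assumed test-case name tcBasic)
p check -tc tcBasic -s 10          # quick pass after small changes
for i in 1 2 3 4; do
  p check -tc tcBasic -s 1000 &    # several medium-sized runs in parallel
done
wait                               # wait for the parallel checkers
p check -tc tcBasic -s 10000      # overnight deep run, inspect next morning
```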
Yes, using a global clock is more efficient in this case. Actually we have a similar implementation in a previous version: the heartbeat check loop in
Can you be more specific? To this day, we still cannot persuade ourselves that we have fully grasped the intricacies of CRAQ. A working implementation needs to fill in lots of missing pieces.
This is very exciting and thanks for sharing! I will get back to you on the spec with details in a couple of days.
As shown in the referred script 3FS/specs/RunTests.ps1 (line 6, commit c450ee0):
Testing with PChecker only uses 10 schedules, which can be insufficient. By default, PChecker (the fuzzer of the P model) uses random exploration: it configures the model instances with non-deterministic inputs and explores different orders of `event` interleaving. Therefore, 10 schedules may not expose enough behaviors of the P model, hence low confidence in correctness. We would suggest using more schedules, e.g., 10k.

We assume the reason for using 10 instead of more is that it usually takes too long for PChecker to finish checking the models, especially DataStorage. This is usually due to instantiating multiple `Timer`s in the P model. Timers can drastically slow down PChecker, since the way they issue `eTimeOut` is through a non-deterministic check, as shown in 3FS/specs/Timer/PSrc/Timer.p (lines 15 to 34, commit c450ee0).
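For readers without the source at hand, the pattern looks roughly like this. This is a hypothetical sketch of the structure described above, not the actual Timer.p code; event and state names are illustrative.

```
event eTimeOut: machine;
event eDelayTimeOut;

machine Timer {
  var client: machine;

  start state Init {
    entry (m: machine) {
      client = m;
      goto WaitForTimeout;
    }
  }

  state WaitForTimeout {
    entry {
      if ($) {                        // non-deterministic check
        send client, eTimeOut, this;  // fire the timeout
      } else {
        send this, eDelayTimeOut;     // delay: loop back to the same state
      }
    }
    on eDelayTimeOut goto WaitForTimeout; // each failed check re-enters here
  }
}
```

With several such machines alive at once, each failed `$` check enqueues another `eDelayTimeOut`, and the scheduler spends most of its step budget interleaving these self-messages.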
If the non-deterministic check fails, the timer cycles back to the same state. This creates a lot of redundant `eDelayTimeOut` events during random exploration (especially when there are multiple timers), which can easily exhaust the maximum number of steps. A potential workaround: instead of assigning each component a timer to inform it about the timeout, use a "ghost" view server as a central control of timeouts. An example is here: https://github.com/AD1024/P-Tutorial-Raft/blob/44d2e349a804109d1c1dcc2658dcf89d441d8880/PSrc/View.p#L128-L184
The idea is that when an `eTimeOut` occurs, the view server randomly chooses some components and informs them about the timeout event. This way, PChecker will not be overwhelmed by `eDelayTimeOut` events. Hope this might help with the P modeling.
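A minimal sketch of the ghost-view-server idea, with illustrative names adapted from the linked Raft tutorial (not its actual code): one central machine owns the clock and, on each round, non-deterministically picks which components to notify, so no component needs its own `Timer`.

```
event eTimeOut: machine;

machine GhostViewServer {
  var nodes: seq[machine];

  start state Init {
    entry (members: seq[machine]) {
      nodes = members;
      goto NotifyTimeouts;
    }
  }

  state NotifyTimeouts {
    entry {
      var i: int;
      i = 0;
      while (i < sizeof(nodes)) {
        if ($) {                        // randomly choose which nodes time out
          send nodes[i], eTimeOut, this;
        }
        i = i + 1;
      }
      goto NotifyTimeouts;              // next round of timeouts
    }
  }
}
```

The non-determinism budget stays the same (any subset of components can still time out in any round), but the per-timer `eDelayTimeOut` self-message traffic disappears from the schedule space.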