
[specs] Using 10 schedules for random exploration can be insufficient for correctness checking/fuzzing #10

Open
AD1024 opened this issue Feb 28, 2025 · 4 comments

@AD1024

AD1024 commented Feb 28, 2025

As shown in the referenced script:

[Int] $NumIters = 10,

Testing with PChecker runs only 10 schedules, which can be insufficient. By default, PChecker (the fuzzer for P models) uses random exploration: it configures the model instances with non-deterministic inputs and explores different orderings of event interleavings. Ten schedules may therefore not expose enough behaviors of the P model, which gives low confidence in its correctness. We would suggest using more schedules, e.g., 10k.
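
To illustrate what "non-deterministic inputs" means here, a generic P test driver might look like the sketch below. TestDriver and Client are hypothetical names, not machines from the 3FS specs. Each schedule resolves every choose() and $ differently, so 10 schedules sample only a tiny slice of the combined input and interleaving space.

// Hypothetical sketch (TestDriver and Client are placeholder names, not from
// the 3FS specs): a P test driver that configures the model with
// non-deterministic inputs. Every schedule resolves choose() and $ differently,
// so 10 schedules sample only a tiny fraction of the behavior space.
machine TestDriver {
    start state Init {
        entry {
            var i: int;
            var numClients: int;
            // non-deterministically pick 1..3 client machines for this schedule
            numClients = choose(3) + 1;
            i = 0;
            while (i < numClients) {
                new Client();   // Client stands in for a machine of the model under test
                i = i + 1;
            }
        }
    }
}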

We assume the reason for using 10 rather than more is that PChecker usually takes too long to finish checking these models, especially DataStorage. This is usually due to instantiating multiple Timers in the P model. Timers can drastically slow down PChecker, since eTimeOut is issued through a non-deterministic check, as shown below:

state WaitForTimerRequests {
    on eStartTimer goto TimerStarted;
    ignore eCancelTimer, eDelayedTimeOut;
}
state TimerStarted {
    defer eStartTimer;
    entry {
        // non-deterministically decide whether the timeout fires on this entry
        if ($)
        {
            send client, eTimeOut;
            goto WaitForTimerRequests;
        }
        else
        {
            // otherwise re-enter this state via a self-addressed event and retry
            send this, eDelayedTimeOut;
        }
    }
    on eDelayedTimeOut goto TimerStarted;
    on eCancelTimer goto WaitForTimerRequests;
}

If the non-deterministic check fails, the machine cycles back to the same state. During random exploration this produces a large number of redundant eDelayedTimeOut events (especially when there are multiple timers), which can easily exhaust the maximum number of steps. A potential workaround is:
instead of assigning each component its own timer to signal timeouts, use a "ghost" view server that controls timeouts centrally. An example is here: https://github.com/AD1024/P-Tutorial-Raft/blob/44d2e349a804109d1c1dcc2658dcf89d441d8880/PSrc/View.p#L128-L184
The idea is that when an eTimeOut occurs, the view server randomly chooses some components and informs them of the timeout. This way, PChecker is not overwhelmed by eDelayedTimeOut events. A minimal sketch of this pattern is given below.
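
The following is only a rough sketch of the idea, not the actual View.p code linked above. eNodeTimeout is a hypothetical event, and CreateTimer/StartTimer are assumed to be the helpers from the P tutorial Timer module.

// Minimal sketch of the central-timeout idea (an assumed shape, not the actual
// View.p implementation). A single "ghost" server owns the only Timer in the
// system; on each eTimeOut it non-deterministically decides which nodes observe
// a timeout, so PChecker is not flooded with per-node eDelayedTimeOut events.
// eNodeTimeout is a hypothetical event; CreateTimer/StartTimer are assumed to
// come from the P tutorial Timer module.
event eNodeTimeout;

machine GhostTimeoutServer {
    var nodes: seq[machine];
    var timer: Timer;

    start state Init {
        entry (participants: seq[machine]) {
            nodes = participants;
            timer = CreateTimer(this);
            StartTimer(timer);
            goto WaitForTimeout;
        }
    }

    state WaitForTimeout {
        on eTimeOut do {
            var i: int;
            i = 0;
            while (i < sizeof(nodes)) {
                // non-deterministically decide whether this node times out now
                if ($) {
                    send nodes[i], eNodeTimeout;
                }
                i = i + 1;
            }
            // re-arm the timer for the next round of timeouts
            StartTimer(timer);
        }
    }
}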

Hope this helps with the P modeling.

@wlxiong wlxiong self-assigned this Feb 28, 2025
@ankushdesai

The number 10 was added here just to make the test scripts run faster. Internally, the team ran the models for more schedules.

We also noticed that there are some invariants from the CRAQ paper that are not checked.

@AD1024
Author

AD1024 commented Feb 28, 2025

The number 10 was added here just to make the test scripts run faster. Internally, the team ran the models for more schedules.

We also noticed that there are some invariants from the CRAQ paper that are not checked.

I see. That makes more sense. Still, I think we should either have a Timer primitive (given how frequently timers are used to model faults) or a note on the proper way to handle timeouts.

@wlxiong
Contributor

wlxiong commented Mar 1, 2025

The P model gives the dev team a clear view of the CRAQ variant implemented in 3FS. Besides being a formal spec for verifying the correctness of the system, it functions more like pseudocode for the chunk storage system.

During development, we follow an iterative process: after making small changes, we run all tests for 10 schedules to catch easy bugs. If everything looks good, we start multiple P checkers in parallel, each running 1000 schedules. At the end of the day, we submit jobs that start multiple P checkers, each running 10000 schedules, and check the results the next morning.

A potential workaround is: instead of assigning each component a timer to inform about the timeout, use a "ghost" view server to have a central control of timeout.

Yes, using a global clock is more efficient in this case. Actually, we had a similar implementation in a previous version: the heartbeat check loop in MgmtService served as the global clock. But implementing it in an independent actor seems to be a better approach.

We also noticed that there are some invariants from the CRAQ paper that are not checked.

Can you be more specific? To this day, we still cannot persuade ourselves that we have fully grasped the intricacies of CRAQ. A working implementation needs to fill in lots of missing pieces.

@ankushdesai

This is very exciting, thanks for sharing! I will get back to you on the spec with details in a couple of days.
