fix: if already known beacon payload hasn't state after prune, fix it #47

welkin22 · 2024-01-16T11:36:32Z

Description

After running the pruning command `geth snapshot prune-state --datadir {the data dir of your bsc node} --triesInMemory=32` on the current op-geth, the node may get stuck on restart: the block height stops increasing.

When pruning, geth by default selects the block height of the bottom diffLayer in the snapshot structure as the target block height. Since we configured triesInMemory=32, there are 32 diffLayers in total, and the target block height is the latest block height minus 31. After pruning, state data is cleared for every block height except the target. In the example used below, the latest block is 1032, so the target is block 1001.
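The target-height arithmetic can be sketched as follows. This is a minimal illustration, not geth's actual code: `pruneTarget` is a hypothetical helper, and the latest height of 1032 is an assumption taken from the block numbers used in the scenarios later in this description.

```go
package main

import "fmt"

// pruneTarget returns the only block height whose state survives pruning,
// assuming the target is the bottom diff layer, i.e.
// latest - (triesInMemory - 1). Hypothetical helper, not a geth function.
func pruneTarget(latest, triesInMemory uint64) uint64 {
	if triesInMemory == 0 {
		return latest // no diff layers to step back through
	}
	if latest < triesInMemory-1 {
		return 0 // chain shorter than the layer count: clamp to genesis
	}
	return latest - (triesInMemory - 1)
}

func main() {
	// With triesInMemory=32 and latest block 1032, state survives only at 1001.
	fmt.Println(pruneTarget(1032, 32))
}
```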
When geth restarts, the latest block height has no state data, so the code automatically rolls the chain back until it finds a block height that does. The unsafe block height therefore rolls back by 31 blocks. Note that although the unsafe block height is rolled back, the headers, bodies, receipts and other data of the rolled-back blocks are not deleted and still exist in the database.
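The rollback on restart can be pictured with the sketch below. It is a simplified model under stated assumptions: `rewindToState` and the `hasState` callback are hypothetical names standing in for geth's real head-repair logic, and the heights 1032 and 1001 come from the example in this description.

```go
package main

import "fmt"

// rewindToState walks back from head until it finds a height for which
// hasState reports true. It returns false if no height down to 0 has state.
// Hypothetical helper that only models the behaviour described above.
func rewindToState(head uint64, hasState func(uint64) bool) (uint64, bool) {
	for h := head; ; h-- {
		if hasState(h) {
			return h, true
		}
		if h == 0 {
			return 0, false
		}
	}
}

func main() {
	// After pruning, only the target height (1001 in the example) has state.
	onlyTarget := func(h uint64) bool { return h == 1001 }

	newHead, ok := rewindToState(1032, onlyTarget)
	fmt.Println(newHead, ok, 1032-newHead) // new unsafe head, found, blocks rolled back
}
```

The model only moves the head pointer; as in the description, nothing about the rolled-back blocks themselves is deleted.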
op-node then starts as well. It queries op-geth for the new unsafe block height and produces blocks on top of it to advance the chain head.
If the node is a sequencer, one of two situations follows:

1. The hashes of the new unsafe blocks that op-node produces for blocks 1002 through 1032 differ from those in the op-geth database. The newPayload interface therefore processes every block without skipping, and the state data for the blocks in this range is rebuilt. Block 1033 can then be inserted normally and the block height keeps increasing. The biggest problem here is that the transactions in the 31 blocks that users previously put on chain are discarded: the block hashes at those heights change, and so do the transactions the blocks contain.
2. The hash of the new unsafe block that op-node produces at height 1002 is the same as the one in the op-geth database, so the newPayload interface skips processing it. Because block 1002 already belongs to the canonical chain, the subsequent forkchoiceUpdated call does not trigger the SetCanonical method, and the state data is not rebuilt. The hash produced at height 1003, however, is different, so the newPayload interface does not skip it. Since block 1002 has no state data, processing block 1003 fails the state check and the entire chain gets stuck at block 1003. This situation can also occur on non-sequencer nodes.

Rationale

To solve this problem, I modified the newPayload logic. When a payload arrives for a block that is already known (same height and same hash), we check whether state data exists for that block. If the state data is missing, we trigger a rebuild. This way the state data for block 1002 is no longer missing, and block 1003 can be inserted normally. This change does not solve situation 1, but situation 1 only occurs on sequencers, so we can avoid it by not pruning on the sequencer that produces blocks.
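The difference between the old and new behaviour can be illustrated with a toy model. Everything in it is an assumption made for illustration: `Block`, `Chain`, `NewPayload`, the `rebuildMissingState` flag and the hash strings are hypothetical and much simpler than the real op-geth types; the sketch shows only the decision described above, not the actual patch.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is a hypothetical, stripped-down stand-in for a beacon payload.
type Block struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// Chain is a toy database: known blocks, and which of them have state.
type Chain struct {
	blocks   map[string]Block
	hasState map[string]bool
}

var errMissingParentState = errors.New("parent state missing")

// NewPayload models the described check. With rebuildMissingState=false an
// already known block is always skipped (old behaviour). With true, a known
// block that has lost its state is executed again (proposed behaviour).
func (c *Chain) NewPayload(b Block, rebuildMissingState bool) (string, error) {
	if _, known := c.blocks[b.Hash]; known {
		if c.hasState[b.Hash] || !rebuildMissingState {
			return "skipped", nil
		}
	}
	if !c.hasState[b.ParentHash] {
		return "", errMissingParentState
	}
	c.blocks[b.Hash] = b
	c.hasState[b.Hash] = true // "executing" the block rebuilds its state
	return "executed", nil
}

// prunedChain returns a database where 1002 is known but only 1001 has state.
func prunedChain() *Chain {
	return &Chain{
		blocks: map[string]Block{
			"h1001": {1001, "h1001", "h1000"},
			"h1002": {1002, "h1002", "h1001"},
		},
		hasState: map[string]bool{"h1001": true},
	}
}

func main() {
	b1002 := Block{1002, "h1002", "h1001"}     // same hash as in the database
	b1003 := Block{1003, "h1003-new", "h1002"} // different hash than before
	for _, fix := range []bool{false, true} {
		c := prunedChain()
		status, _ := c.NewPayload(b1002, fix)
		_, err := c.NewPayload(b1003, fix)
		fmt.Printf("fix=%v: 1002 %s, 1003 error: %v\n", fix, status, err)
	}
}
```

Without the check, block 1002 is skipped and block 1003 fails on its parent's missing state, which corresponds to situation 2. With the check, block 1002 is executed again and block 1003 goes through.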

An alternative solution is in PR #46.
Either solution solves our problem; we can discuss whether to adopt one or both.

Example

none

Changes

Notable changes:

logic of newPayload changed

welkin22 · 2024-01-17T09:16:38Z

We chose the solution in #46, so I am closing this PR.

fix: if already known beacon payload hasn't state,fix it

e46668e

github-actions bot requested review from bendanzhentan and bnoieh January 16, 2024 11:36

welkin22 changed the title ~~fix: if already known beacon payload hasn't state,fix it~~ fix: if already known beacon payload hasn't state after prune, fix it Jan 16, 2024

welkin22 mentioned this pull request Jan 16, 2024

fix: prune uses the latest block height as the target by default #46

Merged

welkin22 closed this Jan 17, 2024

sysvm deleted the feature/prune_issue_fix branch July 29, 2024 09:07
