
fix: if already known beacon payload hasn't state after prune, fix it #47

Closed
wants to merge 1 commit into from


welkin22
Contributor

Description

After running the pruning command geth snapshot prune-state --datadir {the data dir of your bsc node} --triesInMemory=32 with the current op-geth, the node may get stuck when it is restarted: its block height stops increasing.
[Diagram: prune1 drawio]
When pruning, geth by default selects as its target the block height of the bottom diffLayer in the snapshot tree. Since we configured triesInMemory=32, the snapshot has 32 diffLayers, so the target is the latest block height minus 31. After pruning, the state data of every block except the target block is deleted.
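The target selection described above can be modelled in a few lines of Go. This is a minimal sketch, not geth's pruner code: the function names pruneTarget and hasStateAfterPrune are hypothetical, and it assumes (as the description states) that the N diff layers cover heights head-(N-1) through head and that only the target block keeps its state.

```go
package main

import "fmt"

// pruneTarget models how the pruner picks its target height: with
// triesInMemory diff layers covering heights head-(N-1) .. head, the
// bottom-most diff layer becomes the target. triesInMemory is assumed >= 1.
func pruneTarget(head, triesInMemory uint64) uint64 {
	if head < triesInMemory-1 {
		return 0 // not enough blocks yet; clamp to genesis
	}
	return head - (triesInMemory - 1)
}

// hasStateAfterPrune reports whether block n still has state after pruning,
// using the simplification that only the target block keeps its state.
func hasStateAfterPrune(n, target uint64) bool {
	return n == target
}

func main() {
	head := uint64(1032)
	target := pruneTarget(head, 32)
	fmt.Println("prune target:", target)                             // prune target: 1001
	fmt.Println("head has state:", hasStateAfterPrune(head, target)) // head has state: false
}
```

With a head of 1032 and triesInMemory=32, the target is block 1001, which matches the block range used in the situations below.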
Because the latest block has lost its state data, geth automatically rewinds the chain on restart until it finds a block that still has state data, so the unsafe head rolls back by 31 blocks (for example, from block 1032 to block 1001). Note that although the unsafe head is rolled back, the headers, bodies, receipts and other data of the rewound blocks are not deleted and remain in the database.
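The rewind on restart amounts to walking back from the head until a block with state is found. The Go sketch below is a simplified model, not op-geth's actual rewind code; rewindToState and the hasState callback are hypothetical stand-ins for the real state-availability check, and the example assumes a head of 1032 where only block 1001 kept its state.

```go
package main

import "fmt"

// rewindToState walks back from head until it finds a block whose state is
// available, as geth does on restart when the head block lost its state.
// hasState is a hypothetical stand-in for the real state-availability check.
// Only the head pointer moves; stored headers, bodies and receipts are left
// in place (not modelled here).
func rewindToState(head uint64, hasState func(uint64) bool) (uint64, bool) {
	for n := head; ; n-- {
		if hasState(n) {
			return n, true
		}
		if n == 0 {
			return 0, false // no block with state found, not even genesis
		}
	}
}

func main() {
	// After pruning, only the prune target (block 1001) still has state.
	hasState := func(n uint64) bool { return n == 1001 }
	newHead, ok := rewindToState(1032, hasState)
	fmt.Println(newHead, ok, 1032-newHead) // 1001 true 31
}
```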
op-node then starts as well: it queries op-geth for the new unsafe head and produces blocks on top of it to advance the chain head again.
At this point, if the node is a sequencer, one of two situations can occur:

  1. The hashes of the new unsafe blocks 1002 through 1032 produced by op-node differ from the ones stored in the op-geth database. newPayload therefore processes every one of these blocks instead of skipping it, and the state data for this range is rebuilt. Block 1033 can then be inserted normally and the block height keeps increasing. The biggest problem in this situation is that the transactions users had already included in those 31 blocks are discarded: the block hashes at those heights change, and so do the transactions the blocks contain.
  2. The hash of the new unsafe block 1002 from op-node matches the one in the op-geth database, so newPayload skips this block as already known. Because block 1002 is already on the canonical chain, the subsequent forkchoiceUpdated call does not trigger SetCanonical, so its state data is never rebuilt. The hash of block 1003, however, differs, so newPayload does process it. Since its parent, block 1002, has no state data, processing block 1003 fails the parent-state check, and the whole chain gets stuck at block 1003. This situation can also occur on non-sequencer nodes.
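Situation 2 can be reproduced with a toy model of newPayload. This is a deliberately simplified Go sketch, not op-geth's real code: the chain type, its newPayload method, errMissingParentState and the hash strings are all hypothetical. It assumes a block's parent is the stored canonical block at the previous height, and that the rewound blocks are still stored but only block 1001 has state.

```go
package main

import (
	"errors"
	"fmt"
)

// chain is a hypothetical, heavily simplified model of op-geth's block store.
type chain struct {
	canonical map[uint64]string // height -> hash of the stored canonical block
	state     map[string]bool   // hash -> whether state is available
}

var errMissingParentState = errors.New("parent state missing")

// newPayload mimics the behaviour described in situation 2: a block that is
// already known (same height, same hash) is skipped without checking its state.
func (c *chain) newPayload(height uint64, hash string) error {
	if c.canonical[height] == hash {
		return nil // already known: skipped, state is NOT rebuilt
	}
	parent := c.canonical[height-1]
	if !c.state[parent] {
		return errMissingParentState // cannot execute on top of a stateless parent
	}
	// "Execute" the block: store it and mark its state as available.
	c.canonical[height] = hash
	c.state[hash] = true
	return nil
}

func main() {
	// Blocks 1001-1003 are still stored after the rewind, but only 1001 has state.
	c := &chain{
		canonical: map[uint64]string{1001: "A1001", 1002: "A1002", 1003: "A1003"},
		state:     map[string]bool{"A1001": true},
	}
	fmt.Println(c.newPayload(1002, "A1002")) // <nil> (known block, skipped)
	fmt.Println(c.newPayload(1003, "B1003")) // parent state missing
}
```

Block 1002 is accepted without its state being rebuilt, so every later attempt to import a different block 1003 fails the same way, which is the stuck chain described above.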

Rationale

To solve this problem, I modified the logic of newPayload: when it receives an already-known block (same height and same hash), it now checks whether the state data for that block exists and, if it is missing, triggers a rebuild. This way block 1002 regains its state data and block 1003 can be inserted normally. This does not solve situation 1, but situation 1 only occurs on sequencers, so it can be avoided by not pruning the sequencer that produces blocks.
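The proposed change can be sketched on the same kind of toy model. This is an illustration under stated assumptions, not the actual diff in this PR: chain, newPayload and errMissingParentState are hypothetical names, and "rebuild" is modelled simply as re-executing the known block on top of its parent, which assumes the parent already has state.

```go
package main

import (
	"errors"
	"fmt"
)

// chain is a hypothetical, heavily simplified model of op-geth's block store.
type chain struct {
	canonical map[uint64]string // height -> hash of the stored canonical block
	state     map[string]bool   // hash -> whether state is available
}

var errMissingParentState = errors.New("parent state missing")

// newPayload with the proposed fix: a known block is only skipped if its
// state is present; otherwise it falls through and is re-executed.
func (c *chain) newPayload(height uint64, hash string) error {
	if c.canonical[height] == hash && c.state[hash] {
		return nil // known block with state: safe to skip
	}
	parent := c.canonical[height-1]
	if !c.state[parent] {
		return errMissingParentState
	}
	// (Re-)execute the block, which rebuilds its state.
	c.canonical[height] = hash
	c.state[hash] = true
	return nil
}

func main() {
	c := &chain{
		canonical: map[uint64]string{1001: "A1001", 1002: "A1002", 1003: "A1003"},
		state:     map[string]bool{"A1001": true},
	}
	err1 := c.newPayload(1002, "A1002") // known but stateless: re-executed
	fmt.Println(err1, c.state["A1002"]) // <nil> true
	err2 := c.newPayload(1003, "B1003") // parent 1002 now has state
	fmt.Println(err2)                   // <nil>
}
```

Because op-node resends the rewound blocks in order starting from the block after the new head, each known-but-stateless block finds its parent's state already rebuilt, so re-executing one block at a time is enough in this model.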

I have another solution in PR #46.
Both solutions fix our problem; we can discuss whether to adopt one or both.

Example

none

Changes

Notable changes:

  • logic of newPayload changed

welkin22 changed the title from "fix: if already known beacon payload hasn't state,fix it" to "fix: if already known beacon payload hasn't state after prune, fix it" on Jan 16, 2024
welkin22
Contributor Author

We chose the other solution, #46, so I am closing this PR.

welkin22 closed this on Jan 17, 2024
sysvm deleted the feature/prune_issue_fix branch on July 29, 2024 09:07