Release 8.0.35-30 #1525
Merged
Conversation
Fixed PXB-3168 - Under high write load, backup fails with "log block numbers mismatch" error

https://jira.percona.com/browse/PXB-3168

TL;DR - When implementing the parser for the new redo log design in 8.0.30, PXB missed the fact that log files can be reused, so a block read into the buffer can be an old block.

Redo log design:

The new redo log design in 8.0.30 uses an ever-incrementing postfix ID in the redo log file name. The server can have up to 32 active redo log files. Once a file is recycled, it gets renamed to the new corresponding ID (its current number + 32, with a `_tmp` suffix). When the server makes it active again, the `_tmp` suffix is removed from its name. This means a log file can still contain data from before it was recycled.

PXB redo log copying works in three parts:

1. Read a chunk of 64K (4 * page size) into the read buffer at `read_log_seg`.
2. Based on the last known LSN, parse the new data blocks and check that the block number in each block header is exactly 1 ahead of the last block number. Keep doing this until a block mismatches. This is done at `scan_log_recs`.
3. Write the blocks up to the position found at step 2 into `xtrabackup_logfile`.

There are two conditions that stop parsing the buffer and identify that we are reading the stale tail of a recycled log. This happens when we catch up with the server (reading the most up-to-date block written by the server):

** Condition 1 - The next block number is lower than expected:

For simplicity, we will demonstrate with 2 logs instead of 32. Once log 2 is full, we reuse log 1 (renamed to log 3) and write up to slot 4.

H - head of the log - last position written by the server
T - tail of the log - garbage left over from when the file was named log 1

```
slots | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 |
log 1 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |
log 2 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
log 3 | 19 | 20 | 21 | 22 |  5 |  6 |  7 |  8 |  9 |
      |    |    |    |  H |  T |    |    |    |    |
```

Here we read the block, identify that block number 5 is lower than the expected 23, and consider the parsing finished at block 22.

** Condition 2 - The next block number is higher than expected:

Block numbers wrap around at 1073741824 (1G), meaning they restart from 1 once we reach an LSN of 512G (1G blocks, each block being 512 bytes - OS_FILE_LOG_BLOCK_SIZE). For simplicity, take the same 9-slot logs as above, 2 logs in total, and a wrap-around after block 7:

```
slots | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 |
log 1 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  1 |  2 |
log 2 |  3 |  4 |  5 |  6 |  7 |  1 |  2 |  3 |  4 |
log 3 |  5 |  6 |  7 |  1 |  5 |  6 |  7 |  1 |  2 |
      |    |    |    |  H |  T |    |    |    |    |
```

Wrap-around is identified with the formula below, where:

* continuos_block_capacity_in_redo - the number of 512-byte blocks we can hold before we start to overwrite data.
* wrap_around_block_capacity - the block number at which our block numbers wrap around.
* expected_hdr_nr - last successfully read block number + 1.
* read_block_number - block number read from the block header.

Formula:

```
((expected_hdr_nr | (continuos_block_capacity_in_redo - 1)) - read_block_number) % wrap_around_block_capacity == 0
```

In the above example, when we read the 5 coming from the tail of the previous log data, we have:

* continuos_block_capacity_in_redo - 9 slots per redo file and 2 redo files give a continuous capacity of 18 blocks; minus 1 gives 17 blocks.
* wrap_around_block_capacity - in the above example we wrap at block 7 (in the server this is 1G).
* expected_hdr_nr - 2
* read_block_number - 5

Resulting in:

```
(gdb) p ((2 | (18-1)) - 5) % 7 == 0
$1 = true
```

This means block number 5 comes from the tail of the previous data in the recycled log.

The problem:

Xtrabackup was missing the second condition, and only treated a block from the log buffer as old when `read_block_number` was lower than `expected_hdr_nr`.

Fix:

We use the upstream approach of treating the block as garbage if it differs from what we expect, with the addition of also validating that the checksum is correct.
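The file renaming scheme described above can be sketched as follows. This is a hypothetical illustration, not server code: the `#ib_redo<N>` / `#ib_redo<N>_tmp` naming follows the MySQL 8.0.30 redo log layout, but the helper names are invented for this sketch.

```python
def recycled_name(current_no: int, active_files: int = 32) -> str:
    """A recycled redo file is renamed to its current number + 32
    (the number of active files), with a _tmp suffix."""
    return f"#ib_redo{current_no + active_files}_tmp"

def activated_name(tmp_name: str) -> str:
    """When the server makes the spare file active again,
    the _tmp suffix is removed."""
    return tmp_name.removesuffix("_tmp")

# e.g. file 5 is recycled as spare file 37, then reactivated:
print(recycled_name(5))                   # #ib_redo37_tmp
print(activated_name(recycled_name(5)))   # #ib_redo37
```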
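The two stop conditions and the wrap-around formula can be modeled in a few lines. This is a minimal Python sketch of the logic described above, not the actual C++ code in xtrabackup; the function name and structure are hypothetical, and only the formula is taken verbatim from the description.

```python
OS_FILE_LOG_BLOCK_SIZE = 512  # bytes per redo log block

def block_is_stale(expected_hdr_nr, read_block_number,
                   continuos_block_capacity_in_redo,
                   wrap_around_block_capacity):
    """Decide whether a block whose header number mismatches the expected
    number is stale data left over from before the file was recycled."""
    if read_block_number == expected_hdr_nr:
        return False  # the block we expected: not stale
    # Condition 1: a lower number can only come from a previous use of the file.
    if read_block_number < expected_hdr_nr:
        return True
    # Condition 2: a higher number is stale when it lines up with the tail of
    # the previous wrap-around cycle (the formula from the description above).
    return ((expected_hdr_nr | (continuos_block_capacity_in_redo - 1))
            - read_block_number) % wrap_around_block_capacity == 0

# Toy setup from the tables above: 2 files x 9 slots = 18 blocks of
# continuous capacity, block numbers wrapping around after block 7.
print(block_is_stale(23, 5, 18, 7))  # True  - condition 1 (first table)
print(block_is_stale(2, 5, 18, 7))   # True  - condition 2 (second table)
print(block_is_stale(2, 4, 18, 7))   # False - mismatch not explained by a wrap
```

In the real server, `wrap_around_block_capacity` is 1073741824 (1G) and the continuous capacity depends on the configured redo log size; the toy values only serve to mirror the worked example.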
altmannmarcelo approved these changes on Dec 4, 2023
LGTM