Release 8.0.35-30 #1525

Merged
merged 2 commits into from Dec 4, 2023

Conversation

@adivinho (Contributor) commented Dec 4, 2023

No description provided.

Fixed PXB-3168 - Under high write load, backup fails with "log block numbers mismatch" error

https://jira.percona.com/browse/PXB-3168

TL;DR - During the implementation of the new redo log design parser in
8.0.30, PXB missed the condition that log files can be reused, so a block
read into the buffer can be a stale block from the file's previous life.

Redo Log Design:

The new redo log design in 8.0.30 uses an ever-incrementing postfix ID
in the redo log file name. The server can have up to 32 active redo log
files. Once a file is recycled, it is renamed to the new corresponding ID
(its current number + 32) with a _tmp suffix. When the server makes it
active again, the _tmp suffix is removed from its name. This means a log
file can still contain data from before it was recycled.
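
As a hedged illustration of this naming scheme (the `#innodb_redo` directory and `#ib_redo<N>` format match the 8.0.30+ on-disk layout, but the helper function itself is hypothetical, not server code):

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper illustrating the 8.0.30+ redo file naming scheme.
// Active file:   #innodb_redo/#ib_redo<id>
// Recycled file: #innodb_redo/#ib_redo<id>_tmp (spare, waiting for reuse)
static std::string redo_file_name(unsigned long long id, bool spare) {
  char name[64];
  std::snprintf(name, sizeof(name), "#ib_redo%llu%s", id,
                spare ? "_tmp" : "");
  return std::string("#innodb_redo/") + name;
}

// Example: when file #5 is recycled it becomes redo_file_name(5 + 32, true)
// = "#innodb_redo/#ib_redo37_tmp"; once made active again it is renamed to
// "#innodb_redo/#ib_redo37" but may still hold stale blocks from its life
// as "#ib_redo5".
```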

PXB redo log copying works in three parts (a condensed code sketch follows the list):
1. Read a chunk of 64K (4 × page size) into the read buffer in `read_log_seg`
2. Based on the last known LSN, parse the new data blocks and check that
the block number in each block header is exactly 1 ahead of the last block
number. Keep doing this until a block mismatches. This is done in
`scan_log_recs`.
3. Write the blocks up to the position found at step 2 into `xtrabackup_logfile`
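
The sketch below mirrors that structure. `read_log_seg` and `scan_log_recs` are the PXB function names mentioned above, but the signatures and stub bodies here are simplified assumptions, so treat this as structural illustration only:

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t READ_CHUNK = 64 * 1024;  // 64K = 4 x 16K page size

// Simplified stand-ins for the real PXB routines (signatures assumed).
static size_t read_log_seg(uint8_t *buf, uint64_t from_lsn) {
  (void)buf; (void)from_lsn;
  return 0;  // the real code reads up to READ_CHUNK bytes of redo log
}
static size_t scan_log_recs(const uint8_t *buf, size_t len,
                            uint32_t *last_block_no) {
  (void)buf; (void)last_block_no;
  return len;  // the real code returns the offset of the first bad block
}
static void write_to_xtrabackup_logfile(const uint8_t *buf, size_t len) {
  (void)buf; (void)len;
}

static void copy_redo(uint64_t from_lsn) {
  uint8_t buf[READ_CHUNK];
  uint32_t last_block_no = 0;
  for (;;) {
    size_t got = read_log_seg(buf, from_lsn);             // part 1: read 64K
    size_t ok = scan_log_recs(buf, got, &last_block_no);  // part 2: validate
    write_to_xtrabackup_logfile(buf, ok);                 // part 3: persist
    if (got == 0 || ok < got) break;  // mismatch: caught up with the server
    from_lsn += ok;
  }
}
```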

There are two conditions that stop parsing the buffer and identify that we
are reading the stale tail of a recycled log. This happens when we catch up
with the server (reading the most up-to-date block written by the server):

** Condition 1 - The next block is lower than what we expected:

For simplicity, we will demonstrate with 2 logs instead of 32.
Once log 2 is full, we will re-use log 1 (renamed to log 3)
and write up to slot 4.

H - head of the log - last position written by the server
T - tail of the log - garbage left over from when the file was named log 1

```
slots | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 |
log 1 | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  |
log 2 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
log 3 | 19 | 20 | 21 | 22 | 5  | 6  | 7  | 8  | 9  |
      |    |    |    | H  | T  |    |    |    |    |
```

Here we read block nr 5, identify that it is lower than expected, and
consider parsing finished at block 22.
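
In code, condition 1 is a plain comparison. A minimal sketch, using the variable names defined further down in this message:

```cpp
#include <cstdint>

// Condition 1 (sketch): the header number is lower than the next number
// we expect, so the block must be the stale tail of a recycled log file.
static bool is_old_block_cond1(uint64_t expected_hdr_nr,
                               uint64_t read_block_number) {
  // In the example above: expected_hdr_nr = 23 (after block 22) and
  // read_block_number = 5, so parsing stops at block 22.
  return read_block_number < expected_hdr_nr;
}
```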

** Condition 2 - The next block is higher than what we expected:

Block numbers wrap around at 1073741824 (1G), meaning they restart from 1
when we reach an LSN of 512G (1G blocks, each block being 512 bytes -
OS_FILE_LOG_BLOCK_SIZE).
For simplicity, let's take the same 9-slot logs as above, 2 logs in total,
and wrap around after block 7.

```
slots | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 |
log 1 | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 1  | 2  |
log 2 | 3  | 4  | 5  | 6  | 7  | 1  | 2  | 3  | 4  |
log 3 | 5  | 6  | 7  | 1  | 5  | 6  | 7  | 1  | 2  |
      |    |    |    | H  | T  |    |    |    |    |
```

Wrap-around is identified with the formula below, where:
* continuos_block_capacity_in_redo - the number of 512-byte blocks the redo
logs can hold before we start to overwrite data.
* wrap_around_block_capacity - the block number at which block numbers wrap
around.
* expected_hdr_nr - last successfully read block number + 1.
* read_block_number - block number read from the block header.

Formula:

```
((expected_hdr_nr | (continuos_block_capacity_in_redo - 1)) - read_block_number) % wrap_around_block_capacity == 0
```
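
Transcribed directly into a helper function (a sketch; the identifier spellings follow the text above, and the function name is ours, not PXB's):

```cpp
#include <cstdint>

// Condition 2 (sketch): detect a stale block whose number is "ahead" only
// because block numbers wrapped around.
static bool is_old_block_after_wrap(uint64_t expected_hdr_nr,
                                    uint64_t read_block_number,
                                    uint64_t continuos_block_capacity_in_redo,
                                    uint64_t wrap_around_block_capacity) {
  // The OR rounds expected_hdr_nr up within one full pass over the redo
  // capacity; if the distance to the read number is a whole number of
  // wrap-arounds, the block must come from a previous pass.
  return ((expected_hdr_nr | (continuos_block_capacity_in_redo - 1)) -
          read_block_number) % wrap_around_block_capacity == 0;
}

// Toy numbers from the example below:
// is_old_block_after_wrap(2, 5, 18, 7)
//   -> ((2 | 17) - 5) % 7 == (19 - 5) % 7 == 14 % 7 == 0 -> true
```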

In the above example, when we read the 5 coming from the tail of the
previous log data, we have:
* continuos_block_capacity_in_redo - We have 9 slots per redo file and 2
redo files, giving a continuous capacity of 18 blocks; the formula's minus
1 gives 17.
* wrap_around_block_capacity - In this example we wrap at block 7; in the
server it is 1G.
* expected_hdr_nr - 2
* read_block_number - 5

Resulting in:

```
(gdb) p ((2 | (18-1)) - 5) % 7 == 0
$1 = true
```

This means block number 5 comes from the tail of previous data in the
recycled log.

The Problem:
Xtrabackup was missing the second condition and only considered a block
from the log buffer to be old when `read_block_number` was lower than
`expected_hdr_nr`.

Fix:

We use the upstream approach of considering the block as garbage if it
differs from what we expect, with the addition of also validating that the
checksum is correct.
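
Roughly, the fixed stop condition becomes the following. This is a sketch: `log_block_get_hdr_no` and `log_block_checksum_is_ok` are assumed to behave like the InnoDB helpers of the same names, and the control flow is illustrative rather than the literal patch:

```cpp
#include <cstdint>

// Assumed InnoDB-style helpers; the real ones live in the server sources.
uint32_t log_block_get_hdr_no(const unsigned char *block);
bool log_block_checksum_is_ok(const unsigned char *block);

// Sketch of the fixed check: any block whose header number differs from
// the expected one is treated as garbage (this covers both the lower and
// the wrapped-around higher case), and the checksum is validated as well.
static bool block_is_usable(const unsigned char *block,
                            uint32_t expected_hdr_nr) {
  if (log_block_get_hdr_no(block) != expected_hdr_nr) {
    return false;  // stale tail of a recycled file: stop copying here
  }
  return log_block_checksum_is_ok(block);  // extra validation added by PXB
}
```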

@altmannmarcelo (Contributor) left a comment

LGTM

@adivinho merged commit ef0916b into 8.0 Dec 4, 2023
4 checks passed