-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
MDEV-36290: ALTER TABLE with multi-master can cause data loss #3967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
bnestere
wants to merge
4
commits into
10.6
Choose a base branch
from
10.6-MDEV-36290
base: 10.6
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Brandon Nesterenko ***@***.***> writes:
MDEV-36290: ALTER TABLE with multi-master can cause data loss
#3967
Hi Brandon,
Here is my initial review of the MDEV-36290 draft patch.
Overall, it is clear that this is not something that's appropriate for 10.6.
It implements a new way to apply row events, based on column names (if
available) rather than column number. It is something for next dev version,
12.x.
commit de27bfb
Author: Brandon Nesterenko ***@***.***>
Date: Thu Apr 3 12:37:09 2025 -0600
MDEV-36290: ALTER TABLE with multi-master can cause data loss
Please do not use the term "data loss" in the description (I know it is not
you who introduced the term ;-).
The issue here is conflicting queries run in parallel against multiple
masters in a multi-master setup. In general, the support of multi-master
in asynchronous replication requires the user to ensure that no conflicting
queries are run, or to ensure that such conflicts will not cause undesirable
behaviour. The behaviour addressed here is just one of many ways that
conflicting ALTER TABLE could cause different data on slave and master.
Using the term "data loss" can cause misunderstanding about what the problem
is and what the change does.
Of course, every case that is handled more correctly is in principle better.
In this case, we introduce the facility to configure
(--binlog_row_metadata=FULL) replication so that it is possible to replicate
between tables on master and slave with different order of the columns. Not
sure how useful such an option is in practice, seems to be very few users
that will really need this, and such need has to be weighed against the
increase in complexity in the code and potential for bugs.
With that said, the approach in the patch generally looks reasonable. A few
initial detail comments below, as you mentioned this is still a draft patch.
As you also mentioned to me earlier, the logic in the patch is not trivial.
I did not yet try to understand in detail all parts of the logic and the
different corner cases.
+ unsigned char *get_optional_metadata_str()
+ {
+ return m_optional_metadata;
+ }
+
+ unsigned int get_optional_metadata_len()
+ {
+ return m_optional_metadata_len;
+ }
+
Generally I do not like such do-nothing accessor functions. They just add
abstraction and reduce grepability without doing anything useful. That's a
style preference of course.
+ /*
+ Maps column index from master to slave. This is determined using the field
+ names (provied by optional metadata when the master is configured with
+ binlog_row_metadata=FULL).
+ */
+ std::map<uint, uint> master_to_slave_index_map;
+
+ /*
+ When using field names to map from master->slave columns, this keeps track
+ of column indices on the master which aren't present on the slave.
+
+ It is used to skip columns when checking type-conversions and unpacking
+ row data.
+ */
+ std::set<uint> master_unmatched_cols;
Is using std::map/std::set going to be expensive here, this is operations
that will happen on every row-based event (when --binlog-row-metadata=FULL)?
The std:: functions can imply a lot of memory allocations that are not very
visible, and we usually try to avoid them in performance-relevant server
code.
master_to_slave_index_map is just a lookup table of numbers 0..#columns-1,
right? So it doesn't need to be an associative array, a normal array is
sufficient, and doesn't need to be re-allocated each event.
And master_unmatched_cols is again just a set of numbers 0..#columns-1,
so a bitmap could be used.
- Kristian.
|
Test to show that the slave doesn't re-binlog correctly if it has extra columns. Note this is only broken in 10.6, 10.11 fixes it (as well as my MDEV-36290 patch). The problem is that the slave doesn't set table->rpl_write_set correctly. Also table->read_set and table->write_set are not correct, but I haven't # yet looked into the implications of that.. To show this, this test uses chain replication with slave-specific columns. The default value for the column is different on each slave. The middle-slave doesn't write its column to the binlog (despite being configured with binlog_row_image=FULL), and so when the final slave in the chain sees the event, it only sees one column being updated, and then uses its default for its extra column, which is different than the middle slaves default. Then, we use diff_tables.inc to show the difference between the tables.
One can have data loss in multi-master setups when 1) both masters update the same table, 2) ATLER TABLE is run on one master which re-arranges the column ordering, and 3) transactions are binlogged in ROW binlog_format. This is because a slave identifies columns to update using column index number. That is, if a transaction updates a table with <n> columns on the master, the binary log ROW event will store its data column-by-column, from the first column, to the <n>th column, in-order. When the slave applies this row to its table, it simply updates each column of its new row using these same values, in the same order. If the slave’s table has its columns in a different order (from some ALTER TABLE, which can have added, removed, or re-arranged) these columns, the data will still be stored in the order that it was done on the master table. This leads to data loss. This patch adds the ability for a slave to lookup column name when applying ROW events, so if the ordering of columns on the master and slave differs, the slave can still apply the data changes to the correct column. This is limited to when a master binlogs events using option binlog_row_metadata=FULL, as this extends Table_map_log_event metadata to contain column names. When this column name metadata is missing, the lookup behavior is unchanged (i.e. it will use column index, as before). General code changes that extend beyond just this patch: 1. In unpack_row(), column "cleanup" for fields that don't exist on the slave is now done as unpacking is done. Prior to this patch, this cleanup would happen only after unpacking the event data, as the only place that the event data could have extra rows would be after the slave fields had been processed. Now, the slave can be missing _any_ field that was logged on the master; and we have to account for that as we see it. That is, for fields in the event that are missing on the slave, we must move forward the both the pointer that tracks the next spot to unpack data, as well as account for our null_mask for tracking null columns. To try and help ensure this new logic is correct, a new assertion is added, assert_master_idx_matches_null_tracking() which verifies that our tracking of master indices aligns with our null field tracking (albeit this can only be done when all fields of a row event are binlogged, e.g when logged with binlog_row_image=FULL). Note that we can't meaningfully validate our master index is consistent with null bit tracking when we have missing columns, as they can be missing in arbitrary places, and if we were to track when a bit is written, we would have to refer to master_idx, which would invalidate the assertion in the first place. 2. table->{read_set, write_set, rpl_write_set} also needed to be updated so its bits would adjust for offset columns on slave. Note that rpl_write_set is set in TABLE::mark_columns_per_binlog_row_image(), which is pulled from 10.11. This fixes an existing bug in 10.6 where all slave columns wouldn't be written if the slave had additional columns (though the bug doesn't exits in 10.11, because it already calls mark_columns_per_binlog_row_image()). 3. There was an issue with the conversion table, where its columns weren't in sync with the table_def generated by the Table_map_log_event. I.e., when the slave didn't have the column, the conv_table generation logic would simply skip the rest of the loop and go on to the next column (without adding a field for the missing column). The compatible_with and unpack_row logic, however, assumed the conv_table was in alignment with the columns from the Rows_log_event. I thought of two ways to fix this: a. Add a dummy field to the conv_table (and later reset it to NULL). b. Have a separate index variable which tracks the conv_table fields. This patch implements the second (b), as it is more efficient. Patch is currently still being drafted. TODOs left: 1. Finish MTR testing current changes 2. Add error-checking logic & MTR tests 3. Optimize master_to_slave_index_map and master_unmatched_cols to use an array and bitmap (per Kristian's review suggestion) 4. Re-test rpl and multi_source MTR suites with binlog_row_metadata=FULL (temporarily change default value) to make sure nothing fails from that option alone 5. Cleanup patch (remove printfs for debugging) 6. Extensive QA testing Reviewed By: ============ <TODO>
66841d9
to
79aab08
Compare
Added NULL columns in testing, parameterizd tests by files so testing combinations of options (Add/drop/type conversion/NULL) together is easier. Some TODOs remain to be written.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
One can have data loss in multi-master setups when 1) both masters update the same table, 2) ATLER TABLE is run on one master which re-arranges the column ordering, and 3) transactions are binlogged in ROW binlog_format. This is because a slave identifies columns to update using column index number. That is, if a transaction updates a table with columns on the master, the binary log ROW event will store its data column-by-column, from the first column, to the th column, in-order. When the slave applies this row to its table, it simply updates each column of its new row using these same values, in the same order. If the slave’s table has its columns in a different order (from some ALTER TABLE, which can have added, removed, or re-arranged) these columns, the data will still be stored in the order that it was done on the master table. This leads to data loss.
This patch adds the ability for a slave to lookup column name when applying ROW events, so if the ordering of columns on the master and slave differs, the slave can still apply the data changes to the correct column. This is limited to when a master binlogs events using option binlog_row_metadata=FULL, as this extends Table_map_log_event metadata to contain column names. When this column name metadata is missing, the lookup behavior is unchanged (i.e. it will use column index, as before).
Patch is currently still being drafted. Extensive MTR testing is still TODO, as well as addressing various TODOs in the code. The current state of MTR tests for this patch is simply reflective of the JIRA description, so it is very minimal, to just show the issue generally.