Skip to content

MDEV-36290: ALTER TABLE with multi-master can cause data loss #3967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: 10.6
Choose a base branch
from

Conversation

bnestere
Copy link
Contributor

@bnestere bnestere commented Apr 11, 2025

One can have data loss in multi-master setups when 1) both masters update the same table, 2) ATLER TABLE is run on one master which re-arranges the column ordering, and 3) transactions are binlogged in ROW binlog_format. This is because a slave identifies columns to update using column index number. That is, if a transaction updates a table with columns on the master, the binary log ROW event will store its data column-by-column, from the first column, to the th column, in-order. When the slave applies this row to its table, it simply updates each column of its new row using these same values, in the same order. If the slave’s table has its columns in a different order (from some ALTER TABLE, which can have added, removed, or re-arranged) these columns, the data will still be stored in the order that it was done on the master table. This leads to data loss.

This patch adds the ability for a slave to lookup column name when applying ROW events, so if the ordering of columns on the master and slave differs, the slave can still apply the data changes to the correct column. This is limited to when a master binlogs events using option binlog_row_metadata=FULL, as this extends Table_map_log_event metadata to contain column names. When this column name metadata is missing, the lookup behavior is unchanged (i.e. it will use column index, as before).

Patch is currently still being drafted. Extensive MTR testing is still TODO, as well as addressing various TODOs in the code. The current state of MTR tests for this patch is simply reflective of the JIRA description, so it is very minimal, to just show the issue generally.

@bnestere bnestere added MariaDB Corporation Replication Patches involved in replication labels Apr 11, 2025
@bnestere bnestere requested a review from knielsen April 11, 2025 21:34
@bnestere bnestere self-assigned this Apr 11, 2025
@knielsen
Copy link
Member

knielsen commented Apr 16, 2025 via email

Test to show that the slave doesn't re-binlog correctly if it has extra
columns. Note this is only broken in 10.6, 10.11 fixes it (as well as my
MDEV-36290 patch). The problem is that the slave doesn't set
table->rpl_write_set correctly. Also table->read_set and table->write_set
are not correct, but I haven't # yet looked into the implications of that..

To show this, this test uses chain replication with slave-specific columns.
The default value for the column is different on each slave. The middle-slave
doesn't write its column to the binlog (despite being configured with
binlog_row_image=FULL), and so when the final slave in the chain sees the
event, it only sees one column being updated, and then uses its default for
its extra column, which is different than the middle slaves default. Then,
we use diff_tables.inc to show the difference between the tables.
One can have data loss in multi-master setups when 1) both masters
update the same table, 2) ATLER TABLE is run on one master which
re-arranges the column ordering, and 3) transactions are binlogged
in ROW binlog_format. This is because a slave identifies columns to
update using column index number. That is, if a transaction updates
a table with <n> columns on the master, the binary log ROW event
will store its data column-by-column, from the first column, to the
<n>th column, in-order. When the slave applies this row to its
table, it simply updates each column of its new row using these same
values, in the same order. If the slave’s table has its columns in a
different order (from some ALTER TABLE, which can have added,
removed, or re-arranged) these columns, the data will still be
stored in the order that it was done on the master table. This leads
to data loss.

This patch adds the ability for a slave to lookup column name when
applying ROW events, so if the ordering of columns on the master and
slave differs, the slave can still apply the data changes to the
correct column. This is limited to when a master binlogs events
using option binlog_row_metadata=FULL, as this extends
Table_map_log_event metadata to contain column names. When this
column name metadata is missing, the lookup behavior is unchanged
(i.e. it will use column index, as before).

General code changes that extend beyond just this patch:

 1. In unpack_row(), column "cleanup" for fields that don't exist on
    the slave is now done as unpacking is done. Prior to this patch,
    this cleanup would happen only after unpacking the event data, as
    the only place that the event data could have extra rows would be
    after the slave fields had been processed. Now, the slave can be
    missing _any_ field that was logged on the master; and we have to
    account for that as we see it. That is, for fields in the event
    that are missing on the slave, we must move forward the both
    the pointer that tracks the next spot to unpack data, as well as
    account for our null_mask for tracking null columns. To try and
    help ensure this new logic is correct, a new assertion is added,

      assert_master_idx_matches_null_tracking()

    which verifies that our tracking of master indices aligns with
    our null field tracking (albeit this can only be done when all
    fields of a row event are binlogged, e.g when logged with
    binlog_row_image=FULL). Note that we can't meaningfully validate
    our master index is consistent with null bit tracking when we have
    missing columns, as they can be missing in arbitrary places, and if
    we were to track when a bit is written, we would have to refer to
    master_idx, which would invalidate the assertion in the first
    place.

 2. table->{read_set, write_set, rpl_write_set} also needed to be
    updated so its bits would adjust for offset columns on slave. Note
    that rpl_write_set is set in
    TABLE::mark_columns_per_binlog_row_image(), which is pulled from
    10.11. This fixes an existing bug in 10.6 where all slave columns
    wouldn't be written if the slave had additional columns
    (though the bug doesn't exits in 10.11, because it already calls
    mark_columns_per_binlog_row_image()).

 3. There was an issue with the conversion table, where its columns
    weren't in sync with the table_def generated by the
    Table_map_log_event. I.e., when the slave didn't have the column,
    the conv_table generation logic would simply skip the rest of the
    loop and go on to the next column (without adding a field for the
    missing column). The compatible_with and unpack_row logic, however,
    assumed the conv_table was in alignment with the columns from the
    Rows_log_event. I thought of two ways to fix this:
       a. Add a dummy field to the conv_table (and later reset it to
          NULL).
       b. Have a separate index variable which tracks the conv_table
          fields.
    This patch implements the second (b), as it is more efficient.

Patch is currently still being drafted. TODOs left:
 1. Finish MTR testing current changes
 2. Add error-checking logic & MTR tests
 3. Optimize master_to_slave_index_map and master_unmatched_cols
    to use an array and bitmap (per Kristian's review suggestion)
 4. Re-test rpl and multi_source MTR suites with
    binlog_row_metadata=FULL (temporarily change default value) to
    make sure nothing fails from that option alone
 5. Cleanup patch (remove printfs for debugging)
 6. Extensive QA testing

Reviewed By:
============
<TODO>
Added NULL columns in testing, parameterizd tests by files so testing
combinations of options (Add/drop/type conversion/NULL) together is
easier.

Some TODOs remain to be written.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MariaDB Corporation Replication Patches involved in replication
Development

Successfully merging this pull request may close these issues.

2 participants