DBZ-7554 AvroConverter fails to parse events #122

Closed
schampilomatis
Contributor

…columns

Adjust the schema names for cells so that they are unique. Avro uses these names to identify sub-schema declarations and will not redefine a sub-schema it has already seen. As a result, when a table has cells of more than one type, Debezium fails to serialize the event, because the stored Avro schema does not match the cells.
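To make the failure mode concrete, here is a minimal sketch of the name-based reuse described above. It is a toy model, not Avro's or Debezium's actual code: `to_avro`, `cell`, and the `row` schema are hypothetical names invented for illustration, and the only behavior modeled is "emit the full record definition the first time a name is seen, and a bare name reference afterwards".

```python
def to_avro(schema, seen=None):
    """Render a toy schema as an Avro-style dict.

    A record name that was already emitted is written as a bare
    reference, so any later definition under that name is dropped.
    """
    if seen is None:
        seen = set()
    if isinstance(schema, str):  # primitive type such as "string" or "int"
        return schema
    name = schema["name"]
    if name in seen:
        return name  # reference only; the differing fields are lost
    seen.add(name)
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": f["name"], "type": to_avro(f["type"], seen)}
            for f in schema["fields"]
        ],
    }


def cell(name, value_type):
    """Hypothetical helper: a one-field cell record with the given name."""
    return {"name": name, "fields": [{"name": "value", "type": value_type}]}


# Two columns of different types that share one schema name.
row = {
    "name": "row",
    "fields": [
        {"name": "k", "type": cell("cell_value", "string")},
        {"name": "i", "type": cell("cell_value", "int")},
    ],
}

out = to_avro(row)
print(out["fields"][0]["type"])  # full definition, value is a string
print(out["fields"][1]["type"])  # bare reference to the same definition
```

In this model the int column ends up pointing at the string-valued definition, which is the mismatch the description refers to.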


Welcome as a new contributor to Debezium, @schampilomatis. Reviewers, please add missing author name(s) and alias name(s) to the COPYRIGHT.txt and Aliases.txt respectively.

@schampilomatis
Contributor Author

schampilomatis commented Feb 22, 2024

Sharing an example schema produced for
CREATE TABLE IF NOT EXISTS test_ks.table_cdc (k text PRIMARY KEY, t text, i int) WITH cdc=true;

"fields": [
          {
            "name": "k",
            "type": [
              "null",
              {
                "type": "record",
                "name": "cell_value",
                "fields": [
                  {
                    "name": "value",
                    "type": [
                      "null",
                      "string"
                    ],
                    "default": null
                  },
                  {
                    "name": "deletion_ts",
                    "type": [
                      "null",
                      "long"
                    ],
                    "default": null
                  },
                  {
                    "name": "set",
                    "type": "boolean"
                  }
                ],
                "connect.version": 1,
                "connect.name": "cell_value"
              }
            ],
            "default": null
          },
          {
            "name": "i",
            "type": [
              "null",
              "cell_value"
            ],
            "default": null
          },
          {
            "name": "t",
            "type": [
              "null",
              "cell_value"
            ],
            "default": null
          },
          {
            "name": "_range_start",
            "type": [
              "null",
              {
                "type": "record",
                "name": "_range_start",
                "fields": [
                  {
                    "name": "method",
                    "type": "string"
                  },
                  {
                    "name": "values",
                    "type": {
                      "type": "array",
                      "items": {
                        "type": "record",
                        "name": "clustering_value",
                        "fields": [
                          {
                            "name": "name",
                            "type": "string"
                          },
                          {
                            "name": "value",
                            "type": "string"
                          },
                          {
                            "name": "type",
                            "type": "string"
                          }
                        ],
                        "connect.version": 1,
                        "connect.name": "clustering_value"
                      },
                      "connect.version": 1,
                      "connect.name": "clustering_values"
                    }
                  }
                ],
                "connect.version": 1,
                "connect.name": "_range_start"
              }
            ],
            "default": null
          },
          {
            "name": "_range_end",
            "type": [
              "null",
              {
                "type": "record",
                "name": "_range_end",
                "fields": [
                  {
                    "name": "method",
                    "type": "string"
                  },
                  {
                    "name": "values",
                    "type": {
                      "type": "array",
                      "items": "clustering_value",
                      "connect.version": 1,
                      "connect.name": "clustering_values"
                    }
                  }
                ],
                "connect.version": 1,
                "connect.name": "_range_end"
              }
            ],
            "default": null
          }
        ],

Here the values for k, t, and i all have the type ["null", "cell_value"], where "cell_value" is a named schema whose value field is a "string". That definition is wrong for i, which is an int column. Setting each cell schema's name to something unique (the column name) avoids the collision.
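The effect of the naming choice can be checked with a small sketch. This is an illustration under assumptions, not the connector's code: `find_conflicts` and the `columns` mapping are hypothetical, and the per-column naming rule simply uses the column name, as suggested above.

```python
def find_conflicts(columns, name_for):
    """Return schema names that would be shared by more than one value type.

    columns  -- mapping of column name to its value type
    name_for -- function mapping a column name to its cell schema name
    """
    types_by_name = {}
    for column, value_type in columns.items():
        types_by_name.setdefault(name_for(column), set()).add(value_type)
    return sorted(n for n, types in types_by_name.items() if len(types) > 1)


# Column types taken from the CREATE TABLE statement above.
columns = {"k": "string", "t": "string", "i": "int"}

# One shared name for every cell: string and int definitions collide.
shared = find_conflicts(columns, lambda column: "cell_value")

# One name per column: every name maps to exactly one definition.
unique = find_conflicts(columns, lambda column: column)

print(shared)
print(unique)
```

With a shared name, "cell_value" is reported as conflicting; with per-column names, no conflicts remain.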

This is a partial revert of commit:

ec9e889a

Adjust schema names for cells to be unique. Avro uses these names to
identify sub schema declarations and will not redefine a sub schema that
was already detected. In this case, if we have cells with multiple
types, debezium will fail to serialize the event, since the stored avro
schema does not match the cells.

Hi @schampilomatis, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

@schampilomatis schampilomatis changed the title DBZ-7554 AvroConverter fails to parse events on tables with multiple … DBZ-7554 AvroConverter fails to parse events Feb 22, 2024
@schampilomatis
Contributor Author

Duplicate of #121
