fix: re-allow empty pk.fields for some pk.mode #360


jclarysse
Contributor

Validations added recently (see PR #335) break schema evolution use cases that leave pk.fields empty. For example:

Documentation:

If 'pk.mode' is 'record_key' and 'pk.fields' is empty,
then all fields from the key struct will be used.

Configuration:

"pk.mode": "record_key",
"auto.create": "true",
"auto.evolve": "true",

Validation rule:

Primary key fields must be set when pkMode is 'record_key'.

The purpose of this PR is to re-enable this feature for record_key mode.
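
The documented fallback can be sketched in a few lines. This is an illustration in Python, not the connector's Java code: `resolve_pk_columns` and its parameters are hypothetical names, only the `record_key` and `record_value` modes are modelled, and returning the columns in schema order is an assumption (the connector may order them differently). The field names mirror the test schemas used later in this thread.

```python
from typing import Sequence


def resolve_pk_columns(
    pk_mode: str,
    pk_fields: Sequence[str],
    key_fields: Sequence[str],
    value_fields: Sequence[str],
) -> list[str]:
    """Return the primary key columns for a sink table (hypothetical helper).

    Models only the documented fallback: with an empty pk.fields, every field
    of the key struct (record_key) or value struct (record_value) is used.
    """
    if pk_mode == "record_key":
        available = key_fields
    elif pk_mode == "record_value":
        available = value_fields
    else:
        raise ValueError(f"pk.mode {pk_mode!r} is not modelled in this sketch")

    if not pk_fields:
        # Empty pk.fields: fall back to all fields of the chosen struct.
        return list(available)

    # Explicit pk.fields must exist in the chosen struct.
    missing = [f for f in pk_fields if f not in available]
    if missing:
        raise ValueError(f"pk.fields not found in {pk_mode} schema: {missing}")
    return list(pk_fields)


key_fields = ["id", "code"]
value_fields = ["name", "city"]
print(resolve_pk_columns("record_key", [], key_fields, value_fields))
print(resolve_pk_columns("record_value", [], key_fields, value_fields))
```

A validation that rejects an empty pk.fields before this step runs makes the fallback unreachable, which is the regression described above.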

@jclarysse jclarysse requested a review from a team as a code owner January 31, 2025 13:30
@jclarysse
Contributor Author

jclarysse commented Jan 31, 2025

Test setup

  • Dockerized Kafka + Karapace from this compose.yml with `KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"`
  • Local Kafka Connect (official binaries)
  • Dockerized PostgreSQL (official container image)
  • Local psql client

Create Kafka topic

$KAFKA_HOME/bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic test-topic-avro --replication-factor 1 --partitions 1 --create 

Create Avro schemas in Karapace Schema Registry:

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{
       "schema": "{
          \"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"simplekey\",
          \"fields\": [{\"name\": \"id\", \"type\": \"int\"},{\"name\": \"code\", \"type\": \"int\"}]
      }"
  }' http://localhost:8081/subjects/test-topic-avro-key/versions
{"id":1}

curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{
       "schema": "{
          \"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"simplevalue\",
          \"fields\": [{\"name\": \"name\", \"type\": \"string\"},{\"name\": \"city\", \"type\": \"string\"}]
      }"
  }' http://localhost:8081/subjects/test-topic-avro-value/versions
{"id":2}

Produce Avro record through Karapace REST:

curl -H "Content-Type: application/vnd.kafka.avro.v2+json" -X POST -d \
  '{  
      "key_schema_id": 1,
      "value_schema_id": 2,
      "records": [{"key": {"id": 0, "code": 10}, "value": {"name": "julien", "city": "paris"}}]
  }' http://localhost:8082/topics/test-topic-avro
{"key_schema_id":1,"offsets":[{"offset":0,"partition":0}],"value_schema_id":1}

Create connector using all message key fields as PK:

curl -X POST http://localhost:8083/connectors \
   -H "Content-Type: application/json" \
   -H "Accept: application/json" \
   -d \
  '{
    "name": "julien-jdbc-sink-keypk",
    "config": {
        "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/postgres",
        "connection.user": "postgres",
        "connection.password": "",
        "topics": "test-topic-avro",
        "topics.to.tables.mapping": "test-topic-avro:test_table_keypk",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://localhost:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",
        "pk.mode": "record_key",
        "auto.create": "true",
        "auto.evolve": "true"
    }
  }'
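
For repeated test runs, the payload above can be generated rather than edited by hand. A minimal Python sketch, assuming the same local endpoints as the curl command; `build_sink_config` is a hypothetical helper, not part of the connector or of Kafka Connect, and it only builds the JSON without sending it.

```python
import json


def build_sink_config(name: str, table: str, pk_mode: str) -> dict:
    """Build a connector creation payload (hypothetical helper)."""
    registry_url = "http://localhost:8081"  # local Karapace, as in the test setup
    topic = "test-topic-avro"
    return {
        "name": name,
        "config": {
            "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",
            "connection.url": "jdbc:postgresql://localhost:5432/postgres",
            "connection.user": "postgres",
            "connection.password": "",
            "topics": topic,
            "topics.to.tables.mapping": f"{topic}:{table}",
            "key.converter": "io.confluent.connect.avro.AvroConverter",
            "key.converter.schema.registry.url": registry_url,
            "value.converter": "io.confluent.connect.avro.AvroConverter",
            "value.converter.schema.registry.url": registry_url,
            "pk.mode": pk_mode,
            # pk.fields is deliberately omitted: this is the case under test.
            "auto.create": "true",
            "auto.evolve": "true",
        },
    }


payload = build_sink_config("julien-jdbc-sink-keypk", "test_table_keypk", "record_key")
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to the same `curl -X POST http://localhost:8083/connectors` call with `-d`.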

Verify that a table is created with all message key fields as PK:

# connector logs
INFO [julien-jdbc-sink-keypk|task-0] Creating table with sql: CREATE TABLE "test_table_keypk" (
"name" TEXT NOT NULL,
"city" TEXT NOT NULL,
"code" INT NOT NULL,
"id" INT NOT NULL,
PRIMARY KEY("code","id")) (io.aiven.connect.jdbc.sink.DbStructure:105)

# pg client
postgres=# \d test_table_keypk
          Table "public.test_table_keypk"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 name   | text    |           | not null | 
 city   | text    |           | not null | 
 code   | integer |           | not null | 
 id     | integer |           | not null | 
Indexes:
    "test_table_keypk_pkey" PRIMARY KEY, btree (code, id)

Recently added validations (see PR Aiven-Open#335) are preventing the
support of schema evolution use-cases, e.g.

Documentation:
```
If 'pk.mode' is 'record_key' and 'pk.fields' is empty,
then all fields from the key struct will be used.
```

Configuration:
```
"pk.mode": "record_key",
"auto.create": "true",
"auto.evolve": "true",
```

Validation rule:
```
Primary key fields must be set when pkMode is 'record_key'.
```

The purpose of this PR is to re-enable this feature
for both `record_key` and `record_value` modes.
@jclarysse jclarysse force-pushed the jclarysse/allow-empty-pkfields-with-recordkey-pkmode branch from 91e2711 to a20df4d Compare February 17, 2025 14:34
@jclarysse
Contributor Author

jclarysse commented Feb 17, 2025

Create another connector using all message value fields as PK:

curl -X POST http://localhost:8083/connectors \
   -H "Content-Type: application/json" \
   -H "Accept: application/json" \
   -d \
  '{
    "name": "julien-jdbc-sink-valuepk",
    "config": {
        "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/postgres",
        "connection.user": "postgres",
        "connection.password": "",
        "topics": "test-topic-avro",
        "topics.to.tables.mapping": "test-topic-avro:test_table_valuepk",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://localhost:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",
        "pk.mode": "record_value",
        "auto.create": "true",
        "auto.evolve": "true"
    }
  }'

Verify that a table is created with all message value fields as PK:

# connector logs
INFO [julien-jdbc-sink-valuepk2|task-0] Creating table with sql: CREATE TABLE "test_table_valuepk" (
"name" TEXT NOT NULL,
"city" TEXT NOT NULL,
PRIMARY KEY("name","city")) (io.aiven.connect.jdbc.sink.DbStructure:105)

# pg client
postgres=# \d test_table_valuepk
       Table "public.test_table_valuepk"
 Column | Type | Collation | Nullable | Default 
--------+------+-----------+----------+---------
 name   | text |           | not null | 
 city   | text |           | not null | 
Indexes:
    "test_table_valuepk_pkey" PRIMARY KEY, btree (name, city)

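The check against the connector log can also be scripted. A minimal Python sketch, assuming the `CREATE TABLE` statement has been copied out of the log as shown above; `primary_key_columns` is a hypothetical helper and the regular expression only handles the double-quoted identifiers seen in these logs. The comparison uses sets because the `record_key` log above lists the columns as ("code","id"), not in schema order.

```python
import re


def primary_key_columns(create_table_sql: str) -> list[str]:
    """Extract the column names from a PRIMARY KEY(...) clause, if any."""
    match = re.search(r"PRIMARY KEY\s*\(([^)]*)\)", create_table_sql)
    if match is None:
        return []
    # Identifiers are double-quoted in the logged statement.
    return re.findall(r'"([^"]+)"', match.group(1))


logged_sql = '''CREATE TABLE "test_table_valuepk" (
"name" TEXT NOT NULL,
"city" TEXT NOT NULL,
PRIMARY KEY("name","city"))'''

expected = {"name", "city"}  # all fields of the value schema
actual = set(primary_key_columns(logged_sql))
print("PK matches value fields:", actual == expected)
```
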
@jclarysse jclarysse changed the title from "fix: re-allow empty pk.fields with pk.mode set to record_key" to "fix: re-allow empty pk.fields for some pk.mode" Feb 17, 2025
@jclarysse jclarysse requested a review from keejon February 17, 2025 14:36
Contributor

@keejon keejon left a comment

LGTM 👍

@keejon keejon merged commit a2b15d8 into Aiven-Open:master Feb 17, 2025
3 checks passed