DBZ-8347 implementation of Cassandra 5 support #135

smiklosovic · 2024-10-23T12:55:05Z

No description provided.

smiklosovic · 2024-10-23T14:02:42Z

@vjuranek @jpechane this should go in when debezium/debezium#5947 is merged

smiklosovic · 2024-10-24T08:31:20Z

@jpechane Cassandra 5.0 introduces a new data type called vector. This branch does not include it, but I have a commit locally that supports it. The reason for not introducing it in this PR is that Java driver 4.18.1, which includes vector support, contains a bug (1).

I prefer to merge this as is. Once the bug is fixed in the driver, we can bump the Java driver in the parent and do another PR with vector support against this repository.

(1) https://issues.apache.org/jira/browse/CASSJAVA-2

jpechane · 2024-10-24T08:37:21Z

@smiklosovic Is there any real difference between the Cassandra 4 and 5 implementations? It seems to me the only difference is the test environment. If that is the case, then I don't think we need a special module for Cassandra 5, just specific profile(s) to test the cassandra-4 module against either a Cassandra 4 or a Cassandra 5 server.

jpechane · 2024-10-24T08:40:11Z

Is the Java driver compatible with both Cassandra 4 and 5? If yes, then again there is no need for a new module. When you introduce vector support, the test would be run conditionally, only against Cassandra 5. We do this for other databases too. It is driven by custom annotations and test rules.
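A minimal sketch of the annotation-driven idea described above, written in plain Java without any test framework. The annotation `RequiresCassandraMajor`, the class `ExampleTests`, and the runner are hypothetical names used only for illustration; they are not Debezium's actual annotations or test rules.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical annotation: the test needs at least this Cassandra major version.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiresCassandraMajor {
    int value();
}

// Hypothetical test class: one test runs everywhere, one only on Cassandra 5+.
class ExampleTests {
    public void testBasicTypes() { }

    @RequiresCassandraMajor(5)
    public void testVectorType() { }
}

class ConditionalRunner {
    // Returns the sorted names of test methods that should run
    // against a server with the given major version.
    static List<String> runnable(Class<?> testClass, int serverMajor) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (!m.getName().startsWith("test")) {
                continue; // not a test method in this sketch's naming convention
            }
            RequiresCassandraMajor req = m.getAnnotation(RequiresCassandraMajor.class);
            if (req == null || serverMajor >= req.value()) {
                names.add(m.getName());
            }
        }
        Collections.sort(names); // reflection order is unspecified
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Cassandra 4: " + runnable(ExampleTests.class, 4));
        System.out.println("Cassandra 5: " + runnable(ExampleTests.class, 5));
    }
}
```

In a real test suite the same decision would normally live in a test rule or extension that marks the test as skipped, with the server version taken from the active build profile.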

smiklosovic · 2024-10-24T08:52:24Z

There are changes around schema management in Cassandra5SchemaChangeListener and CassandraConnectorTask. Otherwise it is more or less the same, but it is not an exact copy.

I was planning to consolidate all tests into a separate module. We could have a test module / artifact that each implementation module depends on and runs (somehow, I have not figured out that part yet). For now we just copy them all, which is not necessary.

When it comes to the implementation, 4 and 5 are almost identical but not the same. For simplicity I would just go with this, but we might also think about how to consolidate what is the same.

When 5.1 lands, it will again be something different. There is quite an overhaul of schema management and I expect changes in that area again.

Unfortunately Cassandra as such is not modular enough to cherry-pick just what we want. It is also safest to compile the code against the respective Cassandra version.

smiklosovic · 2024-10-24T09:33:07Z

Honestly, it is quite a miracle how slim the implementation modules are. There are just a bunch of bootstrap classes and the actual handlers of the mutations / events. If we get rid of the tests or consolidate most of them into one artifact, I am quite satisfied with how it already looks. The overhead is minimal anyway.

The problem is that it backfired in the past when I tried to have one implementation for more than one Cassandra version. If you depend on Cassandra interfaces or classes, you need to compile against a particular Cassandra version, but the code might then fail when it is run on a newer version. It "seems" the same but it is not. There was a case where a type was an interface in one version and a class with the same name in another. You compiled against one version and all looked just fine, but when run against a different version it errored out at runtime. That is the reason why the core module does not depend on any Cassandra version at all. It is not only architecturally correct, it is actually a necessity.
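To illustrate the interface-versus-class case: in Java this kind of mismatch typically surfaces as an `IncompatibleClassChangeError` when the affected call is first executed, not at compile time. The sketch below shows one way a module could check the "shape" of a type up front via reflection. It uses JDK types (`java.lang.Runnable`, `java.lang.Thread`) as stand-ins; the class `TypeShapeCheck` and its methods are hypothetical, and nothing here is taken from the connector's code.

```java
class TypeShapeCheck {
    // Loads the named type without running its static initializers
    // and reports whether it is an interface.
    static boolean isInterface(String className) throws ClassNotFoundException {
        ClassLoader loader = TypeShapeCheck.class.getClassLoader();
        return Class.forName(className, false, loader).isInterface();
    }

    // Fails fast with a readable message if the type is not what the
    // calling code was compiled against.
    static void requireInterface(String className) {
        try {
            if (!isInterface(className)) {
                throw new IllegalStateException(
                        className + " is a class at run time, but an interface was expected");
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(className + " is not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        requireInterface("java.lang.Runnable"); // passes: Runnable is an interface
        try {
            requireInterface("java.lang.Thread"); // fails: Thread is a class
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this only detects the problem earlier. It does not remove it, which is consistent with the argument for compiling each implementation module against its own Cassandra version.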

jpechane · 2024-10-24T09:56:25Z

OK, makes sense. Let's keep it as it is now and we can follow up with codebase optimization later.

jpechane · 2024-10-24T09:59:30Z

@smiklosovic Applied, thanks a lot!

jpechane · 2024-10-24T13:48:26Z

@smiklosovic Just FYI, when you start to implement vector type support, please check io.debezium.data.vector and re-use or add types as suitable. It seems to me there can be two kinds of vector types: those that support vector search and those that don't. If this is the case, then those not supporting vector search should map to an array and those that do should map to Debezium's logical type.

smiklosovic force-pushed the DBZ-8347 branch 2 times, most recently from 4b36e81 to 1e64d38 Compare October 23, 2024 13:11

DBZ-8347 implementation of Cassandra 5 support

3583f7b

smiklosovic force-pushed the DBZ-8347 branch from 1e64d38 to 3583f7b Compare October 23, 2024 13:47

smiklosovic mentioned this pull request Oct 23, 2024

DBZ-8348 add Cassandra 5 version to pom.xml debezium/debezium#5947

Merged

DBZ-8347 Use Debezium parent pom for version management

b29f34c

jpechane merged commit 1eb71cf into debezium:main Oct 24, 2024
3 checks passed
