DBZ-8347 implementation of Cassandra 5 support #135

Merged
merged 2 commits into from
Oct 24, 2024

smiklosovic
Member

No description provided.

@smiklosovic smiklosovic force-pushed the DBZ-8347 branch 2 times, most recently from 4b36e81 to 1e64d38 Compare October 23, 2024 13:11
@smiklosovic
Member Author

@vjuranek @jpechane this should go in when debezium/debezium#5947 is merged

@smiklosovic
Member Author

smiklosovic commented Oct 24, 2024

@jpechane Cassandra 5.0 introduces a new data type called vector. This branch does not include it, but I have a commit locally which does support it. The reason for not introducing it in this PR is that Java driver version 4.18.1, which includes vector support, contains a bug (1).

I prefer to merge this as is. Once the bug is fixed in the driver, we can bump the Java driver in the parent and do another PR with vector support against this repository.

(1) https://issues.apache.org/jira/browse/CASSJAVA-2

@jpechane
Contributor

@smiklosovic Is there any real difference between the Cassandra 4 and 5 implementations? It seems to me the only difference is the test environment. If this is the case, then I don't think we need a special module for Cassandra 5, just specific profile(s) to test the cassandra-4 module with either a Cassandra 4 server or a Cassandra 5 server.

@jpechane
Contributor

Is the Java driver compatible with both Cassandra 4 and 5? If yes, then again, there is no need for a new module. When you introduce the vector support, its tests would be run conditionally, only against Cassandra 5. We do this for other databases too. It is driven by custom annotations and test rules.
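The annotation-driven approach mentioned above could look roughly like the following. This is a minimal sketch and not Debezium's actual test infrastructure: the annotation `RequiresCassandraMajor`, the class `SampleTests`, and the helper `runnableTests` are all hypothetical names, and the selection is done with plain reflection rather than a real test rule so that the example stays self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConditionalTestSketch {

    // Hypothetical annotation: the test needs at least this Cassandra major version.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface RequiresCassandraMajor {
        int value();
    }

    // Hypothetical test class: one test runs everywhere, one only on Cassandra 5+.
    public static class SampleTests {
        public void readsBasicTypes() {
        }

        @RequiresCassandraMajor(5)
        public void readsVectorColumn() {
        }
    }

    // Returns the sorted names of the test methods eligible for the given server major version.
    public static List<String> runnableTests(Class<?> testClass, int serverMajor) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // ignore compiler- or tool-generated methods
            }
            RequiresCassandraMajor req = m.getAnnotation(RequiresCassandraMajor.class);
            if (req == null || serverMajor >= req.value()) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(runnableTests(SampleTests.class, 4)); // prints [readsBasicTypes]
        System.out.println(runnableTests(SampleTests.class, 5)); // prints [readsBasicTypes, readsVectorColumn]
    }
}
```

In a real build the server version would come from the test environment (for example a Maven profile), and a test rule would skip the ineligible methods instead of listing the eligible ones.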

@smiklosovic
Member Author

smiklosovic commented Oct 24, 2024

There are changes around schema management in Cassandra5SchemaChangeListener and CassandraConnectorTask; otherwise it is more or less the same, but it is not an exact copy.

I was planning to consolidate all tests into a separate module. If you think about it, we might have a test module / artifact, and each implementation module would depend on it and run the tests (somehow; I have not figured out that part yet). For now we just copy them all, which is not necessary.

When it comes to the implementation, 4 and 5 are almost identical but not the same. For simplicity I would just go with this, but we might think about how to consolidate what is the same too.

When 5.1 lands, it will again be something different. There is quite an overhaul of schema management and I expect changes in that area again.

Unfortunately, Cassandra as such is not modular enough to cherry-pick just what we want. It is also safest to compile the code against the respective Cassandra version.

@smiklosovic
Member Author

Honestly, it is quite a miracle how slim the implementation modules are. There are just a bunch of bootstrap classes and the actual handlers of the mutations / events. If we get rid of the tests / consolidate most of them into one artifact, I am quite satisfied with how it looks already. The overhead is minimal anyway.

The problem is that it backfired in the past when I tried to have one implementation for more than one Cassandra version. If you depend on some Cassandra interfaces or classes, you need to compile against some Cassandra version, but then it might happen that it does not work when run on a newer version. It "seems" like it is the same but it is not. There was a case where a type was an interface in one version, and in another version it had the same name but was a class. So you compiled against one version and all looked just fine, but when run against a different version it errored out at runtime. That is the reason why the core module does not depend on any Cassandra version at all. It is not only architecturally correct, it is actually a necessity.
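To make the interface-versus-class problem concrete: the JVM typically reports this kind of mismatch as an `IncompatibleClassChangeError` at link time, not at compile time. The sketch below only illustrates that a type's name says nothing about its kind, by inspecting the kind reflectively. It uses JDK types as stand-ins because the thread does not name the Cassandra type involved; `TypeKindCheckSketch`, `Kind`, and `kindOf` are hypothetical names.

```java
public class TypeKindCheckSketch {

    public enum Kind { INTERFACE, CLASS, MISSING }

    // Looks a type up by name on the runtime classpath and reports what kind it is.
    // The class is not initialized; it is only loaded for inspection.
    public static Kind kindOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, TypeKindCheckSketch.class.getClassLoader());
            return type.isInterface() ? Kind.INTERFACE : Kind.CLASS;
        } catch (ClassNotFoundException e) {
            return Kind.MISSING;
        }
    }

    public static void main(String[] args) {
        // JDK stand-ins: the same lookup by name yields different kinds.
        System.out.println(kindOf("java.lang.Runnable"));       // prints INTERFACE
        System.out.println(kindOf("java.lang.Thread"));         // prints CLASS
        System.out.println(kindOf("org.example.DoesNotExist")); // prints MISSING
    }
}
```

A check like this does not replace compiling each implementation module against its own Cassandra version; it only shows why code compiled against one version cannot assume the same shape of a type in another.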

@jpechane
Contributor

OK, makes sense. Let's keep it as is for now and we can follow up with codebase optimization later.

@jpechane jpechane merged commit 1eb71cf into debezium:main Oct 24, 2024
3 checks passed
@jpechane
Contributor

@smiklosovic Applied, thanks a lot!

@jpechane
Contributor

@smiklosovic Just FYI, when you start to implement vector type support, please check io.debezium.data.vector and re-use or add types as suitable. It seems to me there can be two kinds of vector types: those that support vector search and those that don't. If this is the case, then those not supporting vector search should map to an array and those that do should map to Debezium's logical type.
