CNDB-13693: Make SAI's view referenceable to simplify locking query view #1700

michaeljmarshall · 2025-04-17T22:21:04Z

What is the issue

Fixes: https://github.com/riptano/cndb/issues/13693

What does this PR fix and why was it fixed

The new design is to:

Make the SAI View referenceable so that it holds references to the underlying SSTableIndex instances. When the view is released for the final time, it releases the indexes. This moves all of the complexity of acquiring references to sstable update time and out of the query path, which seems like a generally good improvement.
Observe that each SSTableIndex holds a reference to its associated sstable reader.
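
To make the lifecycle concrete, here is a minimal sketch of a view that pins its indexes until its own last reference is dropped. `SketchIndex` and `SketchView` are hypothetical stand-ins, not the actual `SSTableIndex` and `View` classes, and the counting is deliberately simplified. It assumes the view manager holds the initial reference to the view and each query takes one more.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for SSTableIndex: a counted resource that is freed at zero.
class SketchIndex
{
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the first reference
    final String name;

    SketchIndex(String name) { this.name = name; }

    // Takes a reference unless the count has already dropped to zero.
    boolean tryRef()
    {
        while (true)
        {
            int n = refs.get();
            if (n <= 0)
                return false;
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    void release() { refs.decrementAndGet(); }

    boolean isReleased() { return refs.get() <= 0; }
}

// Hypothetical stand-in for the SAI View: it owns one reference per index.
class SketchView
{
    private final AtomicInteger refs = new AtomicInteger(1); // held by the view manager
    private final List<SketchIndex> indexes;

    private SketchView(List<SketchIndex> indexes) { this.indexes = indexes; }

    // Done at sstable update time: reference every index, or undo and fail.
    static SketchView create(List<SketchIndex> candidates)
    {
        for (int i = 0; i < candidates.size(); i++)
        {
            if (!candidates.get(i).tryRef())
            {
                for (int j = 0; j < i; j++)
                    candidates.get(j).release();
                throw new IllegalStateException("Index already released: " + candidates.get(i).name);
            }
        }
        return new SketchView(List.copyOf(candidates));
    }

    // Query path: a single attempt to pin the whole view.
    boolean tryRef()
    {
        while (true)
        {
            int n = refs.get();
            if (n <= 0)
                return false;
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    // The final release hands back the references taken in create().
    void release()
    {
        if (refs.decrementAndGet() == 0)
            indexes.forEach(SketchIndex::release);
    }
}
```

In this sketch a query only calls `tryRef()` and `release()` on the view; it never references individual indexes, which is the simplification the description is after.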

I am somewhat concerned about what will happen if indexes are missing, so I have some unanswered TODO comments in the PR.

Note also that we need to create the memtable index on deletions in order to make the index-first solution work.

github-actions · 2025-04-17T22:21:24Z

src/java/org/apache/cassandra/index/sai/IndexContext.java

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

jasonstack · 2025-04-21T05:34:14Z

I will review it tomorrow.

Update: I won't be able to review it on Tuesday. Will do it within this week.

michaeljmarshall · 2025-04-23T23:07:45Z

src/java/org/apache/cassandra/index/sai/disk/vector/CassandraOnHeapGraph.java

+                    if (matcher.apply(cv))
+                    {
+                        // We can exit now because we won't find a better candidate
+                        var candidate = new PqInfo(searcher.getPQ(), searcher.containsUnitVectors(), segment.metadata.numRows);


TODO: make sure the PQ is safe to use after releasing the view.

How about adding an SSTableIndex to PqInfo and releasing it when PqInfo is done?
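
One way to read that suggestion, as a rough sketch only: the PQ holder takes its own reference on the index it was read from and gives it back on close. `CountedIndex` and `PinnedPqInfo` are hypothetical names, and the `Object pq` field is a placeholder for the real product-quantization data. None of this is the actual `PqInfo` API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted owner of the PQ data (stands in for SSTableIndex).
class CountedIndex
{
    private final AtomicInteger refs = new AtomicInteger(1);

    boolean tryRef()
    {
        while (true)
        {
            int n = refs.get();
            if (n <= 0)
                return false;
            if (refs.compareAndSet(n, n + 1))
                return true;
        }
    }

    void release() { refs.decrementAndGet(); }

    int refCount() { return refs.get(); }
}

// Hypothetical variant of PqInfo that pins the index it was read from.
final class PinnedPqInfo implements AutoCloseable
{
    final Object pq;           // placeholder for the PQ data
    final boolean unitVectors;
    final long rowCount;
    private final CountedIndex owner;
    private boolean closed;

    private PinnedPqInfo(CountedIndex owner, Object pq, boolean unitVectors, long rowCount)
    {
        this.owner = owner;
        this.pq = pq;
        this.unitVectors = unitVectors;
        this.rowCount = rowCount;
    }

    // Returns null when the index has already been fully released.
    static PinnedPqInfo tryCreate(CountedIndex owner, Object pq, boolean unitVectors, long rowCount)
    {
        if (!owner.tryRef())
            return null;
        return new PinnedPqInfo(owner, pq, unitVectors, rowCount);
    }

    // Idempotent: the reference is given back exactly once.
    @Override
    public void close()
    {
        if (!closed)
        {
            closed = true;
            owner.release();
        }
    }
}
```

With this shape the caller can release the view right after choosing a candidate and wrap the PQ use in try-with-resources, at the cost of keeping one index alive a little longer.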

jasonstack · 2025-04-24T05:50:32Z

test/unit/org/apache/cassandra/index/SecondaryIndexManagerTest.java

@@ -194,10 +194,12 @@ public void testIndexRebuildWhenAddingSStableViaRemoteReload()
        assertEmpty(execute("SELECT * FROM %s WHERE a=1"));
        assertEmpty(execute("SELECT * FROM %s WHERE c=1"));



This test, testIndexRebuildWhenAddingSStableViaRemoteReload, was added to cover the legacy writer behavior of building the index in the writer. Now the writer only builds the index on flush, not on reload.

During sstable reload, a reloaded sstable should have its index attached unless it is added as a shallow sstable. (The compaction task would abort if it failed to build the index; if the initial index build has not completed, the index is non-queryable, which is fine.)

This test can be removed if you prefer.

src/java/org/apache/cassandra/index/sai/view/View.java

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

jasonstack · 2025-04-24T07:16:51Z

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

+
+                var sstableReaders = new ArrayList<SSTableReader>(saiView.size());
+                // These are already referenced because they are referenced by the same view we just referenced.
+                // TODO review saiView.match() method for boolean predicates.


Is there an issue tracking boolean predicate support?

I'll create it. The point is that we have a data structure in the view to efficiently get the indexes that might satisfy the predicate. Without using match, we're just searching all indexes.
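
The difference between matching and scanning can be sketched as follows. This is a simplified illustration that assumes each index tracks a minimum and maximum term; `TermRangeIndex` and `MatchSketch` are hypothetical, integer terms stand in for real term types, and a linear filter stands in for whatever lookup structure the view actually keeps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index summary: the smallest and largest term it contains.
class TermRangeIndex
{
    final String name;
    final int minTerm;
    final int maxTerm;

    TermRangeIndex(String name, int minTerm, int maxTerm)
    {
        this.name = name;
        this.minTerm = minTerm;
        this.maxTerm = maxTerm;
    }

    // True when the inclusive query range [lo, hi] overlaps this index's term range.
    boolean mayContain(int lo, int hi)
    {
        return lo <= maxTerm && hi >= minTerm;
    }
}

class MatchSketch
{
    // Keeps only the indexes whose term range could satisfy the predicate.
    static List<TermRangeIndex> match(List<TermRangeIndex> all, int lo, int hi)
    {
        List<TermRangeIndex> result = new ArrayList<>();
        for (TermRangeIndex index : all)
            if (index.mayContain(lo, hi))
                result.add(index);
        return result;
    }
}
```

Skipping this step is still correct, because every index gets searched, but the query then pays for indexes that could have been ruled out up front.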

jasonstack · 2025-04-24T07:29:40Z

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

-                // of nanoceconds, but the timeout is large enough just in case of unpredictable performance hiccups.
-                outer:
-                while (!MonotonicClock.approxTime.isAfter(start + TimeUnit.MILLISECONDS.toNanos(2000)))
+                // Get memtables first in case we are in the middle of flushing one.


Now, the only case where SAI might return partial data is with a shallow sstable, in step 3 below.

1. During reload after compaction, downloading the archive for the newly created sstable times out, and a shallow sstable is added to the tracker.
2. `RemoteStorageHandler` unloads the compacted sstables from the tracker.
3. An SAI query is processed.
4. When the sstable reload ends, it marks the index as non-queryable.

We could either add a corresponding dummy SSTableContext/SSTableIndex for the shallow sstable and fail the query (depending on the CNDB `SSTableReloadFailureMode`) if the query uses the dummy index,

or double-check in CFS#tracker whether there is a shallow sstable and decide whether to abort or proceed with the query based on the CNDB `SSTableReloadFailureMode`.
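
The second option could look roughly like the sketch below. Everything in it is hypothetical: the `ReloadFailureModeSketch` enum and its two values are invented for illustration and are not the real values of CNDB's `SSTableReloadFailureMode`, and `LiveSSTable` is a stand-in for whatever the tracker exposes.

```java
import java.util.List;

// Hypothetical failure modes; the real CNDB enum values are not shown in this thread.
enum ReloadFailureModeSketch { FAIL_QUERY, ALLOW_PARTIAL_RESULTS }

// Hypothetical summary of one live sstable in the tracker.
class LiveSSTable
{
    final String name;
    final boolean shallow;
    final boolean hasIndex;

    LiveSSTable(String name, boolean shallow, boolean hasIndex)
    {
        this.name = name;
        this.shallow = shallow;
        this.hasIndex = hasIndex;
    }
}

class ShallowSSTableGuard
{
    // Returns true when every live sstable is covered by an index.
    // Returns false when the query may proceed but could return partial data.
    // Throws when the configured mode says the query must abort.
    static boolean resultsComplete(List<LiveSSTable> live, ReloadFailureModeSketch mode)
    {
        for (LiveSSTable sstable : live)
        {
            if (sstable.shallow && !sstable.hasIndex)
            {
                if (mode == ReloadFailureModeSketch.FAIL_QUERY)
                    throw new IllegalStateException("Shallow sstable without index: " + sstable.name);
                return false;
            }
        }
        return true;
    }
}
```

A check like this would run once per query after the view is referenced, so it adds a scan of the live sstables to the query path.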

Why are the unload and reload ending steps distinct? Is there any way to update that logic so that we don't have this race? At the moment, the CFS#tracker doesn't really expose the shallow sstable issue because the class is out of the CC scope.

From a design perspective, I think it would make the most sense to solve this in CNDB since it is a purely cndb construct.

Why are the unload and reload ending steps distinct?

RemoteStorageHandler processes sstables in batches of 100. Let me revise my example.

Assume that adding the shallow sstable and obsoleting the compacted sstable are applied in the same transaction.

Before reloading, the CFS has 1 sstable. Now the compactor rewrites that sstable into another one.

During reload after compaction, downloading the archive for the newly created sstable times out, so a shallow sstable is used. At the same time, the compacted sstables are unloaded from the tracker.

CFS#tracker now contains only 1 shallow sstable, but the SSTableIndex still references the old sstable.

With the current patch, an SAI query processed at this point completes properly.

When SAIGroup receives the sstable-changed notification, it won't find index files for the shallow sstable, and it releases the index for the compacted sstable. After SSTableContextManager#update and IndexViewManager#update, the View would be empty.

An SAI query processed at this point responds with partial data.

The CNDB side marks the index as non-queryable at the end of reloading.

SAI queries are now blocked, as expected.

It would be nice to have an integration test for the above case. I think the culprit is in step 4. We should mark the index as non-queryable there if it is trying to add a non-preexisting sstable (e.g. a shallow one) without a proper index, to prevent responding with partial data.

Thanks, that explanation makes sense. If it's possible in step 1 to also mark the index as non-queryable, that seems like the most straightforward solution. Any chance you are available to help add that integration test? I think this patch will really help clusters with active SAI queries happening during compaction.

I have rebased the previous reproduction with this patch. It has reproduced the behavior of returning partial data with shallow sstable. https://github.com/riptano/cndb/compare/shallow_sstable_with_sai?expand=1

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

pkolaczk

This is a very nice simplification, and it avoids all the looping needed to match the indexes to sstables. However, I am concerned that we might miss data in a very narrow race window when a flush happens.

src/java/org/apache/cassandra/index/sai/IndexContext.java

src/java/org/apache/cassandra/index/sai/view/View.java

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

sonarqubecloud · 2025-04-24T21:56:06Z

Quality Gate passed

Issues
8 New issues
0 Accepted issues

Measures
0 Security Hotspots
85.3% Coverage on New Code
6.1% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-04-24T22:02:42Z

❌ Build ds-cassandra-pr-gate/PR-1700 rejected by Butler

3 new test failure(s) in 6 builds
See build details here

Found 3 new test failures

Test	Explanation	Branch history	Upstream history
...gLegacyIndex.test_sstableloader_with_failing_2i	regression	🔴🔴🔴🔴🔴	🔵🔵🔵🔵🔵🔵🔵
....NativeIndexDDLTest.verifyIndexWithDecommission	regression	🔴🔵🔵🔵🔵🔵	🔵🔵🔵🔵🔵🔵🔵
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS...	regression	🔴🔵🔴🔵🔵🔴	🔵🔵🔵🔵🔵🔵🔵

Found 18 known test failures

jasonstack

Left one comment on potential race with shallow sstable, the rest LGTM

CNDB-13693: Make SAI's view referenceable to simplify locking query view

4aa5552

Initialize SAI memtable index on (range) partition delete

67c556f

eolivelli reviewed Apr 18, 2025

src/java/org/apache/cassandra/index/sai/IndexContext.java

eolivelli reviewed Apr 18, 2025

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

eolivelli reviewed Apr 18, 2025

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

Address review feedback

004949c

michaeljmarshall requested review from pkolaczk and jasonstack April 18, 2025 14:32

Fix compilation

3eb8761

michaeljmarshall added 2 commits April 23, 2025 16:46

Cleanup some old logic; fix possibly leaked view

059e5c6

Fix attemt to read from unreferenced view index

05ad2a7

michaeljmarshall commented Apr 23, 2025

Merge remote-tracking branch 'datastax/main' into cndb-13693

c2dd650

jasonstack reviewed Apr 24, 2025

src/java/org/apache/cassandra/index/sai/view/View.java

jasonstack reviewed Apr 24, 2025

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

jasonstack reviewed Apr 24, 2025

src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java

pkolaczk reviewed Apr 24, 2025

src/java/org/apache/cassandra/index/sai/IndexContext.java

src/java/org/apache/cassandra/index/sai/view/View.java

src/java/org/apache/cassandra/index/sai/plan/QueryView.java

michaeljmarshall added 4 commits April 24, 2025 13:05

Remove sstables from View; minor cleanup from pr review

9af8e2f

Make getReferencedView terminate after a deadline

a02fc92

Simplify IndexViewManager::update by moving referencing out of View c…

6cc7105

…onstructor

Merge remote-tracking branch 'datastax/main' into cndb-13693

577ab01

jasonstack reviewed Apr 25, 2025
