-
Notifications
You must be signed in to change notification settings - Fork 21
CNDB-13693: Make SAI's view referenceable to simplify locking query view #1700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Checklist before you submit for review
|
src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java
Outdated
Show resolved
Hide resolved
I will review it tmr. Update: I won't be able to review it on Tuesday. Will do it within this week. |
if (matcher.apply(cv)) | ||
{ | ||
// We can exit now because we won't find a better candidate | ||
var candidate = new PqInfo(searcher.getPQ(), searcher.containsUnitVectors(), segment.metadata.numRows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: make sure the PQ is safe to use after releasing the view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about adding a SSTableIndex
in PqInfo
and release it when PqInfo
is done
@@ -194,10 +194,12 @@ public void testIndexRebuildWhenAddingSStableViaRemoteReload() | |||
assertEmpty(execute("SELECT * FROM %s WHERE a=1")); | |||
assertEmpty(execute("SELECT * FROM %s WHERE c=1")); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test testIndexRebuildWhenAddingSStableViaRemoteReload
was added to test legacy writer behavior that built index on writer. Now writer only builds index on flush, not reload.
During sstable reload, reloaded sstable should have index attached (compaction task would abort if failed to build index; if initial index build is not completed, index is not-queryable, it's fine); unless it's added as shallow sstable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test can be removed if you prefer
src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java
Outdated
Show resolved
Hide resolved
|
||
var sstableReaders = new ArrayList<SSTableReader>(saiView.size()); | ||
// These are already referenced because they are referenced by the same view we just referenced. | ||
// TODO review saiView.match() method for boolean predicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an issue about boolean predicate support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create it. The point is that we have a data structure in the view to efficiently get the indexes that might satisfy the predicate. Without using match, we're just searching all indexes.
// of nanoceconds, but the timeout is large enough just in case of unpredictable performance hiccups. | ||
outer: | ||
while (!MonotonicClock.approxTime.isAfter(start + TimeUnit.MILLISECONDS.toNanos(2000))) | ||
// Get memtables first in case we are in the middle of flushing one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, the only case SAI might return partial data would be with shallow sstable in step3.
1. during reload after compaction, it timed out downloading archive for newly created sstable and added shallow sstable into tracker
2. `RemoteStorageHandler` unloads compacted sstables from tracker.
3. SAI query is being processed
4. when sstable reload ended, it marked the index as non-queryable
We either add corresponding dummy SSTableContext/SSTableIndex
for shallow sstable and fail the query (depending on CNDB SSTableReloadFailureMode
) if the query uses the dummy index.
Or we double-check in CFS#tracker
if there is shallow sstable and decide whether to abort/proceed query based on CNDB
SSTableReloadFailureMode``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the unload and reload ending steps distinct? Is there any way to update that logic so that we don't have this race? At the moment, the CFS#tracker
doesn't really expose the shallow sstable issue because the class is out of the CC scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a design perspective, I think it would make the most sense to solve this in CNDB since it is a purely cndb construct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the unload and reload ending steps distinct?
RemoteStorageHandler
processes sstables in batch of 100. Let me revise my example.
Assuming that adding shallow sstable and obsoleting compacted sstable are applied in the same transaction.
- Before reloading, CFS has 1 sstable. Now compactor rewrites the sstable into another one.
- During reload after compaction, it timed out downloading archive for newly created sstable and uses shallow sstable. At the same time, it unloads compacted sstables from tracker.
- CFS#tracker now contains only 1 shallow sstable, but
SSTableIndex
still references the old sstable. - With current patch, SAI query is being processed and completed properly.
- When
SAIGroup
receives sstable changed notification, it won't find index files for the shallow sstable and release index for compacted sstable. AfterSSTableContextManager#update
andIndexViewManager#updae
,View
would be empty. - SAI query is being processed and responds with partial data.
- CNDB side marks the index as non-queryable at the end of reloading
- SAI query is now blocked as expected.
Would be nice to have integration test for above case. I think the culprit is in step 4. We should mark index as non-queryable there if it's trying to add a non-preexisting sstable (e.g. shallow) without proper index to prevent responding partial data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that explanation makes sense. If it's possible in step 1 to also mark the index as non-queryable, that seems like the most straight forward solution. Any chance you are available to help add that integration test? I think this patch will really help clusters with active SAI queries happening during compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rebased the previous reproduction with this patch. It has reproduced the behavior of returning partial data with shallow sstable. https://github.com/riptano/cndb/compare/shallow_sstable_with_sai?expand=1
src/java/org/apache/cassandra/index/sai/view/IndexViewManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice simplification and this avoids all the looping with matching the indexes to sstables. However I have a concern if we don't miss any data in a very narrow race window when flush happens.
|
❌ Build ds-cassandra-pr-gate/PR-1700 rejected by Butler3 new test failure(s) in 6 builds Found 3 new test failures
Found 18 known test failures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment on potential race with shallow sstable, the rest LGTM
What is the issue
Fixes: https://github.com/riptano/cndb/issues/13693
What does this PR fix and why was it fixed
The new design is to:
View
reference-able so that it holds references to the underlyingSSTableIndex
s. When theview
is released the final time, it releases the indexes. This moves all of the complexity of grabbing references to sstable update time and out of the query path, which seems like a generally good improvement.SSTableIndex
holds a reference to its associated sstable reader.I am somewhat concerned about what will happen if indexes are missing, so I have some unanswered TODO comments in the PR.
Note also that we need to create the memtable index on deletions in order to make the index-first solution work.