Skip to content

CNDB-13693: Make SAI's view referenceable to simplify locking query view #1700

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

michaeljmarshall
Copy link
Member

@michaeljmarshall michaeljmarshall commented Apr 17, 2025

What is the issue

Fixes: https://github.com/riptano/cndb/issues/13693

What does this PR fix and why was it fixed

The new design is to:

  1. Make sai View reference-able so that it holds references to the underlying SSTableIndexs. When the view is released the final time, it releases the indexes. This moves all of the complexity of grabbing references to sstable update time and out of the query path, which seems like a generally good improvement.
  2. Observe that SSTableIndex holds a reference to its associated sstable reader.

I am somewhat concerned about what will happen if indexes are missing, so I have some unanswered TODO comments in the PR.

Note also that we need to create the memtable index on deletions in order to make the index-first solution work.

Copy link

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@jasonstack
Copy link

jasonstack commented Apr 21, 2025

I will review it tmr.


Update: I won't be able to review it on Tuesday. Will do it within this week.

if (matcher.apply(cv))
{
// We can exit now because we won't find a better candidate
var candidate = new PqInfo(searcher.getPQ(), searcher.containsUnitVectors(), segment.metadata.numRows);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: make sure the PQ is safe to use after releasing the view.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding a SSTableIndex in PqInfo and release it when PqInfo is done

@@ -194,10 +194,12 @@ public void testIndexRebuildWhenAddingSStableViaRemoteReload()
assertEmpty(execute("SELECT * FROM %s WHERE a=1"));
assertEmpty(execute("SELECT * FROM %s WHERE c=1"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test testIndexRebuildWhenAddingSStableViaRemoteReload was added to test legacy writer behavior that built index on writer. Now writer only builds index on flush, not reload.

During sstable reload, reloaded sstable should have index attached (compaction task would abort if failed to build index; if initial index build is not completed, index is not-queryable, it's fine); unless it's added as shallow sstable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test can be removed if you prefer


var sstableReaders = new ArrayList<SSTableReader>(saiView.size());
// These are already referenced because they are referenced by the same view we just referenced.
// TODO review saiView.match() method for boolean predicates.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an issue about boolean predicate support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create it. The point is that we have a data structure in the view to efficiently get the indexes that might satisfy the predicate. Without using match, we're just searching all indexes.

// of nanoceconds, but the timeout is large enough just in case of unpredictable performance hiccups.
outer:
while (!MonotonicClock.approxTime.isAfter(start + TimeUnit.MILLISECONDS.toNanos(2000)))
// Get memtables first in case we are in the middle of flushing one.
Copy link

@jasonstack jasonstack Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, the only case SAI might return partial data would be with shallow sstable in step3.

1. during reload after compaction, it timed out downloading archive for newly created sstable and added shallow sstable into tracker
2. `RemoteStorageHandler` unloads compacted sstables from tracker.
3. SAI query is being processed
4. when sstable reload ended, it marked the index as non-queryable

We either add corresponding dummy SSTableContext/SSTableIndex for shallow sstable and fail the query (depending on CNDB SSTableReloadFailureMode) if the query uses the dummy index.

Or we double-check in CFS#tracker if there is shallow sstable and decide whether to abort/proceed query based on CNDB SSTableReloadFailureMode``

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the unload and reload ending steps distinct? Is there any way to update that logic so that we don't have this race? At the moment, the CFS#tracker doesn't really expose the shallow sstable issue because the class is out of the CC scope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a design perspective, I think it would make the most sense to solve this in CNDB since it is a purely cndb construct.

Copy link

@jasonstack jasonstack Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are the unload and reload ending steps distinct?

RemoteStorageHandler processes sstables in batch of 100. Let me revise my example.

Assuming that adding shallow sstable and obsoleting compacted sstable are applied in the same transaction.

  1. Before reloading, CFS has 1 sstable. Now compactor rewrites the sstable into another one.
  2. During reload after compaction, it timed out downloading archive for newly created sstable and uses shallow sstable. At the same time, it unloads compacted sstables from tracker.
  3. CFS#tracker now contains only 1 shallow sstable, but SSTableIndex still references the old sstable.
  4. With current patch, SAI query is being processed and completed properly.
  5. When SAIGroup receives sstable changed notification, it won't find index files for the shallow sstable and release index for compacted sstable. After SSTableContextManager#update and IndexViewManager#updae, View would be empty.
  6. SAI query is being processed and responds with partial data.
  7. CNDB side marks the index as non-queryable at the end of reloading
  8. SAI query is now blocked as expected.

Would be nice to have integration test for above case. I think the culprit is in step 4. We should mark index as non-queryable there if it's trying to add a non-preexisting sstable (e.g. shallow) without proper index to prevent responding partial data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that explanation makes sense. If it's possible in step 1 to also mark the index as non-queryable, that seems like the most straight forward solution. Any chance you are available to help add that integration test? I think this patch will really help clusters with active SAI queries happening during compaction.

Copy link

@jasonstack jasonstack Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased the previous reproduction with this patch. It has reproduced the behavior of returning partial data with shallow sstable. https://github.com/riptano/cndb/compare/shallow_sstable_with_sai?expand=1

Copy link

@pkolaczk pkolaczk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice simplification and this avoids all the looping with matching the indexes to sstables. However I have a concern if we don't miss any data in a very narrow race window when flush happens.

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1700 rejected by Butler


3 new test failure(s) in 6 builds
See build details here


Found 3 new test failures

Test Explanation Branch history Upstream history
...gLegacyIndex.test_sstableloader_with_failing_2i regression 🔴🔴🔴🔴🔴 🔵🔵🔵🔵🔵🔵🔵
....NativeIndexDDLTest.verifyIndexWithDecommission regression 🔴🔵🔵🔵🔵🔵 🔵🔵🔵🔵🔵🔵🔵
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴🔵🔴🔵🔵🔴 🔵🔵🔵🔵🔵🔵🔵

Found 18 known test failures

Copy link

@jasonstack jasonstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment on potential race with shallow sstable, the rest LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants