DM-47201: Fix duplicates in non-find-first dataset search #1151
base: main
4427829 to f98563f
@TallJimbo I'm not quite done with this, but could you take a quick look and let me know whether I seem to be solving the right problem? It seems excessively complicated to me, but I haven't thought of a better way to handle this yet; I'll take a closer look in the morning with fresh eyes. (I will also be adding more tests to this, since we don't have sufficient coverage around non-trivial non-find-first queries.)
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1151   +/-   ##
=======================================
  Coverage   89.36%   89.36%
=======================================
  Files         367      367
  Lines       49540    49553   +13
  Branches     6016     6019    +3
=======================================
+ Hits        44269    44281   +12
- Misses       3852     3853    +1
  Partials     1419     1419
```
By the way, I had mentioned that dataset fields were missing from the joins stage... it turned out to not be relevant because
Fix an issue where duplicate results could appear in a non-find-first dataset search when the same dataset appeared in multiple collections of a chain. This was occurring because we were forcing the addition of the collection key field to make the rows distinct. But a non-find-first search has no window function to de-duplicate the rows by dataset ID, so we need to keep the collection key out of the rows and treat dataset ID as a unique key instead.
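The failure mode described in the commit message can be reproduced with a toy in-memory SQLite database. This is a minimal sketch under stated assumptions, not the daf_butler query system: the tables `collection`, `dataset`, and `tags`, their columns, and the helper functions are all hypothetical stand-ins, and the find-first query assumes SQLite 3.25 or newer for window function support.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the real daf_butler schema):
#   collection: each collection in the chain, with its search rank
#   dataset:    one row per dataset, keyed by dataset_id, with its data ID
#   tags:       which collection(s) each dataset belongs to
SCHEMA = """
CREATE TABLE collection (collection_key INTEGER PRIMARY KEY, name TEXT, rank INTEGER);
CREATE TABLE dataset (dataset_id TEXT PRIMARY KEY, data_id TEXT);
CREATE TABLE tags (dataset_id TEXT, collection_key INTEGER);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # A chain of two collections: "run_a" is searched first, then "tagged_b".
    conn.executemany(
        "INSERT INTO collection VALUES (?, ?, ?)",
        [(1, "run_a", 0), (2, "tagged_b", 1)],
    )
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?)",
        [("d1", "visit=1"), ("d2", "visit=2"), ("d3", "visit=1")],
    )
    # d1 is in BOTH collections of the chain; d3 shares d1's data ID.
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [("d1", 1), ("d1", 2), ("d2", 2), ("d3", 2)],
    )
    return conn


def non_find_first_with_collection_key(conn: sqlite3.Connection) -> list[str]:
    # Buggy shape: the collection key is part of each row, so DISTINCT keeps
    # one row per (dataset, collection) pair and d1 comes back twice.
    rows = conn.execute(
        "SELECT DISTINCT dataset_id, collection_key FROM tags "
        "ORDER BY dataset_id, collection_key"
    ).fetchall()
    return [dataset_id for dataset_id, _ in rows]


def non_find_first_by_dataset_id(conn: sqlite3.Connection) -> list[str]:
    # Fixed shape: the collection key stays out of the rows and dataset_id
    # is treated as the unique key, so each dataset appears exactly once.
    rows = conn.execute(
        "SELECT DISTINCT dataset_id FROM tags ORDER BY dataset_id"
    ).fetchall()
    return [dataset_id for (dataset_id,) in rows]


def find_first(conn: sqlite3.Connection) -> list[str]:
    # Find-first: a window function ranks the candidates for each data ID by
    # collection search order and keeps only the first, which also drops
    # repeated copies of the same dataset found in later collections.
    rows = conn.execute(
        """
        SELECT dataset_id FROM (
            SELECT t.dataset_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY d.data_id ORDER BY c.rank
                   ) AS rownum
            FROM tags AS t
            JOIN dataset AS d ON d.dataset_id = t.dataset_id
            JOIN collection AS c ON c.collection_key = t.collection_key
        )
        WHERE rownum = 1
        ORDER BY dataset_id
        """
    ).fetchall()
    return [dataset_id for (dataset_id,) in rows]


if __name__ == "__main__":
    conn = make_db()
    print(non_find_first_with_collection_key(conn))  # ['d1', 'd1', 'd2', 'd3']
    print(non_find_first_by_dataset_id(conn))  # ['d1', 'd2', 'd3']
    print(find_first(conn))  # ['d1', 'd2']
```

The find-first query never shows the duplicate because the window function already collapses each data ID to a single row; only the non-find-first path relied on `DISTINCT`, which is why including the collection key there let the same dataset through once per collection.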
f98563f to bbfa0cf
Fix an issue where duplicate results could appear in a non-find-first dataset search when the same dataset appeared in multiple collections of a chain.
This was occurring because we were forcing the addition of the collection key field to make the rows distinct. But a non-find-first search has no window function to de-duplicate the rows by dataset ID. So we need to:
Checklist
- doc/changes
- configs/old_dimensions