Skip to content

Commit

Permalink
ref(grouping): Only calculate secondary hash when needed (#65275)
Browse files Browse the repository at this point in the history
After a _great_ deal of refactoring and breaking apart the logic of how we assign events to groups, this PR finally makes the desired change: to stop calculating secondary hashes when we already have a matching primary hash. 

Metrics to measure the effect of this change will be added in upcoming PRs, but one sign of the success of this change is the adjustment necessary in the tests. Whereas before, being in a transition period always meant doing two calculations, now when the new logic is enabled, being in a transition is no different than not being in a transition  - at least in the case where the current hash has already been calculated. And our research has shown that this is case the vast majority of the time, so hopefully this should make a big impact when we change configs and make it much less expensive to do so.
  • Loading branch information
lobsterkatie authored Feb 22, 2024
1 parent d268f2e commit bf85769
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 10 deletions.
26 changes: 17 additions & 9 deletions src/sentry/event_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -1642,24 +1642,32 @@ def _save_aggregate_new(
metric_tags: MutableTags,
) -> GroupInfo | None:
project = event.project
secondary = GroupHashInfo(None, None, [], None)

group_processing_kwargs = _get_group_processing_kwargs(job)

# Try looking for an existing group using the current grouping config
primary = create_and_seek_grouphashes(job, run_primary_grouping, metric_tags)
secondary = create_and_seek_grouphashes(job, maybe_run_secondary_grouping, metric_tags)

all_grouphashes = primary.grouphashes + secondary.grouphashes

# If we've found one, great. No need to do any more calculations
if primary.existing_grouphash:
group_info = handle_existing_grouphash(
job, primary.existing_grouphash, all_grouphashes, group_processing_kwargs
)
elif secondary.existing_grouphash:
group_info = handle_existing_grouphash(
job, secondary.existing_grouphash, all_grouphashes, group_processing_kwargs
job, primary.existing_grouphash, primary.grouphashes, group_processing_kwargs
)
# If we haven't, try again using the secondary config
else:
group_info = create_group_with_grouphashes(job, all_grouphashes, group_processing_kwargs)
secondary = create_and_seek_grouphashes(job, maybe_run_secondary_grouping, metric_tags)
all_grouphashes = primary.grouphashes + secondary.grouphashes

# Now we know for sure whether or not a group already exists, so handle both cases
if secondary.existing_grouphash:
group_info = handle_existing_grouphash(
job, secondary.existing_grouphash, all_grouphashes, group_processing_kwargs
)
else:
group_info = create_group_with_grouphashes(
job, all_grouphashes, group_processing_kwargs
)

# From here on out, we're just doing housekeeping

Expand Down
5 changes: 4 additions & 1 deletion tests/sentry/event_manager/grouping/test_assign_to_group.py
Original file line number Diff line number Diff line change
Expand Up @@ -423,7 +423,7 @@ def test_existing_group_new_hash_exists(
new_logic_enabled=new_logic_enabled,
)

if in_transition:
if in_transition and not new_logic_enabled:
assert results == {
"primary_hash_calculated": True,
"secondary_hash_calculated": True,
Expand All @@ -437,6 +437,9 @@ def test_existing_group_new_hash_exists(
"primary_grouphash_exists_now": True,
"secondary_grouphash_exists_now": True,
}
# Equivalent to `elif (in_transition and new_logic_enabled) or not in_transition`. In other
# words, with the new logic, if the new hash exists, it doesn't matter whether we're in
# transition or not - no extra calculations are performed.
else:
assert results == {
"primary_hash_calculated": True,
Expand Down

0 comments on commit bf85769

Please sign in to comment.