Skip to content

Commit

Permalink
ref(grouping): Fix and refactor save_aggregate test (#78528)
Browse files Browse the repository at this point in the history
In an upcoming PR, `_save_aggregate_new` is going to get absorbed into its caller, `assign_event_to_group`. While working on that PR, one of the things I did was to rename (and move) the `test_save_aggregate.py` test module (mostly because `save_aggregate_new` is going away, but also to better reflect what it's actually testing, which is the locking behavior around new group creation).

It seemed a fairly benign change, but alas, suddenly a whole bunch of totally unrelated tests starting failing in CI (but only in CI - not locally). It turns out the problem was that the move/rename caused the `save_aggregate` tests to run in a different shard, which revealed the fact that the tests aren't actually thread-safe. (Indeed, they never have been, but until my change they hadn't been running alongside any tests which were sensitive to that.) More specifically, the problem is that the mocking done in the tests is done in each thread individually, rather than before the threads are split off. As far as I understand, this allows the mocking to interact with threads running other tests, thereby breaking them.

The fix, therefore, was to move the mocking out to the main level of the test function. As a bonus, this allowed for some further simplification: Because mocking isn't thread-safe, the code run in the threads needed a `try-finally`, because sometimes the mocking just didn't happen, leading to errors. This also meant that we had to manually close the transaction, because said errors would prevent it from closing automatically. With the threading fix, we now don't need either of those things.

So this PR makes those changes, and also pulls in a few refactors which had originally been in that PR. Mostly they're cosmetic - moving things around, clarifying some comments, etc. The only substantive change pulled from that PR was switching from testing whether or not things are working (`is_race_free`) to whether or not they aren't (`lock_disabled`).
  • Loading branch information
lobsterkatie authored Oct 2, 2024
1 parent dc3fc08 commit 1141329
Show file tree
Hide file tree
Showing 2 changed files with 98 additions and 103 deletions.
98 changes: 98 additions & 0 deletions tests/sentry/event_manager/grouping/test_group_creation_lock.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
import contextlib
import time
from threading import Thread
from unittest.mock import patch

import pytest

from sentry.event_manager import GroupInfo, _save_aggregate_new
from sentry.eventstore.models import Event
from sentry.testutils.pytest.fixtures import django_db_all

CONCURRENCY = 2


class FakeTransactionModule:
@staticmethod
@contextlib.contextmanager
def atomic(*args, **kwds):
yield


def save_event(project_id: int, return_values: list[GroupInfo]) -> None:
event = Event(
project_id,
"11212012123120120415201309082013",
data={"timestamp": time.time()},
)

group_info = _save_aggregate_new(
event=event,
job={"event_metadata": {}, "release": "dogpark", "event": event, "data": {}},
metric_tags={},
)

assert group_info is not None
return_values.append(group_info)


@django_db_all(transaction=True)
@pytest.mark.parametrize(
"lock_disabled",
[
# Group creation with transaction isolation (which is what powers the lock) disabled, to
# show that without it, multiple groups are created when there's a race condition while
# ingesting events with the same data. This variant exists so that we can ensure the test
# would detect a malfunctioning lock in principle, and does not just always pass because of
# low parallelism. In a sense this variant tests the efficacy of this test, not actual
# business logic.
#
# If this variant fails, CONCURRENCY needs to be increased or e.g. thread barriers need to
# be used to ensure data races. This does not seem to be necessary so far.
True,
# Regular group creation, in which the lock should be working
False,
],
ids=(" lock_disabled: True ", " lock_disabled: False "),
)
def test_group_creation_race(monkeypatch, default_project, lock_disabled):
if lock_disabled:
# Disable transaction isolation just within event manager, but not in
# GroupHash.objects.create_or_update
monkeypatch.setattr("sentry.event_manager.transaction", FakeTransactionModule)

# `select_for_update` cannot be used outside of transactions
monkeypatch.setattr("django.db.models.QuerySet.select_for_update", lambda self: self)

with (
patch(
"sentry.grouping.ingest.hashing._calculate_event_grouping",
return_value=["pound sign", "octothorpe"],
),
patch(
"sentry.event_manager._get_group_processing_kwargs",
return_value={"level": 10, "culprit": "", "data": {}},
),
patch("sentry.event_manager._materialize_metadata_many"),
):
return_values: list[GroupInfo] = []
threads = []

# Save the same event data in multiple threads. If the lock is working, only one new group
# should be created
for _ in range(CONCURRENCY):
thread = Thread(target=save_event, args=[default_project.id, return_values])
thread.start()
threads.append(thread)

for thread in threads:
thread.join()

if not lock_disabled:
# assert only one new group was created
assert len({group_info.group.id for group_info in return_values}) == 1
assert sum(group_info.is_new for group_info in return_values) == 1
else:
# assert multiple new groups were created
assert 1 < len({group_info.group.id for group_info in return_values}) <= CONCURRENCY
assert 1 < sum(group_info.is_new for group_info in return_values) <= CONCURRENCY
103 changes: 0 additions & 103 deletions tests/sentry/event_manager/test_save_aggregate.py

This file was deleted.

0 comments on commit 1141329

Please sign in to comment.