
feat(sentry apps): Add context manager for sentry apps and impl for event webhooks #86136

Open · wants to merge 16 commits into master
Conversation

@Christinarlong Christinarlong commented Feb 28, 2025

Per https://www.notion.so/sentry/Sentry-App-SLOs-19f8b10e4b5d805dacb9f80beccd1c65?pvs=4#1a68b10e4b5d807a80fbde13daf0ff7b

We're separating out the preparation and sending of webhooks.

Preparation

  • process_resource_change_bound

Sending

  • send_webhooks
  • send_resource_change_webhook
  • send_and_save_webhook_request

With the current implementation we will receive the following metrics:

  • sentry_app.prepare_webhook.{outcome} w/ event_type: "process_resource_change.{sender}_{action}"

    • The outcome of this metric is independent of the sending tasks
    • sender would be Error, Group
    • action would be "created"
  • sentry_app.send_webhook.{outcome} w/ event_type: issue.created or error.created - potentially recorded 2 times (is this bad?)

    • We declare the context manager in send_resource_change_webhook and send_and_save_webhook_request (see the sketch just below this list).
    • Having the context manager declared twice could lead to some ambiguity about where something failed (?) But I don't think it's that major, since the logic in send_resource_change_webhook & send_webhooks is pretty minimal (also we have the logs to pinpoint where it failed).
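
Roughly, the placement looks like this. This is an illustrative sketch only: the .capture() call, the SEND_WEBHOOK member name, and the task signatures are assumptions inferred from the metric names above; only the constructor call and the lifecycle record_* methods appear verbatim in the diff.

# Sketch of where the two lifecycle scopes sit. Task bodies are elided;
# only the metric boundaries are shown.

def process_resource_change_bound(action, sender, instance_id, **kwargs):
    # Preparation: emits sentry_app.prepare_webhook.{outcome}
    with SentryAppInteractionEvent(
        operation_type=SentryAppInteractionType.PREPARE_WEBHOOK,
        event_type=f"process_resource_change.{sender}_{action}",
    ).capture() as lifecycle:  # .capture() is an assumption, not shown in the diff
        ...  # build the payload, then hand off to the sending tasks


def send_resource_change_webhook(installation_id, event, data, **kwargs):
    # Sending: emits sentry_app.send_webhook.{outcome}; the same scope is
    # declared again inside send_and_save_webhook_request, hence the
    # "potentially recorded 2 times" note above.
    with SentryAppInteractionEvent(
        operation_type=SentryAppInteractionType.SEND_WEBHOOK,  # assumed member name
        event_type=event,  # e.g. "issue.created" or "error.created"
    ).capture() as lifecycle:
        ...  # look up the installation and call send_webhooks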

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Feb 28, 2025
logger.info("process_resource_change.event_missing_event", extra=extra)
with SentryAppInteractionEvent(
    operation_type=SentryAppInteractionType.PREPARE_WEBHOOK,
    event_type=f"process_resource_change.{sender}_{action}",
Contributor Author:

We don't have an 'event' string at this point, so we're making something up

    lifecycle.record_failure(e)
    return None
except (ApiHostError, ApiTimeoutError, RequestException, ClientError) as e:
    lifecycle.record_halt(e)
Contributor Author:

ApiHostError & ApiTimeoutError are raised from send_and_save_webhook_request when the response is 503 or 504.

RequestExceptions occur when we get a Timeout or ConnectionError; I also wouldn't consider these our server's fault.

ClientErrors are anything <500 and also wouldn't be our fault, because it's the third party's responsibility to properly consume our requests.

We re-raise since the TASK_OPTIONS already have the retry logic specified (i.e. which errors to retry or ignore).
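
A minimal sketch of that classification (the exception names, record_halt/record_failure, and the re-raise come from the diff and the comment above; the surrounding function, the stub definitions, and the except ordering are illustrative, not the PR's actual code):

from requests import RequestException

# Stand-ins so the sketch is self-contained; in the PR these exception
# classes and send_and_save_webhook_request come from sentry's own modules.
class ApiHostError(Exception): ...
class ApiTimeoutError(Exception): ...
class ClientError(Exception): ...

def send_and_save_webhook_request(sentry_app, event, url):  # stub for the real sender
    ...

def deliver(lifecycle, sentry_app, event, url):
    # `lifecycle` is the object yielded by the SentryAppInteractionEvent scope.
    try:
        return send_and_save_webhook_request(sentry_app, event, url)
    except (ApiHostError, ApiTimeoutError, RequestException, ClientError) as e:
        # 503/504 from the receiver, timeouts/connection errors, or any <500
        # client error: the third party (or the network) is at fault, so we
        # record a halt and re-raise; TASK_OPTIONS decides which errors retry.
        lifecycle.record_halt(e)
        raise
    except Exception as e:
        # Anything else is on us: record a failure instead.
        lifecycle.record_failure(e)
        return None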

Contributor Author:

JFYI, we will not encounter this block for NOT published apps (e.g. internal & unpublished), as we only propagate errors up from send_and_save_webhook_request for published apps.


codecov bot commented Feb 28, 2025

Codecov Report

Attention: Patch coverage is 94.82759% with 12 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines                       Patch %   Lines
src/sentry/sentry_apps/tasks/sentry_apps.py    89.65%    6 Missing ⚠️
src/sentry/utils/sentry_apps/webhooks.py       82.85%    6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #86136       +/-   ##
===========================================
+ Coverage   45.90%   87.92%   +42.01%     
===========================================
  Files        9709     9721       +12     
  Lines      551059   551742      +683     
  Branches    21527    21401      -126     
===========================================
+ Hits       252943   485095   +232152     
+ Misses     297730    66263   -231467     
+ Partials      386      384        -2     

Comment on lines +103 to +104
MockResponseInstance = MockResponse({}, b"{}", "", True, 200, raiseStatusFalse, None)
MockResponse404 = MockResponse({}, b'{"bruh": "bruhhhhhhh"}', "", False, 404, raiseException, None)
Contributor Author:

These need to be bytes for json.loads()
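
For reference, json.loads parses bytes directly (Python 3.6+), so the byte-string bodies round-trip as expected:

import json

# json.loads accepts bytes as well as str, so the mock response bodies
# above parse without an explicit decode step.
assert json.loads(b"{}") == {}
assert json.loads(b'{"bruh": "bruhhhhhhh"}') == {"bruh": "bruhhhhhhh"}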

@Christinarlong Christinarlong marked this pull request as ready for review March 4, 2025 18:43
@Christinarlong Christinarlong requested review from a team as code owners March 4, 2025 18:43
lifecycle.record_failure(
    failure_reason="process_resource_change.event_missing_event", extra=extra
)
return
Member:

Should these be raised as exceptions and caught inside process_resource_change_bound? Then they would automatically be recorded as failures, and you can pick which exceptions to retry the task with.

Contributor Author:

🤷‍♀️ ehhhh, they could? I think the benefit of the current code is that it's clear if and why we're recording X outcome or retrying the task. Having process_resource_change_bound do the exception handling would make my life easier since I could just raise an error, but I think this way's clearer, so ye

Member:

make your life easier c:

Member:

tl;dr I think it's better to manually record a halt than a failure

@@ -110,3 +112,12 @@ def assert_middleware_metrics(middleware_calls):
    assert end1.args[0] == EventLifecycleOutcome.SUCCESS
    assert start2.args[0] == EventLifecycleOutcome.STARTED
    assert end2.args[0] == EventLifecycleOutcome.SUCCESS


def assert_count_of_metric(mock_record, outcome, outcome_count):
Member:

For asserting failure, it might help to be more specific by checking the message that the lifecycle terminates with, like here:

def assert_failure_metric(mock_record, error_msg):
    (event_failures,) = (
        call for call in mock_record.mock_calls if call.args[0] == EventLifecycleOutcome.FAILURE
    )
    if isinstance(error_msg, Exception):
        assert isinstance(event_failures.args[1], type(error_msg))
    else:
        assert event_failures.args[1] == error_msg

Currently, when you assert a single failure metric, you're only asserting on the count and not on the exact line of code that failed, which you could pin down by checking which exception or message it failed with.
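
For illustration, using both helpers together in a test might look like this (a sketch only; how mock_record is patched in and the exact failure-reason string follow the rest of this PR's tests):

# Hypothetical assertion block: exactly one FAILURE outcome was recorded,
# and it carried the expected failure reason.
assert_count_of_metric(
    mock_record, outcome=EventLifecycleOutcome.FAILURE, outcome_count=1
)
assert_failure_metric(
    mock_record, "process_resource_change.event_missing_event"
)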

Contributor Author:

We can do both; currently I do assert_failure_metric alongside assert_count_of_metric for most of the tests, but yeah, I can make that consistent and do both for all.
