Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect timeout metrics for call attempts #3240

Open
2 tasks
ebautistabar opened this issue Jun 9, 2022 · 5 comments
Open
2 tasks

Collect timeout metrics for call attempts #3240

ebautistabar opened this issue Jun 9, 2022 · 5 comments
Labels
feature-request A feature should be added or improved. p3 This is a minor priority issue sdk-metrics

Comments

@ebautistabar
Copy link

Describe the feature

I propose that the SDK collects timeout metrics for call attempts.

Current docs for available metrics, as reference:
https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/metrics-list.html

Use Case

The SDK allows users to tweak timeout values. Users can utilize the existing latency metrics to inform their decision about appropriate timeout values in the SDK, but it would be very helpful to have dedicated timeout metrics to see the direct effect of any changes.

Proposed Solution

No response

Other Information

I tried using an ExecutionInterceptor, which offers hooks into different points in the request/response lifecycle, with the intention of creating the timeout metric myself.

It seemed promising but unfortunately it couldn't make it work:

  • response hooks don't run because the timeout is supposed to trigger before the response is available
  • exception hooks only run when the request as a whole fails, giving you the final failure that is thrown, but doesn't run for each underlying call attempt, and also doesn't keep track of all failures that caused each retry

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS Java SDK version used

v2

JDK version used

Corretto 11

Operating System and version

Amazon Linux 2

@ebautistabar ebautistabar added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 9, 2022
@debora-ito
Copy link
Member

@ebautistabar thank you for reaching out, we've added this to our backlog.

@debora-ito debora-ito added sdk-metrics and removed needs-triage This issue or PR still needs to be triaged. labels Jun 13, 2022
@debora-ito
Copy link
Member

debora-ito commented Jun 15, 2022

@ebautistabar the team have some follow-up questions:

  • the metric you want to publish is the timeout value? It was not very clear which metric you're talking about.
  • does it work if you use ExecutionInterceptor to publish the value with every request, instead of only publishing when the timeout is triggered or when an exception is thrown?

@debora-ito debora-ito added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jun 15, 2022
@ebautistabar
Copy link
Author

ebautistabar commented Jun 15, 2022

@debora-ito

Apologies, I didn't explain myself clearly. I do not want the timeout value.

I will explain by following the same format as the tables in https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/metrics-list.html

Metrics collected for each request attempt

  • Metric name: Timeout
  • Description: True if this API call attempt timed out; false if not
  • Type: Boolean
  • Collected by default?: Yes

Now let's go over an example.

Let's say I make a call to DynamoDB which results in three call attempts:

  • The first call attempt fails with a 500
  • The second call attempt fails with a timeout
  • The third call attempt succeeds

The new Timeout metric would have the following values for each call attempt:

  • In the first call attempt, Timeout=false
  • In the second call attempt, Timeout=true
  • In the third call attempt, Timeout=false

This new metric will allow customers to easily graph the amount of timeouts in their dashboards.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Jun 15, 2022
@debora-ito
Copy link
Member

Got it now, thanks for clarifying!

In that case we don't have plans to add that specific metric to the default set of SDK client-side metrics. We should make it possible for you to publish the metric for yourself using ExecutionInterceptors, though, as it looks like it's not possible right now. We have a feature request in the backlog that might fulfill your use case - #929.

@ebautistabar
Copy link
Author

ebautistabar commented Jun 18, 2022

@debora-ito

Thanks for your response.

Just to check if I understand correctly: a new method will be added to the ExecutionInterceptor interface with a signature similar to this

void onExecutionAttemptFailure(Context.FailedExecutionAttempt context,
                               ExecutionAttributes executionAttributes)
  • The method will be invoked for each request attempt failure
  • context will contain the exception for the failure of the request attempt
  • executionAttributes will contain some kind of Metrics object into which clients will be able to add their own metrics

Is my interpretation correct? If so, I'm aligned with the approach, and I'm fine with tracking progress on #929

I think that adding a new hook for request attempt failures is a good idea regardless, brings more value and has broader appeal than just adding a new metric.

Follow up questions:

  • do you document somewhere what types of exception are thrown when there are timeouts?
  • ExecutionInterceptor access to request attempt exceptions #929 was opened in 2018 and there doesn't seem to be any progress; is there any ETA for when the task will be picked up? Anything I can do to help prioritize it or get it done faster?

Thanks in advance.

@yasminetalby yasminetalby added the p3 This is a minor priority issue label Nov 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p3 This is a minor priority issue sdk-metrics
Projects
None yet
Development

No branches or pull requests

3 participants