Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log job retries as warnings instead of errors #743

Merged
merged 3 commits into from
Feb 8, 2025

Conversation

begelundmuller
Copy link
Contributor

First of all, thank you for building River, it's a wonderful library!

This PR changes the error reporting to log retryable errors at warn level, reserving the error level for jobs that fail after reaching MaxAttempts.

This is desirable for my use case because I consider error-level logs as requiring human inspection/intervention, which is only needed for jobs that fail even after retrying. However I realize this may not be a practice everyone shares, so please let me know if you would prefer another approach here.

@bgentry
Copy link
Contributor

bgentry commented Feb 4, 2025

I think I'm in favor of this change. These are cases where the system is operating as expected and nothing is wrong per se, it's just that the job itself had an error (which may be totally normal depending on the job). Bundling that together with actual River errors results in a lot of noise and makes it harder to distinguish actual issues.

Thoughts, @brandur ?

@brandur
Copy link
Contributor

brandur commented Feb 5, 2025

Yeah, works for me. It does seem to me to be a little on the odd side that these errors are warnings. Not to say they should be info-level per se either, but you could make a similar argument about warnings as you could about errors — i.e. that warnings should be somewhat actionable as well because they represent something that shouldn't be happening.

There's a test failure in one of the example tests. Can you take a look at that? Also, can you add a changelog? Thx.

@bgentry
Copy link
Contributor

bgentry commented Feb 6, 2025

Yeah, works for me. It does seem to me to be a little on the odd side that these errors are warnings. Not to say they should be info-level per se either, but you could make a similar argument about warnings as you could about errors — i.e. that warnings should be somewhat actionable as well because they represent something that shouldn't be happening.

Yeah, this is a tough balance. With job errors in particular, users have a great option to error handlers to do whatever logging they want, whereas aside from adjusting the log level they can't control what River will output. For that reason I think we can safely exclude regular errors, potentially even discarded jobs if we want to, and keep the noise down for "expected" errors from the library's perspective.

@begelundmuller
Copy link
Contributor Author

There's a test failure in one of the example tests. Can you take a look at that? Also, can you add a changelog? Thx.

Done!

@brandur
Copy link
Contributor

brandur commented Feb 8, 2025

Yeah, this is a tough balance. With job errors in particular, users have a great option to error handlers to do whatever logging they want, whereas aside from adjusting the log level they can't control what River will output. For that reason I think we can safely exclude regular errors, potentially even discarded jobs if we want to, and keep the noise down for "expected" errors from the library's perspective.

@bgentry Yep agreed.

Done!

@begelundmuller Excellent. Thanks!

@brandur brandur merged commit b2737cf into riverqueue:master Feb 8, 2025
10 checks passed
@bgentry
Copy link
Contributor

bgentry commented Feb 9, 2025

@begelundmuller thank you for this! Would you also be able to add yourself to the CLA when you have a moment? https://github.com/riverqueue/rivercla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants