[filebeat][streaming] - Added more retry codes to websocket retry logic #42218

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

ShourieG
Contributor

@ShourieG ShourieG commented Jan 6, 2025

Type of change

  • Bug

Proposed commit message

Some valid WebSocket close codes were missing from the initial retry implementation, so retries did not occur in some scenarios. This change adds more codes to retry on. The table below shows which close codes are currently retryable and why.

Table Of Reasons

+---------------------------------------+---------------------------------------------------------+---------+
| Close Code                            | Description                                             | Retry?  |
+---------------------------------------+---------------------------------------------------------+---------+
| 1000 - CloseNormalClosure             | Connection closed normally (can be a false positive).   | Yes     |
| 1001 - CloseGoingAway                 | Endpoint is going away (e.g., server shutdown).         | Yes     |
| 1002 - CloseProtocolError             | Protocol error (e.g., malformed frames).                | No      |
| 1003 - CloseUnsupportedData           | Unsupported data type received.                         | No      |
| 1005 - CloseNoStatusReceived          | No status code provided (abnormal closure).             | Yes     |
| 1006 - CloseAbnormalClosure           | Abnormal connection closure (unexpected).               | Yes     |
| 1007 - CloseInvalidFramePayloadData   | Invalid payload data (e.g., bad UTF-8).                 | No      |
| 1008 - ClosePolicyViolation           | Policy violation (e.g., authentication issues).         | No      |
| 1009 - CloseMessageTooBig             | Received message exceeds allowed size.                  | Yes     |
| 1010 - CloseMandatoryExtension        | Required extensions not supported.                      | No      |
| 1011 - CloseInternalServerErr         | Server encountered an unexpected error.                 | Yes     |
| 1012 - CloseServiceRestart            | Server is restarting.                                   | Yes     |
| 1013 - CloseTryAgainLater             | Server is overloaded; try again later.                  | Yes     |
| 1015 - CloseTLSHandshake              | TLS handshake failure (e.g., certificate issues).       | Yes     |
+---------------------------------------+---------------------------------------------------------+---------+
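
The table above could be expressed as a small lookup in Go. This is a minimal sketch, not the code merged in this PR: the constant values are the standard WebSocket close codes, declared locally under the names used in the table, while `retryableCloseCodes` and `isRetryableCloseCode` are hypothetical names chosen for illustration.

```go
package main

// Close codes from the table above, declared locally so the sketch has no
// external dependencies. The names mirror the table entries.
const (
	CloseNormalClosure           = 1000
	CloseGoingAway               = 1001
	CloseProtocolError           = 1002
	CloseUnsupportedData         = 1003
	CloseNoStatusReceived        = 1005
	CloseAbnormalClosure         = 1006
	CloseInvalidFramePayloadData = 1007
	ClosePolicyViolation         = 1008
	CloseMessageTooBig           = 1009
	CloseMandatoryExtension      = 1010
	CloseInternalServerErr       = 1011
	CloseServiceRestart          = 1012
	CloseTryAgainLater           = 1013
	CloseTLSHandshake            = 1015
)

// retryableCloseCodes (hypothetical name) holds every code marked "Yes" in
// the table. Codes that are absent are treated as non-retryable.
var retryableCloseCodes = map[int]struct{}{
	CloseNormalClosure:     {},
	CloseGoingAway:         {},
	CloseNoStatusReceived:  {},
	CloseAbnormalClosure:   {},
	CloseMessageTooBig:     {},
	CloseInternalServerErr: {},
	CloseServiceRestart:    {},
	CloseTryAgainLater:     {},
	CloseTLSHandshake:      {},
}

// isRetryableCloseCode (hypothetical name) reports whether a connection that
// closed with the given code should be retried according to the table.
func isRetryableCloseCode(code int) bool {
	_, ok := retryableCloseCodes[code]
	return ok
}
```

With this sketch, `isRetryableCloseCode(CloseMessageTooBig)` is true and `isRetryableCloseCode(CloseProtocolError)` is false. Any code not listed in the table is also treated as non-retryable, which is an assumption of the sketch rather than something the PR states.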

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@ShourieG ShourieG requested a review from a team as a code owner January 6, 2025 11:52
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 6, 2025
Contributor

mergify bot commented Jan 6, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ShourieG? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Jan 6, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 6, 2025
@ShourieG ShourieG added backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify Team:Security-Service Integrations Security Service Integrations Team labels Jan 6, 2025
@elasticmachine
Collaborator

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 6, 2025
@ShourieG ShourieG added needs_team Indicates that the issue/PR needs a Team:* label bugfix labels Jan 6, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 6, 2025
@botelastic

botelastic bot commented Jan 6, 2025

This pull request doesn't have a Team:<team> label.

Contributor

@kcreddy kcreddy left a comment

LGTM 👍🏼 Just need a clarification.

In the CloseMessageTooBig case, is the message responsible for the error discarded, or does the retry attempt to read the same message with an increased size limit?

@ShourieG
Contributor Author

ShourieG commented Jan 7, 2025

@kcreddy, the gorilla websocket library does not impose any limit on frame size; such limits are imposed only by the WebSocket protocol, so they can't be increased. Because the messages are streamed, they are sent in individual frames, and there is no hard limit on message size.
So this error generally indicates a protocol violation by the server or some other unexpected condition, and it could be resolved via retries.
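
As an illustration of what retrying on such a close might look like, the sketch below reconnects while the close code is retryable and gives up otherwise. It is an outline under stated assumptions, not the input's actual logic: `closeError`, `connectFunc`, `runWithRetry`, the `retryable` callback, and the attempt limit are all hypothetical, and no backoff or waiting between attempts is shown.

```go
package main

import (
	"errors"
	"fmt"
)

// closeError is a hypothetical stand-in for a WebSocket close error that
// carries the close code sent by the peer.
type closeError struct {
	Code int
}

func (e *closeError) Error() string {
	return fmt.Sprintf("websocket closed with code %d", e.Code)
}

// connectFunc is a hypothetical function that dials the server and reads
// messages until the connection ends. It returns nil on a clean stop.
type connectFunc func() error

// runWithRetry calls connect until it returns nil, fails with an error that
// is not a retryable close, or maxAttempts is reached (maxAttempts is assumed
// to be at least 1). It returns the number of attempts made and the last error.
func runWithRetry(connect connectFunc, retryable func(code int) bool, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = connect()
		if err == nil {
			return attempt, nil
		}
		// Only close errors with a retryable code trigger another attempt.
		var ce *closeError
		if !errors.As(err, &ce) || !retryable(ce.Code) {
			return attempt, err
		}
	}
	return maxAttempts, err
}
```

In this sketch a close with code 1009 (CloseMessageTooBig) leads to a fresh connection attempt rather than a re-read of the same message, which matches the reply above only in spirit; whether the offending message is seen again depends on what the server sends after reconnecting.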

@ShourieG ShourieG merged commit ac57409 into elastic:main Jan 7, 2025
22 checks passed
mergify bot pushed a commit that referenced this pull request Jan 7, 2025
mergify bot pushed a commit that referenced this pull request Jan 7, 2025
mergify bot pushed a commit that referenced this pull request Jan 7, 2025
@ShourieG ShourieG deleted the websocket/more_retry_scenarios branch January 7, 2025 09:54
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify bugfix input:streaming Team:Security-Service Integrations Security Service Integrations Team
3 participants