-
Notifications
You must be signed in to change notification settings - Fork 401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/tcp: Only disable ep if the failure can not be retried #10779
Conversation
@shefty are you ok with the patch? We saw some failures on our site "ofi_bsock_async_done() Error reading MSG_ERRQUEUE (Resource temporarily unavailable)" If this means EAGAIN, then ep should not be disabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the check into ofi_bsock_async_done() after recvmsg(). This should have been trapped there and the error discarded (no warning message).
ah, sure. thanks! @shefty |
src/common.c
Outdated
@@ -1421,6 +1421,9 @@ int ofi_bsock_async_done(const struct fi_provider *prov, | |||
msg.msg_controllen = sizeof(ctrl); | |||
ret = recvmsg(bsock->sock, &msg, MSG_ERRQUEUE); | |||
if (ret < 0) { | |||
if (OFI_SOCK_TRY_SND_RCV_AGAIN(ret)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a direct call to recvmsg, so you want to use errno
instead of ret. This ends up being linux specific code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes. Thanks!
Final changes look okay, but please squash the commits |
Only disable ep if the failure can not be retried. Signed-off-by: Di Wang <ddiwang@google.com>
sure. thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good -- thanks
Only disable ep if the failure can not be retried.