-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MM-62075: Ignore error from aws s3 rm command #864
Conversation
This command is making the whole {comparison, deployment} destroy to fail when there is an S3 bucket deployed. That error is valid in the local context of the aws s3 rm command, but not in the global sense: since we run the aws s3 rm command concurrently with terraform destroy, aws s3 rm fails when the bucket is destroyed before the command has finished. This is a spurious failure: all resources, including the S3 bucket, have been destroyed (that's exactly why the aws s3 rm command fails, actually), so we can simply skip it. In the case this command would error in any other way that we are not accounting for as of today, the {comparison, deployment} destroy command would still fail, since it would notice that the S3 bucket has not been destroyed.
Thanks for taking a look, @toninis! Let me answer your comment here:
The problem here was that the whole thing was timing out because it was incredibly slow. I don't like the solution of using a concurrent That's why I'm proposing getting this PR merged. If this one gets accepted, I'll open a ticket to test the |
// since it introduces spurious failures when the bucket is | ||
// destroyed before the command finishes. | ||
// See https://mattermost.atlassian.net/browse/MM-62075 | ||
_ = t.runAWSCommand(emptyBucketCtx, emptyS3BucketArgs, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible at all to know what error is returned when the bucket is destroyed before the command finishes, and ignore that exact error rather than all errors? If that is too complicated, this is fine as well. Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could manually grep the returned error, looking for something like failed to empty
. I don't know if that same string would show up in other errors, though. That would need some more testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually ran into this today. It seems like the error is NoSuchBucket:
info [2024-12-06 14:26:13.837 +05:30] delete failed: s3://agnivalocallt.s3bucket/users/baw7cx384i8ppdka1h1zj3ntdr/profile.png An error occurred (NoSuchBucket) when calling the DeleteObject operation: The specified bucket does not exist caller="terraform/engine.go:71"
info [2024-12-06 14:26:13.851 +05:30] fatal error: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist caller="terraform/engine.go:71"
Oh, it turns out we already have
aws s3 rm fix was added. cc/@toninis
|
That answers my question . Because the rm command should not fail with file does not exist unless another delete operation is in place. Perfect 🆗 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed summary!
Summary
When destroying a cluster that deploys an S3 bucket, we try to speed up its deletion by calling
aws s3 rm s3://my-bucket
concurrently while we runterraform destroy
. This was added to fix https://mattermost.atlassian.net/browse/MM-47263, since we were seeing too many timeouts due to the S3 bucket not being properly destroyed in time. Emptying the bucket while other resources are being destroyed has worked to fix the timeout issue, but it introduces a new error: if the bucket is destroyed by Terraform before theaws s3 rm
command finishes, this command, and thus the whole{comparison,deployment} destroy
command, fails.This is a spurious failure: all resources, including the S3 bucket, have been destroyed (that's exactly why the
aws s3 rm
command fails, actually), so we can simply ignore it.In the case this command would error in any other way that we are not accounting for as of today, the
{comparison,deployment} destroy
command would still fail, since it would notice that the S3 bucket has not been destroyed.Ticket Link
https://mattermost.atlassian.net/browse/MM-62075