fix(infraflow): report last error for wait functions #970

Merged (1 commit) on Feb 17, 2025


hown3d
Contributor

@hown3d hown3d commented Jan 31, 2025

How to categorize this PR?

/area quality
/kind enhancement
/platform openstack

What this PR does / why we need it:
Adds the last error seen by the Wait functions to the returned error when the context is done.

If a user adds servers to the network created by the extension and attaches a floating IP to one of them, OpenStack responds with a 409 error, which the Wait functions explicitly filter out. If a timeout occurs and the context is canceled, infraflow reports only "context canceled" in the infrastructure's status.lastError field. This makes it hard to understand why the task failed in the first place.

Example of the new error format:

status:
  lastError:
    description: "Error deleting Infrastructure: 1 error occurred:\n\t* failed to
      \"delete router interface\": context deadline exceeded, last error: Expected
      HTTP response code [200] when accessing [PUT https://XXXX:443/v2.0/routers/1b3a04d1-49b1-494f-b7e1-edf23f86ad8b/remove_router_interface],
      but got 409 instead\n{\"NeutronError\": {\"type\": \"RouterInterfaceInUseByFloatingIP\",
      \"message\": \"Router interface for subnet XXXX
      on router XXXX cannot be deleted, as it is required
      by one or more floating IPs.\", \"detail\": \"\"}}\n\n"

Special notes for your reviewer:
Replaces the time.Sleep calls with a time.Ticker and a select statement.

Release note:

infraflow: report last error on task timeouts

@hown3d hown3d requested review from a team as code owners January 31, 2025 11:19
@gardener-robot gardener-robot added needs/review Needs review area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/enhancement Enhancement, improvement, extension platform/openstack OpenStack platform/infrastructure labels Jan 31, 2025
@gardener-robot

@hown3d Thank you for your contribution.

@gardener-robot-ci-2
Contributor

Thank you @hown3d for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.

@gardener-robot gardener-robot added the size/s Size of pull request is small (see gardener-robot robot/bots/size.py) label Jan 31, 2025
kon-angelo
kon-angelo previously approved these changes Jan 31, 2025
Contributor

@kon-angelo kon-angelo left a comment


/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Jan 31, 2025
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 31, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 31, 2025
@hown3d hown3d force-pushed the last-error-context branch from 40318ac to 7e2bc83 Compare January 31, 2025 13:41
@gardener-robot gardener-robot added needs/review Needs review and removed needs/review Needs review labels Jan 31, 2025
@AndreasBurger AndreasBurger added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 31, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 31, 2025
@AndreasBurger
Member

The current verify error is caused by the commit on master that this PR was based on; it should be fine after a rebase.

While you're at it/in case you're up for it, the release note currently is not right (\ added for escaping):

\```other operator
bugfix operator
infraflow: report last error on task timeouts
\```

The inner bugfix operator should go where other operator currently is.

Signed-off-by: Lukas Hoehl <lukas.hoehl@stackit.cloud>
@hown3d hown3d force-pushed the last-error-context branch from 7e2bc83 to bc0fea0 Compare January 31, 2025 17:00
@AndreasBurger AndreasBurger added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
Comment on lines +269 to +274
if _, ok := err.(gophercloud.ErrDefault409); !ok {
return err
}
}
if _, ok := err.(gophercloud.ErrDefault409); !ok {
return err
if err == nil {
return nil
Contributor


Does this look better:

					if _, ok := err.(gophercloud.ErrDefault409); !ok {
						return err
					}
					continue
				}
				return nil

The style change isn't strictly in scope for this PR, but it reads better in my opinion.

Contributor

@kon-angelo kon-angelo left a comment


/lgtm

@kon-angelo kon-angelo merged commit 711f568 into gardener:master Feb 17, 2025
10 checks passed
@kon-angelo
Contributor

/test

@testmachinery

testmachinery bot commented Feb 17, 2025

Testrun: e2e-rms84
Workflow: e2e-rms84-wf
Phase: Succeeded

+---------------------+-----------------------------+-----------+----------+
|        NAME         |            STEP             |   PHASE   | DURATION |
+---------------------+-----------------------------+-----------+----------+
| infrastructure-test | infrastructure-test         | Succeeded | 11m39s   |
| infrastructure-test | infrastructure-test-flow    | Succeeded | 11m8s    |
| infrastructure-test | infrastructure-test-migrate | Succeeded | 11m20s   |
| infrastructure-test | infrastructure-test-recover | Succeeded | 13m20s   |
| bastion-test        | bastion-test                | Succeeded | 10m45s   |
+---------------------+-----------------------------+-----------+----------+

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Feb 17, 2025
Labels
area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) platform/openstack OpenStack platform/infrastructure reviewed/lgtm Has approval for merging size/s Size of pull request is small (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
6 participants