Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests][integration] fix monitoring test cases #5208

Merged
merged 12 commits into from
Jul 30, 2024

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Jul 27, 2024

  • Bug

What does this PR do?

  • With introduction of status reporter for metricbeat, integration test cases were failing
  • From diagnostics, it was observed that the status transitioned from HEALTHY to DEGRADED (momentarily) and again to HEALTHY.
  • Previous behaviour:
    • During the above transition, runner.agentFixture.ExecStatus(ctx) returned an error because of the momentarily DEGRADED state and it caused entire test to fail (even though it returned to a HEALTHY state, its just how go tests function).
  • This PR does the following:
    • Remove the check on error inside require.Eventually.
      • If all the components return to a healthy state, the test would pass eventually.
      • If it remains in a unhealthy state, the test would fail.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

closes #5209

@VihasMakwana VihasMakwana added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jul 27, 2024
@VihasMakwana VihasMakwana self-assigned this Jul 27, 2024
Copy link
Contributor

mergify bot commented Jul 27, 2024

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@VihasMakwana VihasMakwana marked this pull request as ready for review July 29, 2024 08:39
@VihasMakwana VihasMakwana requested a review from a team as a code owner July 29, 2024 08:39
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to change -

func (f *Fixture) ExecStatus(ctx context.Context, opts ...process.CmdOption) (AgentStatusOutput, error) {
instead of the actual test.

Reading the docstring on ExecStatus it says it can return an output even on exit of 1. Maybe we should only return an error when we are unable to read the output, otherwise as long as some output is returned it should not be an error.

That would probably fix more tests and not just these specific tests.

@cmacknz
Copy link
Member

cmacknz commented Jul 29, 2024

Agree with Blake, the ExecStatus should return nil if we were able to get the output.

It feels like it should probably return three values: the command output, a boolean for whether agent is healthy, and an error that is only not-nil if we couldn't get or parse the output of the command.

If we were to keep it the way it is, the actual bug is that there is a require.NoError inside of a require.Eventually loop to immediately fail the test. The require.Error in the case below should have been if err != nil { return false } to let the eventually retry.

require.NoError(runner.T(), err)

@VihasMakwana
Copy link
Contributor Author

@blakerouse @cmacknz Agree with you guys. Thanks for your inputs.

i've updated the ExecStatus and we return nil if we can parse the output.

@VihasMakwana VihasMakwana requested a review from blakerouse July 29, 2024 17:36
@VihasMakwana VihasMakwana requested a review from cmacknz July 29, 2024 20:25
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for changing this around.

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Jul 29, 2024

@blakerouse thanks for approving. So you know, I made a small change to combine uerr and err(just in case if it's not empty) a second after your approval 😅

if uerr := json.Unmarshal(out, &status); uerr != nil {
return AgentStatusOutput{},
fmt.Errorf("could not unmarshal agent status output: %w", errors.Join(uerr, err))
} else if status.IsZero() {

@pierrehilbert
Copy link
Contributor

Looks like we now have failing tests that may be related to your change.

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Jul 30, 2024

@blakerouse @cmacknz from the logs of failing tests, I can see that output is valid and not empty but status.IsZero() returns false because Info.ID field is not set somehow. How it looks in json:

{
    "info": {
        "id": "",
        "version": "8.16.0",
        "commit": "48bf9adae418cb043d20c18fe13cae1e6c5d05f9",
        "build_time": "2024-07-30 09:58:04 +0000 UTC",
        "snapshot": true,
        "pid": 19922,
        "unprivileged": true
    },
    "state": 2,
    "message": "Running",
    "components": [],
    "FleetState": 6,
    "FleetMessage": "Not enrolled into Fleet"
}
Screenshot 2024-07-30 at 4 36 45 PM

I think we should check also check for Version, Message, and ID.
So, something like:

func IsZero() bool {
     return status.Message == "" && status.Info.Version == "" && status.Info.ID == ""
}

Edit:

PTAL! I've updated the condtions.

remove logs as the test failure has been identified

This reverts commit dc392b3.
Copy link

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the fixes.

@@ -1243,7 +1237,7 @@ type AgentStatusOutput struct {
}

func (aso *AgentStatusOutput) IsZero() bool {
return aso.Info.ID == ""
return aso.Info.ID == "" && aso.Message == "" && aso.Info.Version == ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes more sense actually. As IsZero should be if nothing is outputted at all.

@VihasMakwana VihasMakwana merged commit 509b1cc into elastic:main Jul 30, 2024
13 checks passed
@VihasMakwana VihasMakwana added backport-8.15 Automated backport to the 8.15 branch with mergify and removed backport-skip labels Jul 30, 2024
mergify bot pushed a commit that referenced this pull request Jul 30, 2024
* chore: fix monitoring test cases

* fix: update long running tests

* fix: change ExecStatus and return nil

* fix: update docstring, non-empty

* fix: reword

* fix: nit

* fix: empty line

* fix: join errors.

* temproray logs to examine CI

* Revert "temproray logs to examine CI"

remove logs as the test failure has been identified

This reverts commit dc392b3.

* fix: update condition

(cherry picked from commit 509b1cc)
pierrehilbert pushed a commit that referenced this pull request Jul 31, 2024
* chore: fix monitoring test cases

* fix: update long running tests

* fix: change ExecStatus and return nil

* fix: update docstring, non-empty

* fix: reword

* fix: nit

* fix: empty line

* fix: join errors.

* temproray logs to examine CI

* Revert "temproray logs to examine CI"

remove logs as the test failure has been identified

This reverts commit dc392b3.

* fix: update condition

(cherry picked from commit 509b1cc)

Co-authored-by: VihasMakwana <121151420+VihasMakwana@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.15 Automated backport to the 8.15 branch with mergify skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Test] fix monitoring test cases
5 participants