Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TestLogIngestionFleetManaged, TestDebLogIngestFleetManaged #5375

Merged
merged 10 commits into from
Sep 2, 2024

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Aug 28, 2024

[Integration Tests] Generate namespace based on UUIDv4

What does this PR do?

TestLogIngestionFleetManaged was failing because the namespace generated by the integration tests framework was not unique among different tests and test runs, so sometimes collisions would occurs causing some tests to be flaky.

TestDebLogIngestFleetManaged was failing because it also has got Beats logging connection errors before receiving the configuration from Elastic-Agent, now this message is also in the allow list.

When testing .deb the AGENT_KEEP_INSTALLED environment variable is respected.

When an integration test fails, the work directory created by the framework is now kept and its path is printed.

Why is it important?

It fixes flakiness in our tests

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

## Disruptive User Impact

How to test this PR locally

Run an integration test: mage -v integration:single TestLogIngestionFleetManaged

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@belimawr belimawr added Team:Elastic-Agent Label for the Agent team skip-changelog labels Aug 28, 2024
@belimawr belimawr requested a review from a team as a code owner August 28, 2024 17:24
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

Copy link
Contributor

mergify bot commented Aug 28, 2024

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@belimawr belimawr changed the title [Integration Tests] Generate namespace based on UUIDv4 Fix TestLogIngestionFleetManaged, TestDebLogIngestFleetManaged Aug 29, 2024
belimawr and others added 8 commits August 30, 2024 11:15
This commit enables the work directory used by the integration tests
framework to be kept in the filesystem if the test fails. The full
path of the test directory is printed when the test fails.
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
The namespace generated by the integration tests framework was not
unique among different tests and test runs, so sometimes collisions
would occurs causing some tests to be flaky.
- Remove debug logs
- Make the deb respect the AGENT_KEEP_INSTALLED env var
- Add errors that only happen on deb to allow list
@belimawr belimawr force-pushed the fix-namespace-integ-test branch from 06ad52d to be00df9 Compare August 30, 2024 17:08
@belimawr belimawr removed the skip-ci label Aug 30, 2024
@belimawr
Copy link
Contributor Author

/test

@jlind23 jlind23 enabled auto-merge (squash) September 2, 2024 13:51
@jlind23 jlind23 disabled auto-merge September 2, 2024 15:36
@jlind23
Copy link
Contributor

jlind23 commented Sep 2, 2024

Force merging this PR to get main back to an healthy state.

Here the TestEvenLogFile failed on windows with this error which is unrelated to those change :

{"Time":"2024-09-02T14:24:00.640065Z","Action":"output","Package":"github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-default)","Test":"TestEventLogFile","Output":"    fixture.go:1213: could not remove temp dir 'C:\\Users\\windows\\AppData\\Local\\Temp\\TestEventLogFile1663512932': remove C:\\Users\\windows\\AppData\\Local\\Temp\\TestEventLogFile1663512932\\elastic-agent-8.16.0-SNAPSHOT-windows-x86_64\\data\\elastic-agent-dd44f5\\components\\agentbeat.exe: Access is denied.\n"}

@jlind23 jlind23 merged commit 6695324 into elastic:main Sep 2, 2024
11 of 13 checks passed
@belimawr belimawr added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 3, 2024
mergify bot pushed a commit that referenced this pull request Sep 3, 2024
* [integration tests] Keep work directory if test fails

This commit enables the work directory used by the integration tests
framework to be kept in the filesystem if the test fails. The full
path of the test directory is printed when the test fails.

* Update pkg/testing/fixture.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Update pkg/testing/fixture.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Remove log in error branch

* [Integration Tests] Generate namespace based on UUIDv4

The namespace generated by the integration tests framework was not
unique among different tests and test runs, so sometimes collisions
would occurs causing some tests to be flaky.

* Add debug logs

* run mage fmt

* Fix TestDebLogIngestFleetManaged

- Remove debug logs
- Make the deb respect the AGENT_KEEP_INSTALLED env var
- Add errors that only happen on deb to allow list

* Improve logs and test error message

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Julien Lind <julien.lind@elastic.co>
(cherry picked from commit 6695324)
mergify bot pushed a commit that referenced this pull request Sep 6, 2024
* [integration tests] Keep work directory if test fails

This commit enables the work directory used by the integration tests
framework to be kept in the filesystem if the test fails. The full
path of the test directory is printed when the test fails.

* Update pkg/testing/fixture.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Update pkg/testing/fixture.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Remove log in error branch

* [Integration Tests] Generate namespace based on UUIDv4

The namespace generated by the integration tests framework was not
unique among different tests and test runs, so sometimes collisions
would occurs causing some tests to be flaky.

* Add debug logs

* run mage fmt

* Fix TestDebLogIngestFleetManaged

- Remove debug logs
- Make the deb respect the AGENT_KEEP_INSTALLED env var
- Add errors that only happen on deb to allow list

* Improve logs and test error message

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Julien Lind <julien.lind@elastic.co>
(cherry picked from commit 6695324)
belimawr pushed a commit that referenced this pull request Sep 6, 2024
TestLogIngestionFleetManaged was failing because the namespace generated by the integration tests framework was not unique among different tests and test runs, so sometimes collisions would occurs causing some tests to be flaky.

TestDebLogIngestFleetManaged was failing because it also has got Beats logging connection errors before receiving the configuration from Elastic-Agent, now this message is also in the allow list.

When testing .deb the AGENT_KEEP_INSTALLED environment variable is respected.

When an integration test fails, the work directory created by the framework is now kept and its path is printed.

createTempDir register a test cleanup function to remove the folder it created, however, on Windows, this folder sometimes fails to be removed because there are still open file handlers for the files within the folder.

We fix this problem by retrying to remove the folder with a maximum overall wait time of 3 seconds. This is a very similar approach to what Go's t.TempDir does.

Fix the flakiness from TestUpgradeHandler* tests by re-working the
mockUpgradeManager, now it accepts a function for its Upgrade method
and their implementation is goroutine safe

TestEnvWithDefault
Now TestEnvWithDefault unsets all environment variables it sets,
allowing it to be run multiple times using -count.

TestContainerCMDEventToStderr
TestContainerCMDEventToStderr did not call agentFixture.Prepare early
enough leading to an empty STATE_PATH env var, so all state
information was in /usr/share/elastic-agent, which could make
subsequent tests to fail because they could read
/usr/share/elastic-agent/state/container-paths.yml and use a state
path different than the one set in the test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.15 Automated backport to the 8.15 branch with mergify backport-skip skip-changelog Team:Elastic-Agent Label for the Agent team
Projects
None yet
7 participants