Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky Test]: TestStandaloneUpgradeUninstallKillWatcher / TestStandaloneDowngradeToSpecificSnapshotBuild / TestStandaloneUpgradeRetryDownload – target version has the same commit hash #4152

Closed
pierrehilbert opened this issue Jan 26, 2024 · 7 comments · Fixed by #4163 or #4191
Assignees
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@pierrehilbert
Copy link
Contributor

Failing test case

TestStandaloneUpgradeUninstallKillWatcher / TestStandaloneDowngradeToSpecificSnapshotBuild / TestStandaloneUpgradeRetryDownload

Error message

target version has the same commit hash

Build

https://buildkite.com/elastic/elastic-agent/builds/6583

OS

Linux, Windows

Stacktrace and notes

Stacktrace: 
=== RUN   TestStandaloneUpgradeUninstallKillWatcher
    fetcher.go:91: Using existing artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip
    fixture.go:261: Extracting artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\001
    fixture.go:274: Completed extraction of artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\001
    fixture.go:829: Components were not modified from the fetched artifact
    fixture.go:624: >> running binary with: [C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\001\elastic-agent-8.13.0-SNAPSHOT-windows-x86_64\elastic-agent.exe version --binary-only --yaml]
    fetcher.go:91: Using existing artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip
    fixture.go:261: Extracting artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\002
    fixture.go:274: Completed extraction of artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\002
    fixture.go:829: Components were not modified from the fetched artifact
    fixture.go:624: >> running binary with: [C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\002\elastic-agent-8.13.0-SNAPSHOT-windows-x86_64\elastic-agent.exe version --binary-only --yaml]
    fixture.go:624: >> running binary with: [C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUninstallKillWatcher2473922599\001\elastic-agent-8.13.0-SNAPSHOT-windows-x86_64\elastic-agent.exe version --binary-only --yaml]
    upgrade_uninstall_test.go:67: 
        	Error Trace:	C:/Users/windows/agent/testing/integration/upgrade_uninstall_test.go:67
        	Error:      	Received unexpected error:
        	            	target version has the same commit hash "5f578c5d4cc65371d9a6cee31fb22edf6150960d"
        	Test:       	TestStandaloneUpgradeUninstallKillWatcher
--- FAIL: TestStandaloneUpgradeUninstallKillWatcher (29.37s)
@pierrehilbert pierrehilbert added Team:Elastic-Agent Label for the Agent team flaky-test Unstable or unreliable test cases. labels Jan 26, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@rdner rdner self-assigned this Jan 26, 2024
@rdner
Copy link
Member

rdner commented Jan 26, 2024

TestStandaloneDowngradeToSpecificSnapshotBuild should have been fixed by #4095

I'll investigate why it didn't work.

@rdner
Copy link
Member

rdner commented Jan 26, 2024

Interesting, I'm looking at the TestStandaloneDowngradeToSpecificSnapshotBuild test case and there is a log line:

{
  "Time": "2024-01-26T09:36:35.485254967Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(linux-amd64-ubuntu-2204-upgrade)(sudo)",
  "Test": "TestStandaloneDowngradeToSpecificSnapshotBuild",
  "Output": "    upgrade_downgrade_test.go:105: Testing Elastic Agent upgrade from 8.13.0-SNAPSHOT to 8.13.0-SNAPSHOT+l534sdis...\n"
}

So, the target version was 8.13.0-SNAPSHOT+l534sdis which according to the artefact API (elastic-agent-package project) has the commit hash 249430b and the CI build was triggered for 5f578c5

So, the check I introduced in #4095 passed because of that, but in the upgrader itself here

if startVersionInfo.Binary.Commit == endVersionInfo.Binary.Commit {
return fmt.Errorf("target version has the same commit hash %q", endVersionInfo.Binary.Commit)
}
the hashes are already matching. Something replaced the value on the way. I'll keep looking into it.

@pierrehilbert
Copy link
Contributor Author

Maybe this can be related to our work for releasing Agent faster as we are adding the +buildID there

@rdner
Copy link
Member

rdner commented Jan 26, 2024

I found 2 possible problems:

1. Our fetcher discards the build ID unless f.snapshotOnly is false:

var uri string
var prevErr error
if !f.snapshotOnly {
uri, prevErr = findURI(ctx, f.doer, version)
}
preVersion := version
version, _ = splitBuildID(version)
if uri == "" {
if !strings.HasSuffix(version, "-SNAPSHOT") {
version += "-SNAPSHOT"
}
uri, err = findURI(ctx, f.doer, version)
if err != nil {
return nil, fmt.Errorf("failed to find snapshot URI for version %s: %w (previous error: %w)", preVersion, err, prevErr)
}
}
path := fmt.Sprintf("elastic-agent-%s-%s", version, suffix)
downloadSrc := fmt.Sprintf("%s%s", uri, path)

On line 72 we remove the build suffix and further we use only the version prefix.

We use the fetcher in the fixture like this:

endFixture, err := atesting.NewFixture(
t,
endParsedVersion.String(),
atesting.WithFetcher(atesting.ArtifactFetcher()),
)
require.NoError(t, err)

However, it should not have broken the test cases reported in this issue, because in them f.snapshotOnly = false.

2. Our artefact cache inside the fixture does not account for the build ID:

For each artefact download we have a log line like this:

{
  "Time": "2024-01-26T09:12:47.267365303Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(linux-amd64-ubuntu-2204-fleet-airgapped-privileged)(sudo)",
  "Test": "TestFleetAirGappedUpgradePrivileged",
  "Output": "    fetcher_artifact.go:213: Downloading artifact from https://snapshots.elastic.co/8.13.0-ous7zk9j/downloads/beats/elastic-agent/elastic-agent-8.13.0-SNAPSHOT-linux-x86_64.tar.gz\n"
}

but there is no log line for downloading 8.13.0-l534sdis and for TestStandaloneDowngradeToSpecificSnapshotBuild we always have

{
  "Time": "2024-01-26T09:32:57.4664485Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneDowngradeToSpecificSnapshotBuild",
  "Output": "    fetcher.go:91: Using existing artifact elastic-agent-8.13.0-SNAPSHOT-windows-x86_64.zip\n"
}

The test fixture has its own cache for artefacts:

res, err := f.fetcher.Fetch(ctx, f.operatingSystem, f.architecture, f.version)
if err != nil {
return "", err
}
path, err := cache.fetch(ctx, f.t, res)
if err != nil {
return "", err
}
return path, nil

and it's using a result of the Name() function as a cache key:

func (c *fetcherCache) fetch(ctx context.Context, l Logger, res FetcherResult) (string, error) {
name := res.Name()
src := filepath.Join(c.dir, name)
_, err := os.Stat(src)
if err == nil || os.IsExist(err) {
l.Logf("Using existing artifact %s", name)
return src, nil
}
err = res.Fetch(ctx, l, c.dir)
if err != nil {
return "", err
}
return src, nil
}

which is just a path of the artefact

func (r *artifactResult) Name() string {
return r.path
}

which is eventually just a filename built here:

path := fmt.Sprintf("elastic-agent-%s-%s", version, suffix)

@rdner
Copy link
Member

rdner commented Jan 26, 2024

To summarise: our fetcher and cache don't support a build suffix. This means that every test that requests a certain build version would fetch the latest version snapshot instead.

I suppose the tests reported by this issue are such tests that work with build IDs.

I'm working on a fix.

@rdner
Copy link
Member

rdner commented Feb 5, 2024

TestStandaloneUpgradeRetryDownload and TestStandaloneUpgradeUninstallKillWatcher are still failing because they don't select a proper snapshot as TestStandaloneDowngradeToSpecificSnapshotBuild does.

Build sample https://buildkite.com/elastic/elastic-agent/builds/6766#018d7830-5a99-4bea-8f4e-1300402c505c

Problematic code:

// Upgrade to an older snapshot build of same version.
endFixture, err := atesting.NewFixture(
t,
define.Version(),
atesting.WithFetcher(atesting.ArtifactFetcher()),
)

// Upgrades to build under test.
endFixture, err := define.NewFixture(t, define.Version())

Working on the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team
Projects
None yet
3 participants