Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: tidy-up memory settings and GHA workflow #4994

Merged
merged 19 commits into from
Feb 4, 2024

Conversation

edalex-ian
Copy link
Member

@edalex-ian edalex-ian commented Jan 8, 2024

Checklist
Description of change

Originally a PR to investigate the issues with OOM issues with GHA. However, again the magically resovled (just like they magically appeared) after yet another update to the GHA runner image.

However in the process I did find a few oddities with how .sbtopts and .jvmopts were setup in the project. And the result was that the memory configuration was nothing like what we thought. So I've consolidated that down into a single root .jvmopts.

Further, I've updated the GHA workflow not only to reflect that, but also to use the java distro we actually now use - Temurin.

Now we seem to have a significant issue with AdvancedSearchPageTest in that it is almost always failing - after 10 attempts I give up. I say almost though, as it is passing 1 in 20 times or so. But this is not unique to these changes, as I've also validated it over on the build for #4993 . So let's merge this in, and then we need to have a closer look at AdvancedSearchPageTest.

(This PR should definitely be squashed when merging.)

We keep getting failed test runs on GitHub due to out of heap errors. One suggestion provided - by the JVM - is to try the G1GC garbage collector.
@edalex-ian edalex-ian self-assigned this Jan 8, 2024
@edalex-ian
Copy link
Member Author

Oh, I've just noticed the Java version is out of wack on develop. So this PR will fail. 😭

@edalex-ian
Copy link
Member Author

Mind you, the default collector is G1GC according to this: https://docs.oracle.com/en/java/javase/21/gctuning/available-collectors.html. I think it might have been in 11 too.

We have stopped building the import/export build due to the move to 21 not being supported by the gradle WSDL plugin.
@edalex-ian
Copy link
Member Author

As expected, still getting memory issues, let's increase the heap.

@edalex-ian edalex-ian changed the title DRAFT: test(GitHub CI): use G1GC DRAFT: test(GitHub CI): address OOM issues Jan 8, 2024
@edalex-ian
Copy link
Member Author

Adding an extra 1GiB seemed to have no effect. There were a couple of tweaks to the memory pools in the heap, so perhaps these are causing issues.

Why they're there is lost in history, so how about we try with out them.

If that fails, I give up for now.

@edalex-ian edalex-ian marked this pull request as draft January 8, 2024 22:39
@edalex-ian edalex-ian changed the title DRAFT: test(GitHub CI): address OOM issues test(GitHub CI): address OOM issues Jan 8, 2024
@edalex-ian edalex-ian marked this pull request as ready for review January 8, 2024 23:50
@edalex-ian
Copy link
Member Author

edalex-ian commented Jan 8, 2024

Anyway, I'll do one last thing: Capture memory logs.

I'll have a look at them, attach here for propsperity. And then let it be - and await (hopefully) this week's update to the GitHub VM images.

@edalex-ian edalex-ian marked this pull request as draft January 8, 2024 23:54
Seems the syntax I was using was wrong to get the PID into the filename
for the heap dump. This change is based on the doco for HeapDumpPath at:

https://docs.oracle.com/en/java/javase/21/docs/specs/man/java.html
Seems that doesn't work either, so I give up.
This was actually being partially overridden by the root .sbtopts. So
let's try and standardise on just one to avoid confusion.
As the main settings we want to play with a are JVM focused, and this
makes it easier to specify them.
There's reports around the internet that the generation of these are
known to cause memory issues.
This reverts commit f2325c6.

It didn't seem to make any difference - other than different (unhelpful)
runtime output. Seeing the change removes all listeners, maybe I needed
to add a basic listener back in.
@edalex-ian
Copy link
Member Author

Okay, so it looks as though the new GitHub runner image from last week has again magically fixed the memory issues. But now the build has failed on some flaky tests.

I'll run again, and see if we have any success. But at least it didn't fail with OOM errors.

If I can get a build to pass - with the tweaks I've made here - then I think we should keep these changes.

@edalex-ian
Copy link
Member Author

I had one build fail with a heap memory issue (whereas when there's issues with GitHub runners that's typically a full blown OOM), and I can't get the new UI tests to pass (specifically AdvancedSearchPageTest). I'm think that with the new configuration for memory (which I think previously was always ending up at 4GiB) perhaps it's a bit too frugal.

To address this, the last commit increases from 2GiB max heap to 3. Let's see what that does.

PortalTest and AdvancedSearchPageTest are both flaking. They used to be
fine, and due to the old setup with the `.sbtopts` file the max heap was
actually always at 4G so might as well try returning to that.
@edalex-ian
Copy link
Member Author

Hmmm, io.github.openequella.search.AdvancedSearchPageTest just keeps failing - and nothing else:

FAILED: userSelector
        Search by using the User selector control
org.openqa.selenium.NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":".//span[text()='AutoTest']"}

😕

I'll retry two more times (max).

@edalex-ian edalex-ian changed the title test(GitHub CI): address OOM issues build: tidy-up memory settings and GHA pipeline Feb 1, 2024
@edalex-ian edalex-ian marked this pull request as ready for review February 1, 2024 23:11
@edalex-ian edalex-ian changed the title build: tidy-up memory settings and GHA pipeline build: tidy-up memory settings and GHA workflow Feb 1, 2024
Copy link
Contributor

@PenghaiZhang PenghaiZhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@PenghaiZhang PenghaiZhang merged commit ec7488d into develop Feb 4, 2024
6 checks passed
@PenghaiZhang PenghaiZhang deleted the ci/functional-tests-failing branch February 4, 2024 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants