-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: tidy-up memory settings and GHA workflow #4994
Conversation
We keep getting failed test runs on GitHub due to out of heap errors. One suggestion provided - by the JVM - is to try the G1GC garbage collector.
Oh, I've just noticed the Java version is out of wack on |
Mind you, the default collector is G1GC according to this: https://docs.oracle.com/en/java/javase/21/gctuning/available-collectors.html. I think it might have been in 11 too. |
We have stopped building the import/export build due to the move to 21 not being supported by the gradle WSDL plugin.
As expected, still getting memory issues, let's increase the heap. |
Adding an extra 1GiB seemed to have no effect. There were a couple of tweaks to the memory pools in the heap, so perhaps these are causing issues. Why they're there is lost in history, so how about we try with out them. If that fails, I give up for now. |
Anyway, I'll do one last thing: Capture memory logs. I'll have a look at them, attach here for propsperity. And then let it be - and await (hopefully) this week's update to the GitHub VM images. |
Seems the syntax I was using was wrong to get the PID into the filename for the heap dump. This change is based on the doco for HeapDumpPath at: https://docs.oracle.com/en/java/javase/21/docs/specs/man/java.html
Seems that doesn't work either, so I give up.
This was actually being partially overridden by the root .sbtopts. So let's try and standardise on just one to avoid confusion.
As the main settings we want to play with a are JVM focused, and this makes it easier to specify them.
There's reports around the internet that the generation of these are known to cause memory issues.
This reverts commit f2325c6. It didn't seem to make any difference - other than different (unhelpful) runtime output. Seeing the change removes all listeners, maybe I needed to add a basic listener back in.
Okay, so it looks as though the new GitHub runner image from last week has again magically fixed the memory issues. But now the build has failed on some flaky tests. I'll run again, and see if we have any success. But at least it didn't fail with OOM errors. If I can get a build to pass - with the tweaks I've made here - then I think we should keep these changes. |
These are no longer being produced.
I had one build fail with a heap memory issue (whereas when there's issues with GitHub runners that's typically a full blown OOM), and I can't get the new UI tests to pass (specifically To address this, the last commit increases from 2GiB max heap to 3. Let's see what that does. |
PortalTest and AdvancedSearchPageTest are both flaking. They used to be fine, and due to the old setup with the `.sbtopts` file the max heap was actually always at 4G so might as well try returning to that.
Hmmm,
😕 I'll retry two more times (max). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Checklist
Description of change
Originally a PR to investigate the issues with OOM issues with GHA. However, again the magically resovled (just like they magically appeared) after yet another update to the GHA runner image.
However in the process I did find a few oddities with how
.sbtopts
and.jvmopts
were setup in the project. And the result was that the memory configuration was nothing like what we thought. So I've consolidated that down into a single root.jvmopts
.Further, I've updated the GHA workflow not only to reflect that, but also to use the java distro we actually now use - Temurin.
Now we seem to have a significant issue with
AdvancedSearchPageTest
in that it is almost always failing - after 10 attempts I give up. I say almost though, as it is passing 1 in 20 times or so. But this is not unique to these changes, as I've also validated it over on the build for #4993 . So let's merge this in, and then we need to have a closer look atAdvancedSearchPageTest
.(This PR should definitely be squashed when merging.)