-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cloud-init blocking on WSL1 #508
Conversation
The CPC publisher no longer strips the "ubuntu" prefix. Look at https://cloud-images.ubuntu.com/wsl/jammy/current/: .. [ ] SHA256SUMS.gpg 2024-12-13 15:01 833 [ ] ubuntu-jammy-wsl-amd64-ubuntu.rootfs.tar.gz 2024-12-12 21:42 324M File system image and Kernel packed [ ] ubuntu-jammy-wsl-amd64-ubuntu22.04lts.rootfs.tar.gz 2024-12-12 21:42 324M File system image and Kernel packed [TXT] ubuntu-jammy-wsl-amd64-wsl.manifest 2024-12-12 21:42 17K Package manifest file [ ] ubuntu-jammy-wsl-arm64-ubuntu.rootfs.tar.gz 2024-12-12 21:45 308M File system image and Kernel packed [ ] ubuntu-jammy-wsl-arm64-ubuntu22.04lts.rootfs.tar.gz 2024-12-12 21:45 308M File system image and Kernel packed [TXT] ubuntu-jammy-wsl-arm64-wsl.manifest 2024-12-12 21:45 17K Package manifest file ... The images contain a suffix "ubuntu22.04lts". Also, that naming convention used to be from 24.04 onwards. But as we can clearly see 22.04 and 24.04 are following the same naming convention (see https://cloud-images.ubuntu.com/wsl/noble/current/): [ ] SHA256SUMS.gpg 2024-12-13 05:37 833 [ ] ubuntu-noble-wsl-amd64-ubuntu.rootfs.tar.gz 2024-12-12 20:21 347M File system image and Kernel packed [ ] ubuntu-noble-wsl-amd64-ubuntu24.04lts.rootfs.tar.gz 2024-12-12 20:21 347M File system image and Kernel packed [TXT] ubuntu-noble-wsl-amd64-wsl.manifest 2024-12-12 20:20 16K Package manifest file [ ] ubuntu-noble-wsl-arm64-ubuntu.rootfs.tar.gz 2024-12-12 20:23 332M File system image and Kernel packed [ ] ubuntu-noble-wsl-arm64-ubuntu24.04lts.rootfs.tar.gz 2024-12-12 20:23 332M File system image and Kernel packed [TXT] ubuntu-noble-wsl-arm64-wsl.manifest 2024-12-12 20:23 16K Package manifest file
0006afd
to
bdbc15b
Compare
Every now and then we see something like: C:\actions-runner\_work\WSL\WSL\.github\actions\download-rootfs\download-rootfs.ps1 : Cannot index into a null array. At C:\actions-runner\_work\_temp\aa03e702-b864-4685-b25e-83a2e2853d79.ps1:31 char:3 + & C:\actions-runner\_work\WSL\WSL\./.github/actions/download-rootfs ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [download-rootfs.ps1], RuntimeException + FullyQualifiedErrorId : NullArray,download-rootfs.ps1 The trace was too shallow to figure out what was that about. After putting some time to run this script locally I finally saw wheer it was coming from: - The pattern was incorrect. - When attempting to test the images checksum we'd never have a match. - Luckily we don't test that at every CI run (to make things harder). As a additional layer of precaution, I'm checking the size of the matched collection.
That's the only surprising behaviour affecting WSL1. Also, if systemd is disabled (or the cloud-int.service unit), running cloud-init --wait wouild block forever. While I'd like the cloud-init command to properly handle the cases where it cannot even start, this is a simple enough fix on our side. If systemd is disabled by /etc/wsl.conf, systemctl is-enabled --quiet cloud-init.service would still report the unit as enabled, so we need a broader check.
If this works as expected, one test case should run under WSL1. Without the current patch that would block forever. With the launcher patch, cloud-init status --wait won't even run, so no hangs.
bdbc15b
to
472a2d8
Compare
The withWSL1 test case parameter was never set, diff got lost during rebase :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now :)
That was weird:
|
We don't need to shutdown it again later, when restoring WSL 2, because we already hooked up a test Cleanup to do so.
The VM was crashing. There are quite a few events like the one below that happened yesterday and today. ![image](https://github.com/user-attachments/assets/f6160c34-08e1-4fdd-9b3d-9f79a76001e3) I resized it to the [second VM size compatible with GitHub large Windows runners](https://docs.github.com/en/actions/using-github-hosted-runners/using-larger-runners/about-larger-runners#specifications-for-general-larger-runners) (which are available by default now for open source projects) so that, if we were to migrate this out of our self-hosted runners we wouldn't be surprised with a downgrade. I then resized to the first size, as the change didn't solve the crash. That seems outside of our control. Even with more powerfull specs than my laptop, the crash still happened in the VM but didn't in my computer. I'll turn the WSL1 test case off for now and park a card to investigate this later.
The VM was crashing. There are quite a few events like the one below that happened yesterday and today. I resized it to the second VM size compatible with GitHub large Windows runners (which are available by default now for open source projects) so that, if we were to migrate this out of our self-hosted runners we wouldn't be surprised with a downgrade. I then resized to the first size, as the change didn't solve the crash. That seems outside of our control. Even with more powerfull specs than my laptop, the crash still happened in the VM but didn't in my computer. I'll turn the WSL1 test case off for now and park a card to investigate this later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting issue with the VMs, something to investigate later. Everything else looks good
cloud-init status --wait
blocks forever if cloud-init won't ever start. Known cases where that may happen:We're not considering that properly. With the patch bb72652, we check if both the cloud-init.service unit is enabled (which covers 1) as well as systemd not offline, which covers 2. and 3.
The very same check was implemented in the new script that will replace this launch in the new WSL format via this PR: ubuntu/wsl-setup#17.
I also added an end-to-end test case that sets WSL1 as the default setting globally and attempts to register the instance. Without that patch the test would timeout. See 472a2d8.
UDENG-5629.
CI had two issues blocking me:
releases-info
, a Go binary used to select which rootfses to build the application packages from, had outdated considerations about what CPC publishes under https://cloud-images.ubuntu.com/wsl:Look at https://cloud-images.ubuntu.com/wsl/jammy/current/:
The images contain a suffix "ubuntu22.04lts".
Also, that naming convention used to be from 24.04 onwards.
But as we can clearly see 22.04 and 24.04 are following the same naming convention
(see https://cloud-images.ubuntu.com/wsl/noble/current/):
That was not really helpful. After investing some time on this I could finally learn that we implemented the wrong regex pattern (two spaces instead of a space and a raw star/asterisk character).