Backport commits from main #1036
Open
HomayoonAlimohammadi wants to merge 117 commits into release-1.32 from KU-2632/1.32-backports
Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
Co-authored-by: neoaggelos <1888650+neoaggelos@users.noreply.github.com>
…ap/join-cluster`. (#863)
* fix: ensure containerd-related directories are removed on failed `bootstrap/join-cluster`
  `k8sd` automatically sets up some directories, with the appropriate ownership and permissions, for containerd to use in the early stages of the `bootstrap` and `join-cluster` commands. In the classic (non-strict) version of the k8s-snap, these containerd directories are system-wide (e.g. `/etc/containerd`, `/run/containerd`, etc.). Should any of the later setup steps fail after the containerd directories were set up, the directories would remain on disk and leave a partial installation on the host system. This patch ensures that `k8s` automatically removes any containerd-related directories it created if the `bootstrap` / `join-cluster` commands fail.
  Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
* fix: ensure the containerd Base Dir lockfile is never accidentally deleted
  The containerd Base Dir is the special path from which all other containerd-related paths on the snap are derived. Under classic confinement and default settings, this path defaults to the host's root (`/`), so extreme care must be taken not to include it accidentally in k8sd's cleanup routine or the k8s-snap's remove hook.
  Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
---------
Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
* Restructure the CIS and DISA STIG hardening guides
* Fix spelling errors
---------
Co-authored-by: Etienne Audet-Cobello <etienne.audet-cobello@canonical.com>
Co-authored-by: nhennigan <niamh.hennigan@canonical.com>
* Deduplicate github actions
  We have multiple github actions that run e2e tests and share a significant amount of logic. We'll add reusable actions, making the workflows much easier to maintain.
* Fix flaky microk8s test
  As part of the test cleanup, we're removing the k8s snap, ensuring that its services and mounts go away. One of the tests installs microk8s, which interferes with the k8s snap cleanup assertions. We'll fix this flaky test by removing the microk8s snap.
* Fix flaky ingress test
  get_external_service_ip returns an empty string, however the test asserts that the ip is not None and proceeds with the curl:
  2024-12-12 11:28:46 DEBUG Execute command ['curl', '', '-H', 'Host: foo.bar.com'] in instance k8s-integration-530bc4-37
  We'll update the assertion and catch empty strings as well. At the same time, we'll increase the timeouts to reduce test flakiness.
* Merge nightly test and cron job
  The nightly job is also a cron job that executes daily, so it makes sense to merge those two workflows.
* Fix nightly job tag
* Pass test flavor
* Include all namespaces in inspection reports
  The moonray job is failing, however we only have logs from the "default" and "kube-system" namespaces. This change will collect logs from all k8s namespaces.
* Apply flavor patches before running the tests
  We'll need to apply the strict/moonray patches not only when building the snap, but also when running the tests.
* Skip broken test
  test_containerd_path_cleanup_on_failed_init holds an open port and expects the bootstrap to fail, however that won't be the case when using the lxd harness. We'll skip this test for now.
* Revert "Include all namespaces in inspection reports"
  This reverts commit 5020f39.
* Address PR feedback
* cover 1.32 as part of the nightly tests
* get go version from go.mod
* update step names
* add some TODOs
* make lxd channel configurable
* bump ubuntu versions
* add get-e2e-tags dependencies
The LocalHarness is a harness used for running integration tests on the local machine where the tests are directly invoked (be it via `tox`, `pytest`, etc.). It has numerous limitations (it can't run any multi-node tests) and poses a lot of potential risks (cleanup failing in case of fatal errors in the test fixtures), which outweigh most of its convenience benefits (especially when compared to the LXD substrate). This patch completely removes the LocalHarness and all references to it from the documentation, making LXD the new default. Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
* 1.32 release docs for snap
It was reported in issue #537 that the "Edit this page" button did not work. Previous work fixed most pages, but the About and Community pages were still affected. This PR fixes that functionality.
* Update etcd guide config option
  As highlighted in issue #905, this configuration option was not updated when `datastore` was changed to `bootstrap-datastore`.
Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com>
…ciated test. (#910) --------- Signed-off-by: Nashwan Azhari <nashwan.azhari@canonical.com> Co-authored-by: Berkay Tekin Oz <ozberkaytekin@gmail.com>
Fix broken links and add ignored links to custom_conf, due to a known Sphinx issue where anchors cause false positives.
* Add 1.32 charms release notes
* Update navigation and channel
* Update charm channel to 1.32 stable
* Reorganize the navigation side bar to be able to see both snap and charm release notes
---------
Co-authored-by: nhennigan <niamh.hennigan@canonical.com>
* Ensure lxd is installed before attempting snap refresh
  This change checks whether the lxd snap is installed before running `snap refresh lxd`, preventing failures when lxd is missing. If lxd is not found, it installs the snap using the specified channel. This is required because the LXD snap is no longer shipped by default in 24.04.
* use newgrp instead of sg, use sudo --user
We need to bump the copyright headers as expected by the tox "format" job: Copyright 2025 Canonical, Ltd.
When investigating the state of a Kubernetes node, it may be useful to check a few host resources and their availability, to root cause potential memory / disk pressure / other system related issues.
Currently, if the k8sd/v1alpha/lifecycle/skip-stop-services-on-remove annotation is set, we don't stop the Kubernetes-related services, but we still remove their certificates and containerd-related paths. This ends up paralyzing services like kubelet, which might have to perform Pod evictions, blocking it from finishing its job and leaving CAPI unable to complete its downscaling or upgrade operations. We should remove those certificates only if we're also stopping the services.
* Enable cluster-config.load-balancer.l2-mode by default
  We'll change the default value of cluster-config.load-balancer.l2-mode, enabling it by default.
* Bump k8s-snap-api version
* Update unit test
* Update test_smoke
* update expected l2 mode
* bump the timeout
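A minimal Go sketch of the intended behaviour, assuming that "set" means the annotation key is present. The `removalPlan` type and `planRemoval` function are hypothetical; only the annotation name comes from the commit message. Following the description of the problem, the sketch ties both certificate and containerd path removal to stopping the services.

```go
package main

import "fmt"

// skipStopServicesAnnotation is the annotation named in the commit message.
const skipStopServicesAnnotation = "k8sd/v1alpha/lifecycle/skip-stop-services-on-remove"

// removalPlan lists the cleanup steps to run when a node is removed.
type removalPlan struct {
	StopServices          bool
	RemoveCertificates    bool
	RemoveContainerdPaths bool
}

// planRemoval couples certificate and containerd path removal to stopping
// the services: if kubelet keeps running (e.g. to finish Pod evictions),
// the files it depends on must stay in place.
func planRemoval(annotations map[string]string) removalPlan {
	_, skip := annotations[skipStopServicesAnnotation] // presence = "set" (assumption)
	stop := !skip
	return removalPlan{
		StopServices:          stop,
		RemoveCertificates:    stop,
		RemoveContainerdPaths: stop,
	}
}

func main() {
	fmt.Printf("%+v\n", planRemoval(nil))
	fmt.Printf("%+v\n", planRemoval(map[string]string{skipStopServicesAnnotation: "true"}))
}
```

Deriving every cleanup step from the single `stop` decision makes it impossible for the certificates to disappear while kubelet is still expected to run.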
* update titles
  Update page titles and headers according to the style guide, with correct capitalization and the imperative form of the verbs.
* Document k8s-snap installation on dev environments
  We'll recommend that users try out k8s-snap in a clean virtual machine or LXD container. At the same time, we'll document common problems that can arise when installing k8s-snap directly on the development machine, along with possible workarounds:
    * docker and containerd conflicts
    * fixing the "FORWARD" rules
    * custom containerd base dir
    * changing IP addresses
    * listening on "localhost"
  Other changes:
    * move dqlite docs to a separate reference page and add an example on how to connect to k8s-dqlite
    * fix the release note on "containerd-base-dir": it's a bootstrap config YAML entry, not a CLI parameter
    * document k8sd sql commands
* Remove 'strict' reference
* Update k8sd sql section
* Fix linter error (>80 characters)
When bootstrapping or starting a cluster, we wait for the k8sd server to be fully ready before interacting with it. However, there are edge cases—such as during a snap refresh—where the snap attempts to interact with the CLI (e.g., to configure snap settings) while the database is still initializing. In these scenarios, immediate failure is unnecessary. The k8sd client now retries such requests, ensuring smoother operation. This behavior applies only to specific edge cases where it is known that the microcluster database will eventually become available.
The command "kubectl config show" does not exist; the correct command is "kubectl config view".
* Move two-node HA to moonray
  This is more of a POC than a fully supported feature. It was done at the request of moonray, so we're moving it there.
* Include debug symbols
  We'll include golang and dqlite debug symbols even for release builds. This increases the snap size by 30MB, however it allows us to investigate core dumps.
* Generate core dumps
  In order to effectively investigate k8s-snap crashes, especially ones caused by external C libraries such as dqlite, we'll need core dumps. This change will:
    * use GOTRACEBACK="crash"
    * adjust the core dump limit
* Collect core dumps
    * Add inspect.sh --core-dump-dir parameter
        * default: /var/crash
        * collect core dumps found at the specified location
    * add core dump dir and pattern as e2e test settings
        * TEST_CORE_DUMP_PATTERN
        * TEST_CORE_DUMP_DIR
    * configure core dumps as part of the e2e instance initialization
    * update the "exec" helpers to allow stdout redirection (">"), just like the k8s-dqlite e2e tests
* Remove leftover -Wno-suggest-attribute=noreturn
* Add k8s inspect command
  We're adding a "k8s inspect" command that will invoke the "inspect.sh" script, aiming to improve the user experience. Note that we'll avoid parsing the arguments twice, since that's unnecessary and would complicate adding new parameters.
* fix formatting
* Include auto-generated docs
* Fix spell check warning
* Fix mock snap
* add --core-dump-dir param to help string
* Log a message if the cluster is uninitialized
  Users coming from microk8s may not be used to having to bootstrap the cluster. We'll check k8sd errors and, if the message contains "Database is not yet initialized", ask users to either bootstrap a new cluster or join an existing one. We're adding this check to the query function of the k8sd client so that all the k8s commands may benefit from it.
  Implements: KU-2481
* Clean up the log messages
* Copy the "bootstrapped" check to each individual command
  To improve the error messages and avoid having too many nested errors, we'll have each individual command check whether the cluster was initialized. With that in place, we'll make the k8sd client error less verbose.
* Fix unit tests, updating k8sd mock
Fix typo that causes the script to fail
* add dqlite configuration to troubleshooting page
Co-authored-by: Louise K. Schmidtgen <louise.schmidtgen@canonical.com>
* inspect.sh: avoid logging an error if there are no core dumps
  The inspect.sh script logs an error if the core dump dir is empty. We'll add a check to improve the user experience:
    INFO: Copy dmesg entries
    INFO: Collecting core dumps from /var/crash. Size: 4.0K /var/crash
    cp: cannot stat '/var/crash/*': No such file or directory
    Collecting snap and related information
* inspect.sh: Replace backticks
  https://www.shellcheck.net/wiki/SC2006
…ps (#1032)
* tests: enrich node configuration controller tests with signed configmaps
  Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
* fix: change configmap to trigger restart in case of valid signature
  Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
* chore: try with different configs with invalid signature to make sure they are not applied
  Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
* tests: add a test case to update node configuration controller to account for signed configmaps
  Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
* fix: improve test case names
  Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
---------
Signed-off-by: Reza Abbasalipour <reza.abbasalipour@canonical.com>
This change will collect all the inspection reports before initiating the node cleanup process. Otherwise we interfere with the observed cluster, potentially breaking it, which can impede the debugging process.
…oin (#1029) Use certificates from join config while a new control plane is joining.
bschimke95 approved these changes on Feb 4, 2025
LGTM, nice little PR
Historically, we used the Cilium load balancer for the load balancer feature. With the move to MetalLB, the dependency on the network feature is no longer required.
* capi docs: add intermediate ca how-to
  We're adding a guide that shows how intermediate CAs can be generated using HashiCorp Vault and passed to CAPI using management cluster secrets.
* Address PR comments
* Address comments
* address PR comments
* avoid using more than 80 characters per line, this is likely to upset linters
* Add link to an article that describes Vault cert-manager integration
* Fix custom containerd paths
  For some reason, there are two almost identical kubelet containerd flags and we have to set both:
```
$ snap logs k8s.kubelet -n 30000 | grep FLAG | grep containerd
FLAG: --container-runtime-endpoint="/home/ubuntu/containerd/k8s-containerd/run/containerd/containerd.sock"
FLAG: --containerd="/run/containerd/containerd.sock"
FLAG: --containerd-namespace="k8s.io"
```
```
$ kubelet -h | grep containerd
--container-runtime-endpoint string   The endpoint of container runtime service.
--containerd string   containerd endpoint
```
  This change will:
    * pass the missing containerd flag
    * update the e2e test for custom containerd paths to check if the cluster actually becomes available after bootstrap
    * update the dev doc to enable the net, dns and local storage features
    * do the same for the e2e test
* Update unit tests
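The fix amounts to deriving both kubelet flags from the same (possibly custom) containerd socket. Below is a Go sketch of that idea. The two flag names come from the kubelet output quoted above. The helper `kubeletContainerdArgs` and the assumption that the socket lives at `<run dir>/containerd.sock` are illustrative and not taken from the k8s-snap code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// kubeletContainerdArgs returns both kubelet flags that point at the
// containerd socket, derived from a (possibly custom) containerd run dir.
// Helper name and socket layout are assumptions for this sketch.
func kubeletContainerdArgs(containerdRunDir string) map[string]string {
	sock := filepath.Join(containerdRunDir, "containerd.sock")
	return map[string]string{
		"--container-runtime-endpoint": sock,
		"--containerd":                 sock, // previously left at its default
	}
}

func main() {
	args := kubeletContainerdArgs("/home/ubuntu/containerd/k8s-containerd/run/containerd")
	keys := make([]string, 0, len(args))
	for k := range args {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s=%q\n", k, args[k])
	}
}
```

Computing both values from one socket path means a custom containerd base dir can no longer leave `--containerd` pointing at the default `/run/containerd/containerd.sock`, which is the mismatch visible in the log above.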
--------- Co-authored-by: Mateo Florido <mateo.florido@canonical.com>
Overview
This PR adds the commits from `main` that we wanted backported into `release-1.32`. These commits include everything in `main` since the `release-1.32` branch was cut, except the ones that were already backported, and the following: