Kubernetes e2e tests #4984
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
I would really like to see the integration tests for Kubernetes work similarly to how the integration testing framework works for running on different hosts. Example tests would be something like:

Then running:
I like driving this with the integration test framework for consistency. I don't think we need to start with using a remote cluster in GKE. We could, but we could also start with a Kind cluster running on the local machine if that is simpler to implement. It will certainly be faster to iterate on locally, and we'll be building that anyway as the analogue of the multipass runner.
@cmacknz Agree that starting local is easier and should be the first step. Extending to external k8s would be great for better coverage, which I think we would want to do once we have the local one working. |
@blakerouse @cmacknz thanks for the proposals. After looking at the mage-related code, I have the following points for further discussion:
Some visual aid of what I have in mind:

```mermaid
flowchart TB
    dockerPackage[build agent docker package]
    provisionCluster["provision k8s cluster [initially only kind, requires K8S_VERSION env var]"]
    exportEnvVarsK8S[export env vars for kubernetes]
    provisionStack[provision stack]
    exportEnvVars[export env vars for stack]
    invokeTest["invoke go kubernetes integration tests [separate package testing/kubernetes]"]
    innerTest["define.Require(...)"]
    subgraph mage["mage integration:kubernetes"]
        dockerPackage --> provisionCluster --> exportEnvVarsK8S --> provisionStack --> exportEnvVars --> invokeTest
    end
    subgraph test["Individual Kubernetes Test"]
        invokeTest -.-> innerTest --> kube["client := info.KubeClient()"] --> dots["..."]
    end
    buildkite[buildkite VM] --> dockerPackage
```
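To make this concrete, here is a minimal sketch of what an individual Kubernetes test could look like under this proposal. Everything in it is an assumption drawn from the flowchart above: the `define.Kubernetes` OS type, the `KubeClient()` helper, and the idea that it returns a standard client-go clientset are proposed, not existing, APIs.

```go
//go:build integration

package kubernetes

import (
	"context"
	"testing"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"github.com/elastic/elastic-agent/pkg/testing/define"
)

func TestAgentPodsComeUp(t *testing.T) {
	// Hypothetical requirements: a Kubernetes "OS" type tells the framework
	// this test needs a (kind) cluster plus a provisioned stack to enroll with.
	info := define.Require(t, define.Requirements{
		OS:    []define.OS{{Type: define.Kubernetes}},
		Stack: &define.Stack{},
	})

	// Proposed helper from the flowchart; assumed here to return a
	// client-go clientset wired to the provisioned cluster.
	client := info.KubeClient()

	pods, err := client.CoreV1().Pods("kube-system").List(context.Background(),
		metav1.ListOptions{LabelSelector: "app=elastic-agent"})
	if err != nil {
		t.Fatalf("listing agent pods: %v", err)
	}
	if len(pods.Items) == 0 {
		t.Fatal("expected at least one elastic-agent pod")
	}
}
```

The important property is that `define.Require` would let the framework skip or route the test exactly as it already does for host-based tests.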
How are developers going to perform the tests locally from their developer machines if it requires Buildkite to work? That does not provide a way for developers to easily inspect, interact, and debug an issue if it can only be done in CI. It is totally possible to have a Buildkite provisioner in the integration testing framework that relies on information from Buildkite or performs actions against Buildkite, but the system must be designed in a way that ensures developers can do all of the above locally.
Would want to support both.
Probably I am missing something, but sorry, I don't follow. As far as my understanding goes, a dev who wants to test this locally would have Docker and kind installed for their operating system, and would be able to run, as an example, something like `K8S_VERSION=v1.29.0 mage integration:kubernetes`.
Having the existing CI wiring as a reference:

elastic-agent/.buildkite/pipeline.yml, lines 180 to 200 at b4af28b

elastic-agent/.buildkite/scripts/steps/k8s-tests.sh, lines 4 to 8 at b4af28b

Given you have a kind cluster, the next piece is provisioning the stack for the agent to interact with. This could be left completely unchanged, and the agent in the kind cluster could be enrolled with a real stateful or serverless deployment as everything else does. Having a kind cluster available does give us the option to put the entire stack in the kind cluster as well. That would be quite convenient for local testing as it would have no remote dependencies at all, but it may be faster to start with the existing stack provisioners and add this as a follow-up.
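For the cluster provisioning step itself, kind can be driven as a Go library from the integration framework rather than shelled out to from a script. A rough sketch under that assumption; the `provisionKindCluster` helper name and kubeconfig layout are illustrative, not part of any existing code:

```go
package kubernetes

import (
	"fmt"
	"os"
	"path/filepath"
	"time"

	"sigs.k8s.io/kind/pkg/cluster"
)

// provisionKindCluster creates a local kind cluster for the Kubernetes
// version requested via K8S_VERSION and returns the path to a kubeconfig
// scoped to that cluster.
func provisionKindCluster(name string) (string, error) {
	version := os.Getenv("K8S_VERSION") // e.g. "v1.29.0", as in the flowchart above
	if version == "" {
		return "", fmt.Errorf("K8S_VERSION must be set")
	}

	kubeconfig := filepath.Join(os.TempDir(), "kind-"+name+"-kubeconfig")
	provider := cluster.NewProvider()
	if err := provider.Create(
		name,
		cluster.CreateWithNodeImage("kindest/node:"+version),
		cluster.CreateWithKubeconfigPath(kubeconfig),
		cluster.CreateWithWaitForReady(5*time.Minute),
	); err != nil {
		return "", fmt.Errorf("creating kind cluster %q: %w", name, err)
	}
	return kubeconfig, nil
}
```

Loading the locally built agent image into that cluster would then be the analogue of pushing a freshly built package onto a multipass VM.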
Essentially, this would lead us to replacing everything from this point onward in the existing k8s-tests.sh script (elastic-agent/.buildkite/scripts/steps/k8s-tests.sh, lines 8 to 30 at b4af28b) with just a single mage invocation.
@cmacknz @blakerouse we should also piggyback on @pkoutsovasilis' PR once merged, in order to run the OTel Collector tests on Kubernetes too.
After this PR, the need to have container-based e2e tests became apparent. The first thing that comes to mind is the existing k8s-tests defined here. After syncing with @cmacknz and @blakerouse in the context of the aforementioned PR, I took a deeper look at what these tests are actually testing. Here are my findings:
- The kind cluster is provisioned with crashing pods (tested with kind v0.20.0 and k8s v1.29.0), and the reason behind this is `Error: unknown flag: --port` from the flags defined here (the `--port` flag has been removed from recent Kubernetes control-plane components). As a result, no new pods are going to get scheduled.
- The tests create the generated k8s manifest from the kustomize templates with `kubectl create ...`. This is essentially a YAML spec validation, and it does not fail as long as `kube-apiserver-kind-control-plane` is `Running`; it says nothing about whether the agent pods actually come up (see the sketch after this list).
- Even if the pods were getting scheduled, the agent container image version "injected" in the k8s manifest is not the one built from the sources of a commit; it derives from here.
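To illustrate the second finding: a real e2e check would need to wait for the pods to reach `Running`, not just for `kubectl create` to succeed. A rough client-go sketch; the helper name, namespace, selector, and polling interval are illustrative:

```go
package kubernetes

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodsRunning polls until every pod matching the selector is in the
// Running phase, giving up after the timeout. Unlike `kubectl create`, which
// only validates the spec against the API server, this verifies the workload
// actually came up.
func waitForPodsRunning(ctx context.Context, client kubernetes.Interface,
	namespace, selector string, timeout time.Duration) error {
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
				LabelSelector: selector,
			})
			if err != nil {
				return false, err
			}
			if len(pods.Items) == 0 {
				return false, nil // nothing scheduled yet
			}
			for _, pod := range pods.Items {
				if pod.Status.Phase != corev1.PodRunning {
					return false, nil // still pending or crash-looping
				}
			}
			return true, nil
		})
}
```

A check like this would have caught the crashing pods from the first finding, since they never leave the pending/crash-loop states.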