[Kubernetes manifest] Use unique identifier for the state file path #5187

tetianakravchenko · 2024-07-23T09:38:19Z

Relates to: Elastic Agent doesn't update the enrollment token in Kubernetes Deployment statefile #3586

Describe the enhancement:

In the manifest we have the elastic-agent-state volume, and its hostPath is predefined:

        # Mount /var/lib/elastic-agent-managed/kube-system/state to store elastic-agent state
        # Update 'kube-system' with the namespace of your agent installation
        - name: elastic-agent-state
          hostPath:
            path: /var/lib/elastic-agent-managed/kube-system/state
            type: DirectoryOrCreate

As a result, when a customer removes the installation (kubectl delete -f manifest.yaml) and installs a new one (with a different FLEET_URL and FLEET_ENROLLMENT_TOKEN), the existing state file is reused, which leads to the following error:

"message":"Possible transient error during checkin with fleet-server, retrying","log":{"source":"elastic-agent"},"error":{"message":"fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\t* requester 0/1 to host https://XXXXXX.fleet.region.aws.found.io:443 ...

What is the definition of done?

create 2 Elastic Stack deployments: stack1, stack2
install elastic-agent to the k8s cluster (with stack1 credentials)
delete it
install elastic-agent to the k8s cluster (with stack2 credentials)
no errors occur

A few ideas:
We can use the Fleet URL in the path: /var/lib/elastic-agent-managed/<fleet_url>/kube-system/state (like: /var/lib/elastic-agent-managed/f437b90409bb4804b1647665fa19f7a0.fleet.us-central1.gcp.cloud.es.io/kube-system/state, for local setup: /var/lib/elastic-agent-managed/fleet-server/kube-system/state)
But what do we do when there is no Fleet Server? Fall back to the default, /var/lib/elastic-agent-managed/kube-system/state?
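
A minimal sketch of the path-derivation idea above, written in Python purely for illustration. The function name `state_path`, its parameters, and the choice to keep only the host part of the URL are assumptions, not anything the manifest or the agent does today; a static manifest would need some templating step to apply it.

```python
from urllib.parse import urlparse

# Base directory taken from the manifest snippet in this issue.
BASE_DIR = "/var/lib/elastic-agent-managed"


def state_path(fleet_url, namespace="kube-system", base_dir=BASE_DIR):
    """Build a hostPath for the agent state, keyed by the Fleet host when one is set.

    Hypothetical helper: falls back to the current default path when
    fleet_url is empty or has no recognizable host.
    """
    host = ""
    if fleet_url:
        # urlparse only fills .hostname when a scheme (or a leading //) is present.
        candidate = fleet_url if "//" in fleet_url else "//" + fleet_url
        host = urlparse(candidate).hostname or ""

    parts = [base_dir]
    if host:
        parts.append(host)  # scheme and port are dropped, only the host is kept
    parts += [namespace, "state"]
    return "/".join(parts)
```

With no FLEET_URL this returns the path the manifest uses today, so existing standalone installations would keep their state directory. Keying on the URL alone would still treat a Fleet Server that moved to a new address as a different installation.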


cmacknz · 2024-07-23T16:39:58Z

Relates to: Elastic Agent doesn't update the enrollment token in Kubernetes Deployment statefile #3586

I think we need to treat a change in the FLEET_URL or FLEET_ENROLLMENT_TOKEN environment variables as equivalent to executing the elastic-agent enroll command.

blakerouse · 2024-07-23T19:34:06Z

@cmacknz I disagree. There are many reasons you might change those values after the Elastic Agent is already running without wanting your Elastic Agents to re-enroll. Say you are updating the FLEET_URL because you just moved the cluster, or you just updated the FLEET_ENROLLMENT_TOKEN as part of a security policy of rotating tokens periodically.

Would be interesting to see if we could possibly make an anonymous call to Fleet Server and determine if this is the same Fleet Server?

cmacknz · 2024-07-23T19:45:56Z

> Would be interesting to see if we could possibly make an anonymous call to Fleet Server and determine if this is the same Fleet Server?

Is just checking in, or doing anything that uses the stored API key, enough to check this?

We could make calling the enroll endpoint idempotent in some situations, perhaps by allowing an optional agent.id parameter. This would allow getting the API key of an existing agent instead of a net-new one, though, which I don't love from a security perspective (edit: or the response could just not include the existing API key, so that this is only an "is an agent with this ID enrolled" check).

blakerouse · 2024-07-23T19:49:32Z

> > Would be interesting to see if we could possibly make an anonymous call to Fleet Server and determine if this is the same Fleet Server?
>
> Is just checking in, or doing anything that uses the stored API key, enough to check this?
>
> We could make calling the enroll endpoint idempotent in some situations, perhaps by allowing an optional agent.id parameter. This would allow getting the API key of an existing agent instead of a net-new one, though, which I don't love from a security perspective (edit: or the response could just not include the existing API key, so that this is only an "is an agent with this ID enrolled" check).

@cmacknz I like the idempotent idea. We could just change it to return an HTTP conflict, or a specific response saying that the agent already exists, and not return the API key again.
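
To make the proposal concrete, here is a rough Python sketch of an idempotent enroll call that reports a conflict for an already-enrolled agent ID and never returns the existing API key. Everything in it (`FleetServerStub`, `EnrollResult`, the in-memory registry, the use of status 409) is hypothetical and only models the behavior discussed in this thread; it is not the Fleet Server API.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnrollResult:
    status: int                    # HTTP-like status code
    agent_id: str
    api_key: Optional[str] = None  # only set for a brand-new enrollment


@dataclass
class FleetServerStub:
    """In-memory stand-in for a Fleet Server, for illustration only."""
    agents: Dict[str, str] = field(default_factory=dict)  # agent_id -> api_key

    def enroll(self, agent_id: Optional[str] = None) -> EnrollResult:
        # Known agent ID: answer "already enrolled" without leaking the key.
        if agent_id is not None and agent_id in self.agents:
            return EnrollResult(status=409, agent_id=agent_id)

        # Unknown or missing ID: create a new enrollment and a new key.
        new_id = agent_id or secrets.token_hex(8)
        api_key = secrets.token_urlsafe(16)
        self.agents[new_id] = api_key
        return EnrollResult(status=200, agent_id=new_id, api_key=api_key)
```

Under this model, an agent starting with an existing state file could call `enroll` with its stored ID: a 409 would suggest it is talking to the same Fleet Server and can keep its state, while a 200 would mean the server does not know it and the old state should be replaced.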

blakerouse · 2024-07-26T14:49:30Z

I just wanted to add a note here: if you set FLEET_FORCE=true in the container's environment, the agent will re-enroll on every restart. This doesn't actually solve this issue, but it is a workaround when you are trying to migrate from one Fleet to another.

tetianakravchenko added the Team:Cloudnative-Monitoring label (Cloud Native Monitoring team) Jul 23, 2024

cmacknz mentioned this issue Jul 23, 2024

Elastic Agent doesn't update the enrollment token in Kubernetes Deployment statefile #3586

Closed
