Missing .installed file prevents uninstall or upgrade #4051

Closed
fearful-symmetry opened this issue Jan 9, 2024 · 11 comments · Fixed by #4172

@fearful-symmetry
Contributor

While trying to fix a broken install on a test machine, an attempt at running elastic-agent install failed, and a following uninstall command failed:

sudo /usr/bin/elastic-agent uninstall
[sudo] password for alexk: 
Error: can only be uninstalled by executing the installed Elastic Agent at: /usr/bin/elastic-agent
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.13/fleet-troubleshooting.html

There are two problems here:

  1. We can't deal with partial or damaged installs. In this case, the error happened because the .installed file in the root directory is missing.
  2. The "can only be uninstalled by executing the installed Elastic Agent at: /usr/bin/elastic-agent" message is a bit deceptive. It doesn't actually care what path you're running the binary from; instead, it's checking the validity of the install:
	if status == install.Installed && !info.RunningInstalled() {
		return fmt.Errorf("can only be uninstalled by executing the installed Elastic Agent at: %s", install.ExecutablePath(paths.Top()))
	}

We should print a message that more accurately describes why we bailed out of the uninstall process.
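One possible shape for a more descriptive message is sketched below. This is only an illustration, not the change that closed this issue: `explainMarkerProblem` and `checkMarker` are hypothetical helpers, and the only detail taken from this thread is that the check depends on the `.installed` marker being present.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// explainMarkerProblem is a hypothetical helper. It turns the result of
// stat-ing the install marker into an error that names the actual cause
// instead of pointing at the binary path.
func explainMarkerProblem(markerPath string, statErr error) error {
	switch {
	case statErr == nil:
		// Marker is present, nothing blocks the uninstall.
		return nil
	case errors.Is(statErr, fs.ErrNotExist):
		return fmt.Errorf("cannot uninstall: install marker %s is missing; the installation looks incomplete or damaged", markerPath)
	default:
		return fmt.Errorf("cannot uninstall: unable to read install marker %s: %w", markerPath, statErr)
	}
}

// checkMarker stats the marker and explains any problem found.
func checkMarker(markerPath string) error {
	_, statErr := os.Stat(markerPath)
	return explainMarkerProblem(markerPath, statErr)
}

func main() {
	// Assumed path for the demo: a directory that should not exist.
	markerPath := filepath.Join(os.TempDir(), "no-such-agent-dir", ".installed")
	if err := checkMarker(markerPath); err != nil {
		fmt.Println("Error:", err)
	}
}
```

Separating "marker missing" from "marker unreadable" keeps a permissions problem from being reported as a damaged install.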

@fearful-symmetry fearful-symmetry added the bug Something isn't working label Jan 9, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Jan 10, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert
Contributor

pierrehilbert commented Jan 10, 2024

EDIT: It seems that this was introduced by #2500.
@ycombinator should we just highlight the real problem in the uninstall error here, or do we have a way to fix the root cause?

@cmacknz
Member

cmacknz commented Jan 10, 2024

Missing the .installed file also marks the agent as unupgradable which makes this a more severe problem. We have seen internal InfoSec agents hit this problem but we haven't been able to explain why.

	// IsUpgradeable when agent is installed and running as a service or flag was provided.
	func IsUpgradeable() bool {
		// only upgradeable if running from Agent installer and running under the
		// control of the system supervisor (or built specifically with upgrading enabled)
		return release.Upgradeable() || (paths.RunningInstalled() && info.RunningUnderSupervisor())
	}

	// RunningInstalled returns true when executing Agent is the installed Agent.
	func RunningInstalled() bool {
		// Check if install marker created by `elastic-agent install` exists
		markerFilePath := filepath.Join(Top(), MarkerFileName)
		if _, err := os.Stat(markerFilePath); err != nil {
			return false
		}
		return true
	}

We should consider relaxing the upgradeable check so that it does not depend on the agent being installed, unless we can find the root cause. I'd rather have people accidentally upgrade agents that aren't installed than be unable to upgrade installed agents that should have been upgradeable.
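To make the trade-off concrete, here is a minimal sketch that models both variants of the condition with plain booleans. `strictUpgradeable` mirrors the expression quoted above; `relaxedUpgradeable` is a hypothetical variant that ignores the marker, and is not necessarily what was eventually merged.

```go
package main

import "fmt"

// strictUpgradeable mirrors the quoted condition, with each call replaced
// by a boolean input so the logic can be inspected in isolation.
func strictUpgradeable(releaseUpgradeable, runningInstalled, underSupervisor bool) bool {
	return releaseUpgradeable || (runningInstalled && underSupervisor)
}

// relaxedUpgradeable is a hypothetical variant that no longer depends on
// the .installed marker being present.
func relaxedUpgradeable(releaseUpgradeable, underSupervisor bool) bool {
	return releaseUpgradeable || underSupervisor
}

func main() {
	// An installed agent running as a service whose marker has gone missing.
	strict := strictUpgradeable(false, false, true)
	relaxed := relaxedUpgradeable(false, true)
	fmt.Printf("strict=%t relaxed=%t\n", strict, relaxed) // strict=false relaxed=true
}
```

The relaxed form accepts any agent running under the supervisor, which is the accidental-upgrade risk described above.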

@cmacknz
Member

cmacknz commented Jan 10, 2024

For the problem Alex is describing we may be able to improve this by just creating the .installed marker as early as we can.
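A rough sketch of that idea, under the assumption that the marker is an empty file checked only for existence (as in the `RunningInstalled` snippet above). `installWithEarlyMarker` and `demo` are hypothetical names, and the failing step only simulates an interrupted install.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// markerFileName is the marker name mentioned in this thread.
const markerFileName = ".installed"

// installWithEarlyMarker is a hypothetical install routine: it writes the
// marker before running the remaining steps, so a failure in a later step
// still leaves an installation that the uninstall command can recognize.
func installWithEarlyMarker(topDir string, steps ...func() error) error {
	if err := os.MkdirAll(topDir, 0o755); err != nil {
		return fmt.Errorf("creating %s: %w", topDir, err)
	}
	marker := filepath.Join(topDir, markerFileName)
	if err := os.WriteFile(marker, nil, 0o644); err != nil {
		return fmt.Errorf("writing marker: %w", err)
	}
	for i, step := range steps {
		if err := step(); err != nil {
			// The marker is intentionally left in place.
			return fmt.Errorf("install step %d failed: %w", i, err)
		}
	}
	return nil
}

// demo simulates an interrupted install in a temporary directory and
// reports whether the marker is still there afterwards.
func demo() (bool, error) {
	tmp, err := os.MkdirTemp("", "agent-sketch-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(tmp)

	top := filepath.Join(tmp, "Agent")
	failing := func() error { return errors.New("simulated enrollment failure") }
	if installErr := installWithEarlyMarker(top, failing); installErr == nil {
		return false, errors.New("expected the simulated step to fail")
	}
	_, statErr := os.Stat(filepath.Join(top, markerFileName))
	return statErr == nil, nil
}

func main() {
	survived, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
	fmt.Println("marker present after failed install:", survived)
}
```

The cost of this ordering is that a half-finished install is treated as installed, so the uninstall path has to tolerate missing pieces.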

@cmacknz cmacknz changed the title Failed installs can't be uninstalled due to missing .installed file, agent reports deceptive error message Missing .installed file prevents uninstall or upgrade Jan 12, 2024
@leandrojmp
Contributor

Hello,

Was this backported to 8.12? We are hitting this issue on 8.12.1 where we cannot uninstall an Agent nor force install it again.

What is the workaround to be able to uninstall the Agent? Can we just stop/kill the service, remove the folder and try to install it again?

@cmacknz
Member

cmacknz commented Mar 4, 2024

It was not backported to 8.12 as there isn't a planned 8.12.3 release (at least not yet).

What is the workaround to be able to uninstall the Agent? Can we just stop/kill the service, remove the folder and try to install it again?

The easiest fix would be to create the .installed file again at the root of the agent installation directory. Then the elastic-agent command should work again. The location for each platform (unless you customized it) is:

  • /opt/Elastic/Agent/.installed on Linux
  • /Library/Elastic/Agent/.installed on Mac
  • C:\Program Files\Elastic\Agent\.installed on Windows
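For automation, recreating the marker could look roughly like the sketch below. It assumes the default locations listed above and that an empty file is enough, which matches the existence-only check in the `RunningInstalled` snippet earlier in this thread but may not hold for every version. `defaultMarkerPath`, `restoreMarker`, and the `--write` argument are names made up for this example, and the program needs root or administrator rights to write to these directories.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// defaultMarkerPath returns the default marker location listed in this
// thread for a given GOOS value, or "" for platforms not listed.
func defaultMarkerPath(goos string) string {
	switch goos {
	case "linux":
		return "/opt/Elastic/Agent/.installed"
	case "darwin":
		return "/Library/Elastic/Agent/.installed"
	case "windows":
		return `C:\Program Files\Elastic\Agent\.installed`
	default:
		return ""
	}
}

// restoreMarker creates an empty marker file if none exists and leaves an
// existing file untouched.
func restoreMarker(path string) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	return f.Close()
}

func main() {
	path := defaultMarkerPath(runtime.GOOS)
	if path == "" {
		fmt.Fprintln(os.Stderr, "no default marker path known for", runtime.GOOS)
		os.Exit(1)
	}
	// Dry run unless --write is passed, to avoid touching the system by accident.
	if len(os.Args) < 2 || os.Args[1] != "--write" {
		fmt.Println("would create:", path)
		return
	}
	if err := restoreMarker(path); err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
		os.Exit(1)
	}
	fmt.Println("marker present at", path)
}
```

If the agent was installed to a custom location, the path has to be adjusted accordingly.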

We were unable to identify any mechanism by which this file could be missing besides the installation process being interrupted. If you know how you got into this situation that would be valuable to know.

Can we just stop/kill the service, remove the folder and try to install it again?

This would also work, unless you installed Elastic Defend, and then you additionally need to use the endpoint-security binary to uninstall the separate service implementing Elastic Defend.

@leandrojmp
Contributor

leandrojmp commented Mar 4, 2024

Hello @cmacknz ,

We've already tried adding the .installed file inside C:\Program Files\Elastic\Agent\, but got other errors that do not help much, like the ones below. They appear to be related to permission issues, but everything was executed in a PowerShell session with admin privileges, so maybe something else is also broken.

{debug 2024-03-04 19:06:48.6918001 +0000 GMT m=+0.203164001 processes Non fatal error fetching PID some info for 892, metrics are valid, but partial: FillMetricsRequiringMoreAccess: error fetching process args: Not enough privileges to fetch information: OpenProcess failed: Access is denied. github.com/elastic/elastic-agent-system-metrics@v0.9.1/metric/system/process/process.go:176 }
{debug 2024-03-04 19:06:48.6953158 +0000 GMT m=+0.206679701 processes Non fatal error fetching PID some info for 812, metrics are valid, but partial: FillMetricsRequiringMoreAccess: error fetching process args: Not enough privileges to fetch information: OpenProcess failed: Access is denied. github.com/elastic/elastic-agent-system-metrics@v0.9.1/metric/system/process/process.go:176 }
{debug 2024-03-04 19:06:48.7149163 +0000 GMT m=+0.226280201 processes Non fatal error fetching PID some info for 7532, metrics are valid, but partial: FillMetricsRequiringMoreAccess: error fetching process args: Not enough privileges to fetch information: OpenProcess failed: Access is denied. github.com/elastic/elastic-agent-system-metrics@v0.9.1/metric/system/process/process.go:176 }
Error: error uninstalling agent: error uninstalling components: error loading agent config: could not initialize config store: failed to ensure key during encrypted disk store Load: could not get agent key: procdecryptdata: Key not valid for use in specified state.

We opened a ticket on support asking for the workaround on this issue.

We are not using Elastic Defend, so I will try to ask the infra team to remove the service and the folder and then try to install again.

I'm not sure what got our agents into this state, but the installation is done by automation. A couple of the agents that had this issue had a network problem where they could not enroll with Fleet, and then ended up in this state; the Elastic Agent service is not running and cannot be started either.

In summary, the agents that weren't able to enroll in Fleet got stuck in this state, where we can neither uninstall nor force install them.

@cmacknz
Member

cmacknz commented Mar 5, 2024

Error: error uninstalling agent: error uninstalling components: error loading agent config: could not initialize config store: failed to ensure key during encrypted disk store Load: could not get agent key: procdecryptdata: Key not valid for use in specified state.

Hmm, the encrypted vault seems to have gotten into a bad state. I've never seen this happen before. We must have been interrupted while setting it up and the code doesn't handle this situation properly.

You could try running elastic-agent install --force to install agent over the broken installation to see if it can get you past this, then uninstall.

@leandrojmp
Contributor

Hello @cmacknz

We tried to use install --force but it didn't work because it tried to uninstall it, but the uninstall is not working.

This error also happened during the enroll process when we tried again after removing the Elastic folder and the Elastic Agent service.

For some reason we were not able to install the Agent on a couple of servers; it ends up in a broken install state.

We are automating this installation. The Elastic Agent installer is executed from a network share, which avoids having to copy the installer to hundreds of machines, unpack it, and then have the installer copy itself to the destination path.

This worked for almost 50 servers, but on 2 of them the agents ended up in a broken install state. As a last resort we tried to install directly from the host, without the automation script that uses the network installer, and this time it worked.

I have an open ticket with support. Being able to install from the network without issues like this is a requirement. If you want, I can provide the logs we saved from our last attempt.

@cmacknz
Member

cmacknz commented Mar 7, 2024

Error: error uninstalling agent: error uninstalling components: error loading agent config: could not initialize config store: failed to ensure key during encrypted disk store Load: could not get agent key: procdecryptdata: Key not valid for use in specified state.

@aleksmaus do you have any idea how we could recover from this situation where the agent key for the vault is somehow missing?

@leandrojmp I see your support ticket has made its way to engineering now as well with another one of our engineers assigned to investigate.

@aleksmaus
Contributor

aleksmaus commented Mar 7, 2024

@aleksmaus do you have any idea how we could recover from this situation where the agent key for the vault is somehow missing?

Manual removal is always an option.
I briefly glanced over the comments; at the beginning of this ticket there was a mention that some pathing got changed. All the "vault" files should be created on the machine the first time the agent starts, as far as I remember. Do the vault files exist on that system? Are they available or corrupted? Did the path the Agent used in this case exist, and was it writable?

I did a quick test:

  1. Installed the agent
  2. Stopped the agent service
  3. Removed the vault directory
  4. Confirmed that uninstall fails due to enc keys mismatch
  5. Removed "vault" directory, "fleet.enc" and "fleet.enc.lock" files
  6. Ran sudo elastic-agent uninstall. It succeeded.
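The cleanup in step 5 could be scripted along these lines. This sketch assumes the "vault" directory, "fleet.enc", and "fleet.enc.lock" all sit in one directory that you pass in, which this thread does not confirm for every platform. `staleStatePaths`, `removeStaleState`, and the `--remove` argument are hypothetical names for this example. Stop the agent service first, as in step 2.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// staleStatePaths joins the three names from the test above onto dir.
// That they share one directory is an assumption of this sketch.
func staleStatePaths(dir string) []string {
	names := []string{"vault", "fleet.enc", "fleet.enc.lock"}
	paths := make([]string, 0, len(names))
	for _, name := range names {
		paths = append(paths, filepath.Join(dir, name))
	}
	return paths
}

// removeStaleState deletes each path. os.RemoveAll returns nil for paths
// that do not exist, so the function is safe to run repeatedly.
func removeStaleState(dir string) error {
	for _, p := range staleStatePaths(dir) {
		if err := os.RemoveAll(p); err != nil {
			return fmt.Errorf("removing %s: %w", p, err)
		}
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: cleanup <agent-state-dir> [--remove]")
		return
	}
	dir := os.Args[1]
	// Dry run unless --remove is passed explicitly.
	if len(os.Args) > 2 && os.Args[2] == "--remove" {
		if err := removeStaleState(dir); err != nil {
			fmt.Fprintln(os.Stderr, "Error:", err)
			os.Exit(1)
		}
		fmt.Println("removed stale state under", dir)
		return
	}
	for _, p := range staleStatePaths(dir) {
		fmt.Println("would remove:", p)
	}
}
```

After the cleanup, `sudo elastic-agent uninstall` can be retried as in step 6.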
