Windows uninstall can fail with Error: failed to remove installation directory or Access is denied
#3342
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
@manishgupta-qasource Please review. |
Uninstall command issue for BC2 reported at elastic/fleet-server#2935 |
Secondary review for this ticket is Done |
@pierrehilbert @cmacknz worth looking at this. |
@AndersonQ you worked on two PRs related to this problem in the past few days. Can you see where the problem could come from? |
Hi Team, while testing on the 8.9.2 BC1 build, we observed that this issue is reproducible there too. Observations:
Build details: Please let us know if anything else is required from our end. Thanks! |
Hey, sorry for the delay. The problem is the same, but now at least we know that the retry is happening and how it's happening. Even though the logs are cut off, the problem seems to be with the log file. I wonder whether it's the agent or Filebeat that is accessing the log file. @harshitgupta-qasource do you have those logs? Preferably in text form and in full. This screenshot cuts out a rather important part of the error. |
Hi @AndersonQ Thank you for looking into this. Pending Log file in the installed directory: Please let us know if anything else is required from our end. |
@amolnater-qasource another question. Can you easily reproduce this error? If yes, could you explain how? |
@AndersonQ yes, it's reproducible every time.
Thanks! |
then, it's a blocker for the release. |
Not sure what the process is for declaring a release blocker? I assume @cmacknz would know. |
We have already fixed this in the latest BC, we just forgot to update the issue here. @elastic/fleet-qasource-external please re-test this. |
Hi Team, We have revalidated this issue on the latest 8.10.0 BC6 kibana cloud environment and have the observations below: Observations:
Build details: Hence we are marking this issue as QA:Validated. Thanks! |
While testing the 8.10.3 BC1 build, we have found this issue reproducible there. Observations:
Build details: Hence, we are re-opening this issue. Thanks! |
I'm looking at it |
This suggestion is low-hanging. Can we try it out? |
tl;dr: The uninstall command (run from outside of the Agent directory) sometimes causes a log file to be created in the root of the agent directory. The log file contains a Docker provider startup failure. Its presence blocks the uninstall from cleaning up the agent directory, so the uninstall fails. I do think there are two issues here: one is uninstalling from inside the directory, but there is also this: #3342 (comment), which occurs outside of the directory and which I am able to reliably reproduce on GCP. This is also blocking elastic/elastic-stack-installers#220. On GCP I don't see the exact same error, but it's the same part that's failing:
I ran process monitor during a failed uninstall which points to the same issue as reported here: #3342 (comment) Looking at the WriteFile event I see a write length of 266 Bytes which implies that Elastic Agent is actually writing something to this log file: This together seems to imply the ![]() Diving into the uninstall code, it looks like when elastic agent is running without a config that sets the logging directory it defaults to the directory the executable is in: elastic-agent/internal/pkg/agent/install/uninstall.go Lines 183 to 187 in d394481
https://github.com/elastic/elastic-agent/blob/main/pkg/core/logger/logger.go#L55-L59 https://github.com/elastic/elastic-agent/blob/main/pkg/core/logger/logger.go#L132-L137 I wrote a watcher script to try to grab the ndjson log that is generated during the uninstall process:
Apparently the Which causes the log file created in |
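To make the fallback described above concrete, here is a minimal sketch of "no configured log directory means log next to the executable". It is not the code from logger.go or uninstall.go; `resolveLogDir` is a hypothetical name and the real logic may differ. It only shows why an uninstall run without a logging config could end up writing into the install directory it is trying to delete.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveLogDir is a hypothetical helper, not the agent's real API.
// It returns the configured directory when one is set and otherwise
// falls back to the directory that contains the executable.
func resolveLogDir(configured, exePath string) string {
	if configured != "" {
		return configured
	}
	return filepath.Dir(exePath)
}

func main() {
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		return
	}
	// With no configured directory, logs would land beside the binary.
	fmt.Println("log directory:", resolveLogDir("", exe))
}
```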
This might be a good reason to look at logging to the EventLog on Windows instead of to a File. |
We should be logging the output of the uninstall command to the console and not a file. We can log to the event log, but then we'll need to instruct everyone on how to get the events back out of it for troubleshooting. |
I like logging to stdout/stderr, but I really think we should do EventLog too. The default buffer size on cmd.exe is 50 lines, so I really don't want to get in the situation where the debug line we need has rolled off and we don't have "permanent" storage somewhere, even if it is a pain to get at. (although |
My proposal would be:
|
We currently have the progress bar for uninstall on stdout. If we log to stderr, this is going to get messy. A couple of other options:
|
Initial preference is option 2 because it puts the error where we want to see it. I don't have anything against option 1 though. |
So I implemented the Observer Output, and it doesn't quite fix the problem: we can't delete the "Agent" directory because we are in the Agent directory. You get the same error on Windows if you try
|
Yeah, that will always be a (different) problem, as the command prompt holds a lock on the directory, preventing it from being deleted. I previously said that this issue contains 2 issues: the first is that you cannot run uninstall if you hold a lock on the root directory, the second is that agent sometimes opens and locks a log file. Now that you've resolved the second, you're running into the first. The solutions for this that I'm aware of are:
|
@leehinman - does it mean that with the current implementation, if a user calls |
I'm taking "current implementation" to mean without the Observer Output, i.e. the version that leaves a log file behind in the agent dir. In that case I think there is a race condition where you could run into the "in use by another process" error on the log file it is generating. With the Observer Output, we shouldn't have that race condition. I'm going to try number 3 that @strawgate mentioned. I should be able to detect the current working dir during uninstall and, if it is within the agent install path, immediately error out before any actual uninstall activity. That way the user would know immediately that it wouldn't work, and they don't end up with something half uninstalled. Unless there are objections? |
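The working-directory check proposed above could look something like this sketch. It is an assumption about the approach, not the change that was eventually merged: `isInside` is a hypothetical helper, the install path in `main` is a placeholder, and symlinks are not resolved.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isInside is a hypothetical helper. It reports whether child is the
// same directory as parent or nested somewhere beneath it.
func isInside(parent, child string) (bool, error) {
	absParent, err := filepath.Abs(parent)
	if err != nil {
		return false, err
	}
	absChild, err := filepath.Abs(child)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absParent, absChild)
	if err != nil {
		// No relative path exists (for example, different volumes on
		// Windows), so child cannot be under parent.
		return false, nil
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false, nil
	}
	return true, nil
}

func main() {
	// Placeholder install path for illustration only.
	installPath := filepath.Join(os.TempDir(), "Elastic", "Agent")
	cwd, err := os.Getwd()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine working directory:", err)
		return
	}
	inside, err := isInside(installPath, cwd)
	if err != nil {
		fmt.Fprintln(os.Stderr, "path check failed:", err)
		return
	}
	if inside {
		// Fail before any uninstall work so nothing is half removed.
		fmt.Fprintln(os.Stderr, "Error: run uninstall from outside", installPath)
		return
	}
	fmt.Println("working directory is outside the install path")
}
```

Comparing via `filepath.Rel` rather than a plain string prefix avoids treating a sibling such as `Agent2` as being inside `Agent`.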
SGTM |
Looks like the right way to move forward here. |
I think I have a preference for this:
i.e. have Agent Uninstall delete all the files but have Windows schedule the deletion of the But am okay with the proposed path forward |
Hi Team, We have revalidated this issue on latest 8.13.0-SNAPSHOT kibana cloud environment and found it fixed now: Observations:
Build details: Hence we are marking this issue as QA:Validated. Thanks! |
Kibana Build details:
Host OS and Browser version: Windows, All
Preconditions:
Steps to reproduce:
.\elastic-agent.exe uninstall
Error: failed to remove installation directory on running uninstall command.
Expected Result:
No errors should be observed on running agent uninstall command.
Screen Recording:

