-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set SkipDaemonRestart in tryDelayEnroll #4042
Conversation
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
internal/pkg/agent/cmd/run.go
Outdated
// enrollCmd daemonReloadWithBackoff is broken, setting | ||
// SkipDaemonRestart to true avoids running that code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit vague, how is it broken?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made an issue and added link to it in the comment.
The test failure is the usual I tested this in an Ubuntu VM and it worked as expected: {"log.level":"info","@timestamp":"2024-01-08T19:29:27.845Z","log.origin":{"file.name":"cmd/run.go","file.line":170},"message":"Elastic Agent started","log":{"source":"elastic-agent"},"process.pid":1773,"agent.version":"8.13.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-08T19:29:27.892Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":496},"message":"Starting enrollment to URL: https://2e6c294352624beda2f6830fada574fb.fleet.eastus2.staging.azure.foundit.no:443/","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-08T19:29:29.308Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":289},"message":"Elastic Agent has been enrolled; start Elastic Agent","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-08T19:29:29.308Z","log.origin":{"file.name":"cmd/run.go","file.line":529},"message":"Successfully performed delayed enrollment of this Elastic Agent.","log":{"source":"elastic-agent"},"ecs.version":"1.6.0"} |
I do think we should add an integration test for this, just not sure if it makes sense as follow up or not. We could modify an existing integration test that enrolls into Fleet to use elastic-agent/testing/integration/logs_ingestion_test.go Lines 82 to 91 in 58ef76d
All we need to do to test this is start using the @leehinman thoughts on adding a test here or is it better as a follow up issue to make sure we get the bug fixed in time for 8.12? |
Let me see if I can get an integration test running by ~noon tomorrow. If not we can merge to make sure we get in before 8.12. |
this skips trying to restart the process which is currently broken because the control daemon hasn't been started yet. This restores the previous behavior where the restart failing wasn't fatal.
c081363
to
88d9c87
Compare
Integration test is in PR now. If you comment out the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the test!
buildkite test it |
1 similar comment
buildkite test it |
🤦 |
buildkite test it |
|
Only failure is a known issue, force merging:
|
* set SkipDaemonRestart in tryDelayEnroll this skips trying to restart the process which is currently broken because the control daemon hasn't been started yet. This restores the previous behavior where the restart failing wasn't fatal. * add integration test (cherry picked from commit ba0ea0b)
* set SkipDaemonRestart in tryDelayEnroll this skips trying to restart the process which is currently broken because the control daemon hasn't been started yet. This restores the previous behavior where the restart failing wasn't fatal. * add integration test (cherry picked from commit ba0ea0b) Co-authored-by: Lee E Hinman <57081003+leehinman@users.noreply.github.com>
* set SkipDaemonRestart in tryDelayEnroll this skips trying to restart the process which is currently broken because the control daemon hasn't been started yet. This restores the previous behavior where the restart failing wasn't fatal. * add integration test (cherry picked from commit ba0ea0b) Co-authored-by: Lee E Hinman <57081003+leehinman@users.noreply.github.com>
What does this PR do?
This PR skips trying to restart the
elastic-agent
process during a delayed enroll. Instead the process exits and is restarted by the service manager.This is necessary because the code to restart the
elastic-agent
process requires that the control server be up, and it isn't at this point. This restores previous behavior, where the attempt to restart theelastic-agent
process was non-fatal.Why is it important?
bug that prevents
elastic-agent
from starting when--delay-enroll
is usedChecklist
./changelog/fragments
using the changelog toolAuthor's Checklist
How to test this PR locally
Without PR
--delay-enroll
optionelastic-agent status
will failWith PR
--delay-enroll
optionelastic-agent status
will workRelated issues
--delay-enroll
. #3961Use cases
Screenshots
Logs
Questions to ask yourself