Windows agent temporarily becomes unhealthy when the output is updated to Kafka, then returns to Healthy #6800
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@muskangulati-qasource Please review.
This looks like it comes from the metrics monitoring agent unit (`http/metrics-monitoring-metrics-monitoring-agent`), which goes unhealthy (DEGRADED) because it fails to fetch data while Endpoint is being stopped:

```
{"log.level":"warn","@timestamp":"2025-02-11T06:31:18.433Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":699},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (HEALTHY->DEGRADED): Error fetching data for metricset http.json: error making http request: Get \"http://npipe/stats\": open \\\\.\\pipe\\Qz3GgkK41DY59a6_d0X9G3K62V5LvsiW.sock: The system cannot find the file specified.","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:19.313Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.newConnInfoServer.func1","file.name":"runtime/conn_info_server.go","file.line":56},"message":"failed accept conn info connection: use of closed network connection","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:19.313Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.(*serviceRuntime).stop","file.name":"runtime/service.go","file.line":373},"message":"stopping endpoint service runtime","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:19.313Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.(*serviceRuntime).stop","file.name":"runtime/service.go","file.line":389},"message":"endpoint service has checked in, send stopping state to service","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:19.313Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.(*serviceRuntime).stop","file.name":"runtime/service.go","file.line":397},"message":"uninstall endpoint service","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.420Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: info: Main.cpp:273 Process machine 0x8664. Native machine 0x8664.","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.420Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: info: Main.cpp:467 Executing uninstall","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.426Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: debug: VaultLib.cpp:207 Vault initialized with existing seed file","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.446Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: debug: VaultLib.cpp:614 Successfully read vault key: config","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.451Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: debug: ECSUtilities.cpp:497 Tamper protection disabled","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.451Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: info: InstallLib.cpp:1173 Skipping uninstall token validation as tamper protection is not enabled.","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.451Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: debug: Service.cpp:814 PPL is supported. This process is unprotected. (TrustLevelSid: absent)","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.452Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: info: Util.cpp:787 Sending service command to facilitate uninstall","context":"command output","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-02-11T06:31:20.534Z","log.logger":"component.runtime.endpoint-default.service_runtime","log.origin":{"function":"github.com/elastic/elastic-agent/pkg/component/runtime.executeCommand.func2","file.name":"runtime/service_command.go","file.line":68},"message":"2025-02-11 06:31:20: info: Util.cpp:814 Service command to facilitate uninstall succeeded","context":"command output","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:41.155Z","message":"Non-zero metrics in the last 30s","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.origin":{"file.line":192,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"filebeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cpu":{"system":{"ticks":562,"time":{"ms":109}},"total":{"ticks":1327,"time":{"ms":203},"value":1327},"user":{"ticks":765,"time":{"ms":94}}},"info":{"ephemeral_id":"30a52344-9f72-4a98-99c2-98825e31b212","uptime":{"ms":362000},"version":"9.0.0-beta1"},"memstats":{"gc_next":75016032,"memory_alloc":67938896,"memory_sys":4456448,"memory_total":128954184,"rss":152424448},"runtime":{"goroutines":60}},"filebeat":{"harvester":{"closed":1,"open_files":2,"running":2,"started":1}},"libbeat":{"config":{"module":{"running":2,"starts":1,"stops":1}},"output":{"events":{"acked":1204,"active":0,"batches":2,"total":1204},"read":{"bytes":997,"errors":2},"write":{"bytes":162075,"latency":{"histogram":{"count":9,"max":752,"mean":316.44444444444446,"median":212,"min":201,"p75":419.5,"p95":752,"p99":752,"p999":752,"stddev":177.6789308531988}}}},"pipeline":{"clients":2,"events":{"active":0,"filtered":34,"published":1203,"total":1237},"queue":{"acked":1204,"added":{"bytes":2208792,"events":1203},"consumed":{"bytes":2210597,"events":1204},"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200,"removed":{"bytes":2210597,"events":1204}}}},"registrar":{"states":{"current":0}},"system":{"handles":{"open":2}}}},"log.logger":"monitoring","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:41.363Z","message":"Non-zero metrics in the last 30s","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log":{"source":"beat/metrics-monitoring"},"log.logger":"monitoring","log.origin":{"file.line":192,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"metricbeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cpu":{"system":{"ticks":312},"total":{"ticks":765,"value":765},"user":{"ticks":453}},"info":{"ephemeral_id":"f3c21b9a-9015-4122-b9ec-da68751ea098","uptime":{"ms":361421},"version":"9.0.0-beta1"},"memstats":{"gc_next":75214048,"memory_alloc":36488056,"memory_total":69299080,"rss":123437056},"runtime":{"goroutines":66}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":6,"starts":3,"stops":3}},"output":{"events":{"acked":3,"active":0,"batches":1,"total":3},"read":{"bytes":397,"errors":1},"write":{"bytes":1810,"latency":{"histogram":{"count":7,"max":301,"mean":200.71428571428572,"median":206,"min":73,"p75":213,"p95":301,"p99":301,"p999":301,"stddev":61.64811267117889}}}},"pipeline":{"clients":6,"events":{"active":0,"published":3,"total":3},"queue":{"acked":3,"added":{"bytes":4609,"events":3},"consumed":{"bytes":4609,"events":3},"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200,"removed":{"bytes":4609,"events":3}}}},"metricbeat":{"beat":{"stats":{"consecutive_failures":3,"events":3,"failures":3}}},"registrar":{"states":{"current":0}}}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:41.523Z","message":"Non-zero metrics in the last 30s","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"http/metrics-monitoring","type":"http/metrics"},"log":{"source":"http/metrics-monitoring"},"log.logger":"monitoring","log.origin":{"file.line":192,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"metricbeat","monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cpu":{"system":{"ticks":375,"time":{"ms":16}},"total":{"ticks":953,"time":{"ms":32},"value":953},"user":{"ticks":578,"time":{"ms":16}}},"info":{"ephemeral_id":"12ee1d09-a525-462b-a209-80cc51d2d542","uptime":{"ms":360871},"version":"9.0.0-beta1"},"memstats":{"gc_next":76344296,"memory_alloc":39419888,"memory_total":77635856,"rss":126078976},"runtime":{"goroutines":86}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":10,"starts":5,"stops":5}},"output":{"events":{"acked":5,"active":0,"batches":1,"total":5},"read":{"bytes":397,"errors":1},"write":{"bytes":2102,"latency":{"histogram":{"count":7,"max":222,"mean":188.28571428571428,"median":203,"min":71,"p75":212,"p95":222,"p99":222,"p999":222,"stddev":48.331339391195634}}}},"pipeline":{"clients":10,"events":{"active":0,"published":5,"total":5},"queue":{"acked":5,"added":{"bytes":7738,"events":5},"consumed":{"bytes":7738,"events":5},"filled":{"bytes":0,"events":0,"pct":0},"max_bytes":0,"max_events":3200,"removed":{"bytes":7738,"events":5}}}},"metricbeat":{"http":{"json":{"consecutive_failures":5,"events":5,"failures":5}}},"registrar":{"states":{"current":0}},"system":{"handles":{"open":-2}}}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:31:42.597Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":699},"message":"Unit state changed http/metrics-monitoring-metrics-monitoring-agent (DEGRADED->HEALTHY): Healthy","log":{"source":"elastic-agent"},"component":{"id":"http/metrics-monitoring","state":"HEALTHY"},"unit":{"id":"http/metrics-monitoring-metrics-monitoring-agent","type":"input","state":"HEALTHY","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-02-11T06:32:11.155Z","message":"Non-zero metrics in the last 30s","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"monitoring":{"ecs.version":"1.6.0","metrics":{"beat":{"cpu":{"system":{"ticks":593,"time":{"ms":31}},"total":{"ticks":1421,"time":{"ms":94},"value":1421},"user":{"ticks":828,"time":{"ms":63}}},"info":{"ephemeral_id":"30a52344-9f72-4a98-99c2-98825e31b212","uptime":{"ms":391999},"version":"9.0.0-beta1"},"memstats":{"gc_next":75255016,"memory_alloc":34824344,"memory_total":131475384,"rss":153874432},"runtime":{"goroutines":62}},"filebeat":{"harvester":{"open_files":2,"running":2}},"libbeat":{"config":{"module":{"running":2}},"output":{"events":{"acked":23,"active":0,"batches":2,"total":23},"read":{"bytes":797,"errors":1},"write":{"bytes":6177,"latency":{"histogram":{"count":11,"max":752,"mean":298.54545454545456,"median":212,"min":201,"p75":373,"p95":752,"p99":752,"p999":752,"stddev":165.1959517643784}}}},"pipeline":{"clients":2,"events":{"active":6,"filtered":4,"published":29,"total":33},"queue":{"acked":23,"added":{"bytes":53070,"events":29},"consumed":{"bytes":42117,"events":23},"filled":{"bytes":10953,"events":6,"pct":0.001875},"max_bytes":0,"max_events":3200,"removed":{"bytes":42117,"events":23}}}},"registrar":{"states":{"current":0}},"system":{"handles":{"open":1}}}},"log.logger":"monitoring","log.origin":{"file.line":192,"file.name":"log/log.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/report/log.(*reporter).logSnapshot"},"service.name":"filebeat","ecs.version":"1.6.0"}
```

This looks like a race condition. Is it specific to 9.0.0, or is it also reproducible on other releases?
Secondary review is done for this ticket!
Kibana Build details:
https://staging.elastic.co/9.0.0-beta1-191089b1/summary-9.0.0-beta1.html
Preconditions:
Steps to reproduce:
Note:
Expected Result:
The Windows agent should remain healthy when the output is updated to Kafka.
Logs:
elastic-agent-diagnostics-2025-02-11T06-40-01Z-00.zip
Agent json:
EC2AMAZ-DUAF4CI-agent-details.zip
Screenshot: