ti_abusech: Update Fleet status message on API 402 #13718
base: main
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
Is this restart on DEGRADED status specific to the Agentless environment? IMO, non-fatal statuses should not cause a restart. The Agent should be able to inform users of unhealthy conditions without causing further issues (like a restart loop). A restart should be reserved for true unrecoverable conditions (like deadlock or other unresponsiveness).
We can debate whether the rate-limited state should be considered DEGRADED or HEALTHY. However, it would still be valuable to indicate the rate-limited state in Fleet. For example, if input collection will be paused for an hour before resuming, displaying this information on the agent status could be helpful.
🚀 Benchmarks report
Package
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
scan | 19230.77 | 13698.63 | -5532.14 (-28.77%) | 💔 |
vulnerability | 1945.53 | 1605.14 | -340.39 (-17.5%) | 💔 |
Package ti_abusech
👍(2) 💚(0) 💔(2)
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
malware | 4366.81 | 3571.43 | -795.38 (-18.21%) | 💔 |
malwarebazaar | 5154.64 | 3610.11 | -1544.53 (-29.96%) | 💔 |
To see the full report, comment with /test benchmark fullreport
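For readers checking the benchmark figures, the Diff (%) column appears to be the change in events per second relative to the previous value. The following is a minimal sketch of that arithmetic; the function name `eps_diff` and the rounding to two decimals are assumptions for illustration, not the benchmark tool's own code.

```python
def eps_diff(previous_eps: float, new_eps: float) -> tuple:
    """Return (absolute change, percentage change) in events per second.

    Assumes the percentage is computed relative to the previous EPS value.
    """
    delta = new_eps - previous_eps
    percent = delta / previous_eps * 100.0
    return round(delta, 2), round(percent, 2)


# Values copied from the report above: (previous EPS, new EPS).
rows = {
    "scan": (19230.77, 13698.63),
    "vulnerability": (1945.53, 1605.14),
    "malware": (4366.81, 3571.43),
    "malwarebazaar": (5154.64, 3610.11),
}

for name, (previous, new) in rows.items():
    delta, percent = eps_diff(previous, new)
    print(f"{name}: {delta} ({percent}%)")
```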
I cannot see how that is happening. It's certainly not expected behaviour for the input. On return of a non-array "events" field we just log, set DEGRADED, raise the object to an array for processing and then drop out of the periodic closure after publication in the normal manner. The only time we exit the periodic loop is when we have a non-nil Go error. This happens when the context is cancelled, when the rate limit response handler errors, or when one of various unexpected type errors occurs. It would be good to understand which, if any, of these is the exit path. To that end, it would be good to see what errors get logged immediately after the DEGRADED logging.
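The control flow described in this comment can be sketched as follows. This is an illustrative Python model, not the CEL input's Go code: `run_period`, `run`, and the `evaluate`, `publish` and `set_status` callbacks are hypothetical names, and only the log and status strings are taken from the agent logs in this thread.

```python
import logging

log = logging.getLogger("input.cel.sketch")


def run_period(evaluate, publish, set_status):
    """Run one periodic evaluation and publish its events.

    Any exception raised by evaluate() propagates to the caller; in this
    model that is the only way out of the periodic loop.
    """
    state = evaluate()
    events = state.get("events")
    if isinstance(events, dict):
        # Non-array "events": log, mark DEGRADED, then wrap the object in a
        # list so it is published like any other event.
        log.error("single event object returned by evaluation")
        set_status("DEGRADED", "single event error object returned by evaluation")
        events = [events]
    elif events is None:
        events = []
    for event in events:
        publish(event)
    return len(events)


def run(evaluate, publish, set_status, periods):
    """Repeat run_period; stop early only when an error is raised."""
    for _ in range(periods):
        try:
            run_period(evaluate, publish, set_status)
        except Exception as err:  # models the non-nil Go error exit path
            log.error("exiting periodic loop: %s", err)
            return err
    return None
```

In this model a 402 error object degrades the status but never terminates the loop, which is why a process exit with no logged error points to a cause outside the input itself.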
Something is off with agentless in Serverless env. I was able to setup Abusech integration earlier. But now I am getting following error: 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.malware-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"}],"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.malware-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"},"status":403} 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.malwarebazaar-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"}],"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.malwarebazaar-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"},"status":403} 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.threatfox-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"}],"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices 
[agentless-state-cel-ti_abusech.threatfox-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"},"status":403} 403 Forbidden: {"error":{"root_cause":[{"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.url-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"}],"type":"security_exception","reason":"action [indices:data/read/search] is unauthorized for API key id [xxxxxx] of user [elastic/fleet-server] on indices [agentless-state-cel-ti_abusech.url-501aa6be-4a34-4193-8b99-640449a92134], this action is granted by the index privileges [read,all]"},"status":403} Sharing the agent logs from ECH agentless env. {"log.level":"info","@timestamp":"2025-04-30T16:49:03.363Z","message":"registering","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"id":"cel-ti_abusech.url-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67::https://urlhaus.abuse.ch/downloads/json","key":"cel-ti_abusech_url-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67::https://urlhaus_abuse_ch/downloads/json","uuid":"16c02628-e208-4b43-a8ff-cc3eedad4d8f","ecs.version":"1.6.0","log.logger":"metric_registry","log.origin":{"file.line":63,"file.name":"inputmon/input.go","function":"github.com/elastic/beats/v7/libbeat/monitoring/inputmon.NewInputRegistry"},"service.name":"filebeat","input_type":"cel","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:03.366Z","message":"process repeated request","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"id":"cel-ti_abusech.url-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67","input_url":"https://urlhaus.abuse.ch/downloads/json","ecs.version":"1.6.0","log.origin":{"file.line":225,"file.name":"cel/input.go","function":"github.com/elastic/beats/v7/x-pack/filebeat/input/cel.input.run.func1"},"service.name":"filebeat","input_source":"https://urlhaus.abuse.ch/downloads/json","log.logger":"input.cel","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2025-04-30T16:49:03.608Z","message":"single event object returned by evaluation","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"service.name":"filebeat","id":"cel-ti_abusech.malwarebazaar-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67","input_source":"https://mb-api.abuse.ch/api/v1/","ecs.version":"1.6.0","log.logger":"input.cel","log.origin":{"file.line":407,"file.name":"cel/input.go","function":"github.com/elastic/beats/v7/x-pack/filebeat/input/cel.input.run.func1"},"input_url":"https://mb-api.abuse.ch/api/v1/","event":{"error":{"code":"402","id":"402 Payment Required","message":"POST:{\n \"query_status\": \"ratelimited\",\n \"msg\": \"Your request has been rate-limited. Please visit https:\\/\\/abuse.ch\\/rate-limit\\/ for more information.\"\n}"}},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2025-04-30T16:49:03.615Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":705},"message":"Unit state changed cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67 (HEALTHY->DEGRADED): single event error object returned by evaluation: {\"error\":{\"code\":\"402\",\"id\":\"402 Payment Required\",\"message\":\"POST:{\\n \\\"query_status\\\": \\\"ratelimited\\\",\\n \\\"msg\\\": \\\"Your request has been rate-limited. Please visit https:\\\\/\\\\/abuse.ch\\\\/rate-limit\\\\/ for more information.\\\"\\n}\"}}","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"HEALTHY"},"unit":{"id":"cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67","type":"input","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:04.860Z","message":"add_cloud_metadata: hosting provider type not detected.","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"log.logger":"add_cloud_metadata","log.origin":{"file.line":100,"file.name":"add_cloud_metadata/add_cloud_metadata.go","function":"github.com/elastic/beats/v7/libbeat/processors/add_cloud_metadata.(*addCloudMetadata).init.func1"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:04.899Z","message":"Connecting to backoff(elasticsearch(https://0c0a9b03c334490ebd8bd527d95014f8.us-central1.gcp.cloud.es.io:443))","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"publisher_pipeline_output","log.origin":{"file.line":138,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:05.008Z","message":"Attempting to connect to Elasticsearch version 8.18.0 (default)","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"service.name":"filebeat","ecs.version":"1.6.0","log.logger":"esclientleg","log.origin":{"file.line":323,"file.name":"eslegclient/connection.go","function":"github.com/elastic/beats/v7/libbeat/esleg/eslegclient.(*Connection).Ping"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:05.324Z","message":"Connection to backoff(elasticsearch(https://0c0a9b03c334490ebd8bd527d95014f8.us-central1.gcp.cloud.es.io:443)) established","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":146,"file.name":"pipeline/client_worker.go","function":"github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:30.711Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":687},"message":"Component state changed cel-default (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '28' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STOPPED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:30.712Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":705},"message":"Unit state changed cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67 (DEGRADED->STOPPED): Suppressing FAILED state due to restart for '28' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STOPPED"},"unit":{"id":"cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67","type":"input","state":"STOPPED","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:30.712Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":705},"message":"Unit state changed cel-default (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '28' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STOPPED"},"unit":{"id":"cel-default","type":"output","state":"STOPPED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:31.714Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":667},"message":"Spawned new component cel-default: Starting: spawned pid '68'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:31.714Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":674},"message":"Spawned new unit cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67: Starting: spawned pid '68'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STARTING"},"unit":{"id":"cel-default-cel-ti_abusech-1c57b81b-ee89-4fe9-ad66-bc5dd6af7b67","type":"input","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:31.714Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":674},"message":"Spawned new unit cel-default: Starting: spawned pid '68'","log":{"source":"elastic-agent"},"component":{"id":"cel-default","state":"STARTING"},"unit":{"id":"cel-default","type":"output","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:31.809Z","message":"Home path: [/usr/share/elastic-agent/data/elastic-agent-1c9cf2/components] Config path: [/usr/share/elastic-agent/data/elastic-agent-1c9cf2/components] Data path: [/agentless/data/run/cel-default] Logs path: [/usr/share/elastic-agent/data/elastic-agent-1c9cf2/components/logs]","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"log.origin":{"file.line":1082,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2025-04-30T16:49:31.810Z","message":"Beat ID: f2f6a96b-e3c6-490b-9722-453a4c35a4a9","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"cel-default","type":"cel"},"log":{"source":"cel-default"},"log.origin":{"file.line":1090,"file.name":"instance/beat.go","function":"github.com/elastic/beats/v7/libbeat/cmd/instance.(*Beat).configure"},"service.name":"filebeat","ecs.version":"1.6.0","ecs.version":"1.6.0"} Right after the unit goes from |
From the logs, the cel-default input is (repeatedly) started and then terminates with exit code -1, with no additional logging to explain why. It might be helpful to set logging to debug to get some breadcrumbs. My bet would be on an OOM kill, given the absence of any logging of the errors that the eval loop would have to return in order to exit.
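To pull the restart pattern out of a long NDJSON agent log, a small filter over the coordinator's state-change messages can help. The sketch below assumes the message layout seen in the log lines above ("Unit state changed ... (OLD->NEW): ..." and "exited with code '-1'"); the `Transition` type and `transitions` function are hypothetical helpers, not part of Elastic Agent.

```python
import json
import re
from typing import Iterable, List, NamedTuple, Optional


class Transition(NamedTuple):
    timestamp: str
    kind: str  # "Unit" or "Component"
    ident: str
    old: str
    new: str
    exit_code: Optional[int]


STATE_RE = re.compile(
    r"^(Unit|Component) state changed (\S+) \((\w+)->(\w+)\): (.*)$", re.S
)
EXIT_RE = re.compile(r"exited with code '(-?\d+)'")


def transitions(lines: Iterable[str]) -> List[Transition]:
    """Extract state transitions (and exit codes, if any) from NDJSON lines."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        match = STATE_RE.match(record.get("message", ""))
        if not match:
            continue
        kind, ident, old, new, detail = match.groups()
        exit_match = EXIT_RE.search(detail)
        code = int(exit_match.group(1)) if exit_match else None
        found.append(
            Transition(record.get("@timestamp", ""), kind, ident, old, new, code)
        )
    return found
```

Filtering the result for entries whose `exit_code` is not `None` lists each process exit alongside the state the unit was in just before it stopped.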
I'm also thinking OOM killer. In our demo cluster the I did wonder if the agentless Kubernetes liveness probe could be the cause, but that can be ruled out because it's not the whole Agent process group that is exiting, it's a specific sub-process of the Agent. |
It is an OOM kill. I don't think we should hide the degraded state. Instead I think the integration should make it clear via the status message that they are rate-limited and that they need to authenticate to AbuseCH. Auth will become mandatory on June 30, 2025. And maybe we should point them at https://abuse.ch/blog/community-first/ .
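As a rough illustration of that suggestion, the sketch below turns the 402 error object seen in the logs into a more actionable status message. It is written in Python for brevity rather than in the integration's CEL program; the function names and the message wording are assumptions, and only the response body shape and the blog URL come from this thread.

```python
import json

AUTH_INFO_URL = "https://abuse.ch/blog/community-first/"


def is_rate_limited(error: dict) -> bool:
    """Return True if the error object is the abuse.ch 402 rate-limit reply."""
    if str(error.get("code")) != "402":
        return False
    message = error.get("message", "")
    # The logged message is "<METHOD>:<response body>"; drop the method prefix.
    _, _, body = message.partition(":")
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not JSON; fall back to a plain substring check.
        return "ratelimited" in message
    return isinstance(payload, dict) and payload.get("query_status") == "ratelimited"


def status_message(error: dict) -> str:
    """Build a status message (hypothetical wording) for the given error."""
    if is_rate_limited(error):
        return (
            "abuse.ch rate-limited this request (HTTP 402). "
            "Authenticate to abuse.ch to avoid rate limiting; "
            "see " + AUTH_INFO_URL
        )
    return "request failed: " + str(error.get("id", "unknown error"))


# Error object as it appears in the agent log above.
sample_error = {
    "code": "402",
    "id": "402 Payment Required",
    "message": 'POST:{\n "query_status": "ratelimited",\n "msg": "Your request '
    'has been rate-limited. Please visit https:\\/\\/abuse.ch\\/rate-limit\\/ '
    'for more information."\n}',
}
print(status_message(sample_error))
```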
Adding diagnostics in ECH agentless environment (with debug logs):
Thanks @andrewkroh. I will update the But due to the restarts, the users will face the issue with billing (indexing spike) just like SDH. Is there any way we can handle that?
Updated the |
💚 Build Succeeded
History
cc @kcreddy
Broader topic; we used to see throws for this in the agent logs, but we don't seem to now. Why is this? This is a significant visibility hole.
PR: #13760 to increase the memory on the pod as per @andrewkroh's suggestion in the SDH.
Proposed commit message
Checklist
changelog.yml file.
Screenshots
Before :
Input Health

Documents indexed

After (current PR):
Input Health

Documents indexed
