-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
agent_state_summary plan - null report_timestamp for node(s) fails the plan #236
Labels
bug
Something isn't working
Comments
@bastelfreak --^ |
ah nice, I will try to take a look this week. |
bastelfreak
added a commit
to bastelfreak/puppetlabs-pe_status_check
that referenced
this issue
Sep 30, 2024
…nhealthy It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null: ``` ``` ```json [ { "certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z" }, { "certname": "pe2.tim.local", "report_timestamp": null } ] ``` Previously we always assumed that `report_timestamp` has a valid timestamp. With this patch we explicitly validate the timestamp and count nodes withhout a timestamp as unhealthy.
bastelfreak
added a commit
to bastelfreak/puppetlabs-pe_status_check
that referenced
this issue
Sep 30, 2024
…is null The attribute `cached_catalog_status` on the `/nodes` endpoint will be `null` for nodes without a catalog in PuppetDB: ``` puppet query 'nodes[certname,latest_report_noop,latest_report_corrective_change,cached_catalog_status,latest_report_status,report_timestamp]{}' ``` ``` [ { "cached_catalog_status": null, "certname": "pe2.tim.betadots.training", "latest_report_corrective_change": null, "latest_report_noop": null, "latest_report_status": null, "report_timestamp": null } ] ```
bastelfreak
added a commit
to bastelfreak/puppetlabs-pe_status_check
that referenced
this issue
Sep 30, 2024
…nhealthy It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null: ``` puppet query nodes[certname,report_timestamp]{} ``` ```json [ { "certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z" }, { "certname": "pe2.tim.local", "report_timestamp": null } ] ``` Previously we always assumed that `report_timestamp` has a valid timestamp. With this patch we explicitly validate the timestamp and count nodes withhout a timestamp as unhealthy. Now with the fix: ``` puppet plan run pe_status_check::agent_state_summary --environment peadm log_healthy_nodes=true log_unhealthy_nodes=true ``` ```json { "responsive": [ "pe.tim.local", "pe2.tim.local" ], "healthy_counter": 0, "total_counter": 2, "unhealthy_counter": 2, "noop": [], "unhealthy": [ "pe2.tim.local", "pe.tim.local" ], "healthy": [], "changed": [ "pe.tim.local" ], "no_report": [ "pe.tim.local" ], "corrective_changes": [], "used_cached_catalog": [ "pe2.tim.local" ], "unresponsive": [], "failed": [] } ```
bastelfreak
added a commit
to bastelfreak/puppetlabs-pe_status_check
that referenced
this issue
Oct 7, 2024
…nhealthy It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null: ``` puppet query nodes[certname,report_timestamp]{} ``` ```json [ { "certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z" }, { "certname": "pe2.tim.local", "report_timestamp": null } ] ``` Previously we always assumed that `report_timestamp` has a valid timestamp. With this patch we explicitly validate the timestamp and count nodes withhout a timestamp as unhealthy. Now with the fix: ``` puppet plan run pe_status_check::agent_state_summary --environment peadm log_healthy_nodes=true log_unhealthy_nodes=true ``` ```json { "responsive": [ "pe.tim.local", "pe2.tim.local" ], "healthy_counter": 0, "total_counter": 2, "unhealthy_counter": 2, "noop": [], "unhealthy": [ "pe2.tim.local", "pe.tim.local" ], "healthy": [], "changed": [ "pe.tim.local" ], "no_report": [ "pe.tim.local" ], "corrective_changes": [], "used_cached_catalog": [ "pe2.tim.local" ], "unresponsive": [], "failed": [] } ```
bastelfreak
added a commit
to bastelfreak/puppetlabs-pe_status_check
that referenced
this issue
Oct 7, 2024
…nhealthy It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null: ``` puppet query nodes[certname,report_timestamp]{} ``` ```json [ { "certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z" }, { "certname": "pe2.tim.local", "report_timestamp": null } ] ``` Previously we always assumed that `report_timestamp` has a valid timestamp. With this patch we explicitly validate the timestamp and count nodes withhout a timestamp as unhealthy. Now with the fix: ``` puppet plan run pe_status_check::agent_state_summary --environment peadm log_healthy_nodes=true log_unhealthy_nodes=true ``` ```json { "responsive": [ "pe.tim.local", "pe2.tim.local" ], "healthy_counter": 0, "total_counter": 2, "unhealthy_counter": 2, "noop": [], "unhealthy": [ "pe2.tim.local", "pe.tim.local" ], "healthy": [], "changed": [ "pe.tim.local" ], "no_report": [ "pe.tim.local" ], "corrective_changes": [], "used_cached_catalog": [ "pe2.tim.local" ], "unresponsive": [], "failed": [] } ```
MartyEwings
added a commit
that referenced
this issue
Nov 18, 2024
(#236) agent_state_summary: Count nodes without report as unhealthy
MartyEwings
added a commit
that referenced
this issue
Nov 18, 2024
(#236) agent_state_summary: Check if cached_catalog_status is null
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the Bug
If a node returned in the PuppetDB query has a null report_timestamp, the plan fails at line 23
Expected Behavior
A node with a null report_timestamp shouldn't fail the plan. Since the plan operationalizes unhealthy nodes as "latest reports that are older than runinterval", the node's with a null report_timestamp can be added to the unhealthy list.
Steps to Reproduce
Provision a node and request a cert. Submit a node entry to PuppetDB without submitting a report
Query PuppetDB to confirm that report_timestamp is null
Then, run the plan pe_status_check::agent_state_summary
Environment
Additional Context
Another edge case that would result in a null report_timestamp is if a resource that contains a null byte preventing PuppetDB from ever storing a report from a node
The text was updated successfully, but these errors were encountered: