(#236) agent_state_summary: Count nodes without report as unhealthy #238

Merged (1 commit) on Nov 18, 2024

bastelfreak
Contributor

It's possible that a Puppet Agent was stopped or disabled and all old reports were garbage collected from PuppetDB. The node still exists in PuppetDB, but when checking for a report the timestamp is null:

puppet query nodes[certname,report_timestamp]{}
[
  {
    "certname": "pe.tim.local",
    "report_timestamp": "2024-09-30T13:21:17.042Z"
  },
  {
    "certname": "pe2.tim.local",
    "report_timestamp": null
  }
]

Previously we always assumed that report_timestamp holds a valid timestamp. With this patch we explicitly validate the timestamp and count nodes without a timestamp as unhealthy.
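For illustration only, the null check described above can be sketched in Python. This is not the plan's actual code (the plan is written in the Puppet language and is not shown in this conversation); `split_by_report` is a hypothetical helper, and the sample data is copied from the query output above.

```python
import json

# Sample PuppetDB response, copied from the query output above.
NODES_JSON = """
[
  {"certname": "pe.tim.local", "report_timestamp": "2024-09-30T13:21:17.042Z"},
  {"certname": "pe2.tim.local", "report_timestamp": null}
]
"""


def split_by_report(nodes):
    """Partition certnames by whether PuppetDB holds a report timestamp.

    Hypothetical helper: a node whose report_timestamp is null (None in
    Python) is treated as having no report at all.
    """
    with_report, without_report = [], []
    for node in nodes:
        if node.get("report_timestamp") is None:
            without_report.append(node["certname"])
        else:
            with_report.append(node["certname"])
    return with_report, without_report


nodes = json.loads(NODES_JSON)
with_report, without_report = split_by_report(nodes)
# Nodes without a report count as unhealthy. Nodes with a report still need
# the remaining checks (failed, noop, stale report, ...), which are not shown.
print("needs further checks:", with_report)
print("unhealthy (no report):", without_report)
```

The point of the sketch is that the null case is handled before any timestamp comparison, so a node without reports can no longer slip through as if it had a valid timestamp.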

Now with the fix:

puppet plan run pe_status_check::agent_state_summary --environment peadm log_healthy_nodes=true log_unhealthy_nodes=true
{
    "responsive": [
        "pe.tim.local",
        "pe2.tim.local"
    ],
    "healthy_counter": 0,
    "total_counter": 2,
    "unhealthy_counter": 2,
    "noop": [],
    "unhealthy": [
        "pe2.tim.local",
        "pe.tim.local"
    ],
    "healthy": [],
    "changed": [
        "pe.tim.local"
    ],
    "no_report": [
        "pe.tim.local"
    ],
    "corrective_changes": [],
    "used_cached_catalog": [
        "pe2.tim.local"
    ],
    "unresponsive": [],
    "failed": []
}
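As a quick sanity check of the output above, the counters can be compared against the node lists. The sketch below assumes the counters are meant to mirror the lengths of the `healthy` and `unhealthy` lists; that relationship is inferred from this one sample, not confirmed from the plan's source, and `counters_consistent` is a hypothetical helper.

```python
import json

# Subset of the plan output shown above (only the fields checked here).
SUMMARY_JSON = """
{
    "healthy_counter": 0,
    "total_counter": 2,
    "unhealthy_counter": 2,
    "unhealthy": ["pe2.tim.local", "pe.tim.local"],
    "healthy": []
}
"""


def counters_consistent(summary):
    """Return True when the counters match the list lengths (assumed invariant)."""
    return (
        summary["healthy_counter"] == len(summary["healthy"])
        and summary["unhealthy_counter"] == len(summary["unhealthy"])
        and summary["total_counter"]
        == summary["healthy_counter"] + summary["unhealthy_counter"]
    )


summary = json.loads(SUMMARY_JSON)
print(counters_consistent(summary))  # prints True for the sample above
```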

Please check off the steps below as you complete each step

  • Put the Jira ticket or GitHub issue number in parentheses in the Title e.g. (SUP-XXXX) Add Super Duper State Check
  • Update the Jira ticket status to Ready for Review if there is one
  • Review any CI failures and fix issues

@bastelfreak bastelfreak requested a review from a team as a code owner September 30, 2024 14:16
@bastelfreak
Contributor Author

pe2.tim.local is listed here under used_cached_catalog. That's a separate bug, fixed in #237.

@taikaa
Contributor

taikaa commented Oct 28, 2024

@bastelfreak apologies for the delay in reviewing the PR. I tested the PR and no longer get the error. Thanks for adding this.

@taikaa
Contributor

taikaa commented Nov 1, 2024

@MartyEwings hello, is it alright to merge this PR despite these failed tests? Thank you!

@bastelfreak
Contributor Author

Because nobody is reviewing this, I raised support ticket #01302632.

@MartyEwings MartyEwings added the enhancement New feature or request label Nov 18, 2024
@MartyEwings MartyEwings merged commit d549e3e into puppetlabs:main Nov 18, 2024
14 of 31 checks passed
@bastelfreak
Contributor Author

@MartyEwings thanks for merging! Would it be possible to get a new release?

@bastelfreak bastelfreak deleted the foo branch December 5, 2024 13:24