OMD 5.10 shows very different gearman worker status #157

infraweavers · 2023-03-02T08:06:50Z

Hello,

Since our OMD 4.40 -> OMD 5.10 upgrade, we've been seeing occasions where our gearman server appears to have large numbers of running or waiting checks. On investigation, we can see that the behaviour of service checks through gearman is very different under OMD 5.10. To do some diagnostics, we've downgraded one of our OMD boxes to OMD 4.60, but we have "transplanted" the version of mod_gearman_worker-go and the epn into the 4.60 box so that we're not running into ConSol-Monitoring/mod-gearman-worker-go#19. This has the added benefit of exonerating mod_gearman_worker-go, which is nice. I'm leaning towards there being a change in naemon-core.

OMD config:

    omd config set GEARMAND on
    omd config set GEARMAND_PORT 0.0.0.0:4730
    omd config set GEARMAN_WORKER on
    omd config set LIVESTATUS_TCP on
    omd config set LIVESTATUS_TCP_PORT 6557
    omd config set MOD_GEARMAN on
    omd config set PNP4NAGIOS gearman
    omd config set THRUK_COOKIE_AUTH off
    omd config set GRAFANA on

Graph of /omd/sites/default/lib/monitoring-plugins/check_gearman -H OMD101.man.cwserverfarm.local -W 501 -C 750 -w 501 -c 750 where we can see the differing behaviour.
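For readers without the graph, here is a minimal sketch of the kind of numbers being plotted. It assumes the gearmand admin `status` output format (one tab-separated line per queue: function name, total jobs, running jobs, available workers, terminated by a line containing only `.`) and assumes that "waiting" means total minus running. The `QueueStatus`, `parse_status` and `classify` names, the sample data, and the use of 501/750 as waiting-job thresholds are illustrative assumptions, not check_gearman's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueStatus:
    name: str      # gearman function (queue) name
    total: int     # jobs in the queue, including running ones
    running: int   # jobs currently being processed
    workers: int   # workers registered for this queue

    @property
    def waiting(self) -> int:
        # Assumption: waiting jobs are those queued but not yet running.
        return max(self.total - self.running, 0)


def parse_status(text: str) -> List[QueueStatus]:
    """Parse the text returned by the gearmand admin 'status' command."""
    queues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":  # "." marks the end of the listing
            continue
        name, total, running, workers = line.split("\t")
        queues.append(QueueStatus(name, int(total), int(running), int(workers)))
    return queues


def classify(queue: QueueStatus, warn: int = 501, crit: int = 750) -> str:
    """Hypothetical threshold check on the number of waiting jobs."""
    if queue.waiting >= crit:
        return "CRITICAL"
    if queue.waiting >= warn:
        return "WARNING"
    return "OK"


# Made-up sample output, not taken from the servers in this issue.
SAMPLE = "service\t800\t50\t100\nhost\t10\t10\t100\n.\n"

if __name__ == "__main__":
    for q in parse_status(SAMPLE):
        print(q.name, q.waiting, classify(q))
```

Feeding it the real `status` output from port 4730 on both a 4.60 and a 5.10 box would give a like-for-like comparison of waiting and running jobs per queue.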


infraweavers · 2023-03-02T10:07:58Z

The Load Average (not that it means much) is also significantly higher under 5.10.

I'll keep digging and see what else shows up. We did notice that the core scheduling graph also looks "weird" under 5.10 compared to 4.60 (much spikier and not as even), however it's difficult to get a side-by-side comparison on that. I'll see what turns up.

sni · 2023-03-02T11:30:17Z

try disabling embedded perl in the etc/mod-gearman/worker.cfg. I noticed an issue yesterday in the epn connector if the plugin output exceeds 8kb.

infraweavers · 2023-03-02T15:11:56Z

> try disabling embedded perl in the etc/mod-gearman/worker.cfg. I noticed an issue yesterday in the epn connector if the plugin output exceeds 8kb.

Cool, we'll give that a shot on an un-touched 5.10

sni · 2023-03-02T15:12:40Z

yeah, but wait till tomorrow, still working on that fix.

infraweavers · 2023-03-03T07:53:57Z

Hmm, I disabled embedded perl yesterday (about where the red line is); can't really see a difference so far:

sni · 2023-03-03T08:38:24Z

today's daily looks fine. epn should run much smoother now.

infraweavers · 2023-03-03T08:54:03Z

Cool, I'll build one of our boxes onto that and give it a test

infraweavers · 2023-03-06T12:38:32Z

Hmm, I would say it doesn't look massively different at "big scale":

On the 1 week scale you can see where we upgraded to the nightly build (red line), it does arguably look a little bit better maybe?

infraweavers · 2023-03-09T14:06:00Z

OK, so we've downgraded one of them to OMD 4.60 as well to see if we can narrow it down. It looks like the change in behaviour is between 4.60 and 5.10.

sni · 2023-03-10T16:58:17Z

could you try the latest OMD daily, it should work quite well now. I also added something in the gearman neb module to flatten out the number of concurrent started checks.

infraweavers · 2023-03-10T17:06:16Z

> could you try the latest OMD daily, it should work quite well now. I also added something in the gearman neb module to flatten out the number of concurrent started checks.

Yep we'll do that on Monday

infraweavers · 2023-03-14T13:50:32Z

We've just rolled out omd-5.11.20230314-labs-edition onto one of the servers to test that now

infraweavers · 2023-03-17T09:43:41Z

So from what we can see, it seems to be improved but not really back to where it was in 4.60. I think we will have to increase the workers to see if that will remove some of the noise and pressure that we're seeing. We also keep getting pnp4nagios errors about the interval between updates being too short (similar to #156, but for other checks). We have decreased the pnp_gearman_worker down to 1 to eliminate a race condition there and it still happens, so we're thinking that something is running the same check back-to-back, as it were.

It sort of feels to us that checks aren't being run at regular intervals under 5+ (most of our checks are once per minute). We're going to investigate whether we have evidence to support that assertion, but it certainly feels like that's what's going on.
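One way to look for that kind of evidence, as a rough sketch: collect (check name, execution timestamp) pairs from whatever source is available and flag consecutive runs of the same check that are much closer together than the configured interval. The `short_intervals` function, the 60-second interval, the 0.5 tolerance factor, and the sample data are all hypothetical; how the timestamps are obtained from naemon is left open.

```python
from typing import Dict, Iterable, List, Tuple


def short_intervals(
    executions: Iterable[Tuple[str, float]],
    expected_interval: float = 60.0,
    tolerance: float = 0.5,
) -> List[Tuple[str, float, float, float]]:
    """Return (check, previous_ts, current_ts, gap) for suspiciously short gaps.

    A gap is flagged when it is shorter than expected_interval * tolerance.
    """
    by_check: Dict[str, List[float]] = {}
    for name, ts in executions:
        by_check.setdefault(name, []).append(ts)

    flagged = []
    for name, stamps in by_check.items():
        stamps.sort()
        # Compare each execution with the one immediately before it.
        for prev, cur in zip(stamps, stamps[1:]):
            gap = cur - prev
            if gap < expected_interval * tolerance:
                flagged.append((name, prev, cur, gap))
    return flagged


# Made-up sample: "web;HTTP" runs twice within two seconds.
SAMPLE_RUNS = [
    ("web;HTTP", 0.0),
    ("web;HTTP", 60.0),
    ("web;HTTP", 62.0),
    ("db;Load", 5.0),
    ("db;Load", 65.0),
]

if __name__ == "__main__":
    for name, prev, cur, gap in short_intervals(SAMPLE_RUNS):
        print(f"{name}: ran at {prev} and again at {cur} ({gap}s apart)")
```

An empty result over a representative window would suggest that checks are not being executed back-to-back.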

infraweavers · 2023-03-17T16:21:01Z

So we looked into the naemon suspicions and found absolutely no evidence to support the idea that checks are being run more frequently than they should be. We have therefore bumped our thresholds up from 500 to 2500 for the time being, whilst we try to ascertain whether the change is actually a problem for gearman/OMD or not.

sni · 2023-06-22T14:25:23Z

btw, load average might seem to increase if you use the check_load scaled by cpu mode. The check_load now has a scaled_load perf counter and the previous "scaled" metric is the absolute unit now. So it might be that the cpu usage did not increase at all, but the check_load check now reports different numbers.
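To illustrate why the two numbers can differ so much, here is a small sketch of the usual meaning of "scaled by cpu": the absolute load average divided by the number of CPUs. This is an assumption about what the metric represents, not check_load's actual code, and `scaled_load` is a hypothetical helper.

```python
def scaled_load(absolute_load: float, cpu_count: int) -> float:
    """Divide an absolute load average by the CPU count (assumed definition)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return absolute_load / cpu_count


if __name__ == "__main__":
    # On a hypothetical 4-CPU box, an absolute load of 8.0 is a scaled load of 2.0.
    print(scaled_load(8.0, 4))
```

If a graph that used to plot the scaled value now plots the absolute one, the line would appear to jump by a factor equal to the CPU count without any real change in load.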
