OMD 5.10 shows very different gearman worker status #157
Comments
Try disabling embedded Perl in etc/mod-gearman/worker.cfg. Yesterday I noticed an issue in the EPN connector when the plugin output exceeds 8 KB.
Cool, we'll give that a shot on an untouched 5.10.
Yeah, but wait till tomorrow, I'm still working on that fix.
Today's daily looks fine. EPN should run much smoother now.
Cool, I'll build one of our boxes onto that and give it a test.
Could you try the latest OMD daily? It should work quite well now. I also added something in the gearman NEB module to flatten out the number of concurrently started checks.
Yep, we'll do that on Monday.
We've just rolled out
So from what we can see, it seems to be improved but not really back to where it was in 4.60. I think we will have to increase the workers to see if that removes some of the noise and pressure that we're seeing. We also keep getting pnp4nagios errors about the interval between updates being too short (similar to #156, but for other checks). We have decreased pnp_gearman_worker down to 1 to eliminate a race condition there and it still happens, so we suspect that something is running the same check back-to-back, as it were. It feels to us as though checks aren't being run at regular intervals under 5+ (most of our checks run once per minute). We're going to investigate whether we have evidence to support that assertion, but it certainly feels like that's what's going on.
So we looked into our naemon suspicions and found absolutely no evidence to support the idea that checks are being run more frequently than they should be. We have therefore bumped our thresholds up from 500 to 2500 for the time being whilst we try to ascertain whether the change is actually a problem for gearman/OMD etc. or not.
Btw, load average might seem to increase if you use check_load in the scaled-by-CPU mode. check_load now has a scaled_load perf counter, and the previous "scaled" metric is the absolute unit now. So it might be that the CPU usage did not increase at all, but the check_load check
Hello,
Since our OMD 4.40 -> OMD 5.10 upgrade we've been experiencing occasions where our gearman server appears to have large numbers of running or waiting checks. On investigation we can see that the behaviour of service checks through gearman is very different under OMD 5.10. To do some diagnostics we've downgraded one of our OMD boxes to OMD 4.60, but we have "transplanted" the version of mod_gearman_worker-go and the EPN into the 4.60 box so that we're not running into ConSol-Monitoring/mod-gearman-worker-go#19. This has the added benefit of exonerating mod_gearman_worker-go, which is nice. I'm leaning towards there being a change in naemon-core.
OMD config:
Graph of
/omd/sites/default/lib/monitoring-plugins/check_gearman -H OMD101.man.cwserverfarm.local -W 501 -C 750 -w 501 -c 750
where we can see the differing behaviour.
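For anyone wanting to compare the running and waiting counts between two boxes without the graph, here is a minimal Python sketch. It assumes the output format of the gearmand text admin `status` command (one tab-separated line per queue: function name, total jobs, running jobs, available workers, terminated by a line containing only `.`). The sample data, the queue names, and the helper names `parse_status` and `over_threshold` are made up for illustration. The threshold of 501 is taken from the command above, and applying the same limit to both counts is a simplification. This is not how check_gearman itself is implemented.

```python
from typing import List, NamedTuple


class QueueStatus(NamedTuple):
    name: str
    waiting: int   # jobs queued but not yet picked up by a worker
    running: int   # jobs currently being processed
    workers: int   # workers registered for this queue


def parse_status(text: str) -> List[QueueStatus]:
    """Parse gearmand admin `status` output into per-queue counters."""
    queues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":  # "." marks the end of the listing
            continue
        name, total, running, workers = line.split("\t")
        total, running, workers = int(total), int(running), int(workers)
        # gearmand reports the total; waiting is total minus running
        queues.append(QueueStatus(name, total - running, running, workers))
    return queues


def over_threshold(queues: List[QueueStatus], limit: int) -> List[str]:
    """Return names of queues whose waiting or running count reaches limit."""
    return [q.name for q in queues if q.waiting >= limit or q.running >= limit]


# Hypothetical sample, not taken from the boxes discussed in this issue.
SAMPLE = "service\t700\t120\t150\nhost\t3\t3\t150\n.\n"

if __name__ == "__main__":
    for q in parse_status(SAMPLE):
        print(q)
    print(over_threshold(parse_status(SAMPLE), 501))
```

Feeding it captured `status` output from a 4.60 box and a 5.10 box at the same time of day would give a like-for-like comparison of the two counters.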