Too many restarts in K8s cluster deployment #179
We've been experiencing an unusually high number of restarts in our K8s cluster. For example, in the last 3 days K8s restarted QL 103 and 99 times in the two pods, respectively.
Comments
I think it happens when QL becomes unresponsive, and so it's killed by k8s.
I believe this was solved by allowing for the yellow state of the Crate cluster.
@c0c0n3 We are facing this issue in our Kubernetes deployment of quantumleap with the wq configuration. We have used the liveness probe setting below in both deployment files, for quantumleap and quantumleap-wq:
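The exact probe values were not captured in this thread; for illustration, a probe of this general shape (path /health and port 8668 are QuantumLeap's default health endpoint and port, the timings are placeholders):

```yaml
# Illustrative only -- the actual probe values from the deployment were not
# captured in the thread. /health and 8668 are QuantumLeap's default health
# endpoint and port; the timings below are placeholders.
livenessProbe:
  httpGet:
    path: /health
    port: 8668
  initialDelaySeconds: 60
  periodSeconds: 30
  failureThreshold: 3
```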
Please find our observations below:
Liveness probe failed for quantumleap-wq. We have checked our environment for Crate health, and it is healthy. Please confirm our understanding.
@c0c0n3 We have the following observation on why the livenessProbe is not working with the quantumleap-wq deployment file. Two deployments of quantumleap are running in our environment, one for the master and the other for the worker. In the master deployment file we can have a livenessProbe that calls quantumleap's health API and restarts the pod if any error occurs. The worker quantumleap, however, can only handle the notify API, as mentioned in https://github.com/orchestracities/ngsi-timeseries-api/blob/master/docs/manuals/admin/wq.md. As per our understanding, the health API of quantumleap cannot be called on the worker quantumleap, so the probe returns a connection error and the pod restarts. Please correct my understanding if there is anything I am missing.
hi @pooja1pathak :-)
Correct. Each Worker process is a standalone RQ instance; there's no QL Web API there.
Yes. I suggest you start Workers using Supervisor with our config; that will give you the reliability you're after, I guess. More about it in the wq admin docs. Hope this helps!
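Not from the thread, but to make the probe point concrete: since the worker runs no QL Web API, an httpGet probe against /health can never succeed on quantumleap-wq. If a liveness check is wanted there at all, one option is an exec probe against the worker process itself. A minimal sketch, assuming the container image ships a shell and pgrep, and that the RQ worker's command line contains "rq":

```yaml
# Hypothetical sketch, not taken from the QL repo or this thread.
# Assumes the quantumleap-wq image ships /bin/sh and pgrep, and that the
# RQ worker process has "rq" in its command line.
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - pgrep -f rq > /dev/null
  initialDelaySeconds: 30
  periodSeconds: 30
  failureThreshold: 3
```

Alternatively, dropping the liveness probe from the quantumleap-wq deployment and letting Supervisor restart crashed workers inside the container, as suggested above, avoids the restart loop altogether.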