Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a flag to exit immediately on error instead of retrying #297

Closed
wants to merge 7 commits into from
Closed

Add a flag to exit immediately on error instead of retrying #297

wants to merge 7 commits into from

Conversation

xqm32
Copy link

@xqm32 xqm32 commented Apr 2, 2024

This PR adds a flag to prevent restarting the docker event subprocess.

I'm working on Docker in Docker within Kubernetes, leveraging docuum for Docker image management. While docuum effectively cleans up unused images, I encountered an issue with stopping docuum containers in a Kubernetes environment. Currently, when attempting to shut down docuum, it doesn't respond to the SIGTERM signal, necessitating a wait for the Grace Termination Duration before receiving a SIGKILL.

I've explored implementing SIGTERM handling for docuum, but found it to be complex and inelegant. Instead, I propose introducing an --fail-fast option. With this option, if the Docker engine goes down, docuum will shut down as well. This enhancement is particularly valuable in Kubernetes setups, where Kubernetes will restart the docuum container if the Docker engine isn't ready. Additionally, this ensures docuum can correctly terminate when shutting down the entire pod.

Status: Ready

Fixes (Partly): #234

Copy link
Owner

@stepchowfun stepchowfun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @xqm32, thank you for submitting this pull request!

I'm supportive of this feature in general, although I really do wish we could handle SIGTERM. I think I looked into it a while ago and came to the same conclusion as you (but I don't remember why exactly).

I think we need to change the name of the flag to be more specific. --fail-fast makes me think that Docuum will terminate if it fails to delete an image for whatever reason. Maybe --fail-if-docker-events-fails? (Ugh.)

I am also concerned about the startup behavior: I think most users enjoy not having to worry about starting Docuum and Docker in any particular order, since Docuum will happily just sit around until Docker is read. But with this flag enabled, that is no longer true. I'm not sure what the best solution is though.

@@ -38,12 +38,15 @@ Here are the supported command-line options:

```
USAGE:
docuum
docuum [OPTIONS]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the output of docuum --help change here with the introduction of the new flag? (Is this clap's behavior?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clap says1:

The default is that the auto-generated help message will group flags, and options separately.

Maybe the flag --fast-fail makes clap reorganize the help information.

Footnotes

  1. https://docs.rs/clap/2.34.0/clap/enum.AppSettings.html#variant.UnifiedHelpMessage

@xqm32
Copy link
Author

xqm32 commented Apr 3, 2024

I think we need to change the name of the flag to be more specific. --fail-fast makes me think that Docuum will terminate if it fails to delete an image for whatever reason. Maybe --fail-if-docker-events-fails? (Ugh.)

How about --fail-on-docker-exit?

I am also concerned about the startup behavior: I think most users enjoy not having to worry about starting Docuum and Docker in any particular order, since Docuum will happily just sit around until Docker is read. But with this flag enabled, that is no longer true. I'm not sure what the best solution is though.

If users are running docuum directly on their machine or using docker, they actually do not need this flag, because they can manually send SIGKILL to docuum at any time, and they can just not use this flag and run docuum as before. But Kubernetes users need this flag to let the docuum life cycle be taken over by Kubernetes. They don't have to worry about the startup sequence, because Kubernetes will automatically restart when the docuum container is stopped, which is the default restartPolicy of Kubernetes1.

Footnotes

  1. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy

@xqm32 xqm32 changed the title Add an option to exit immediately on error instead of retrying Add a flag to exit immediately on error instead of retrying Apr 3, 2024
@xqm32 xqm32 requested a review from stepchowfun April 4, 2024 15:54
@stepchowfun
Copy link
Owner

Hi @xqm32, I took another look at handling signals properly, and I found a fairly simple way to do it in #299. That should solve your issue, right?

@xqm32
Copy link
Author

xqm32 commented Apr 5, 2024

Hi @xqm32, I took another look at handling signals properly, and I found a fairly simple way to do it in #299. That should solve your issue, right?

Cooooooooooool! PR #299 works fine for my scenario. Thanks for your work!

@xqm32 xqm32 closed this Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants