Hi, All.

I have a job that I have finished debugging and am ready to scale up and submit through `kbatch`. However, that of course brought its own set of issues that needed to be debugged. I have worked all of those out, as far as I can tell, but now I can no longer get the job to even start running (e.g., I have a job 'running' now for 5 days that has yet to even print the `print` statement at the top of the script).

I've been racking my brain trying to figure out what's going wrong and carefully rereading the `kbatch` docs. In doing that, it occurred to me to run `kbatch pod list`. To my chagrin, I find that jobs stretching back two weeks still have pods labeled 'Running' (12 total), even though I explicitly ran `kbatch job delete` for each one, and the kbatch docs state that that should 'delete a job, cancelling running pods'.

I'm wondering if all those still-running pods are sucking up resources, preventing me from relaunching my job, and if so, how I can actually cancel them and release the resources for reuse. (I picked through the kbatch docs but did not see anything I could easily repurpose to do this myself, though I did see that the docstring for `_backend.py` states that 'kbatch users do not have access to the Kubernetes API', which gives me the impression that I may not be able to cancel them myself...)

Thanks for any advice you can provide!
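For concreteness, the kind of check I have in mind looks something like this. It's a minimal sketch that uses only the two commands above; the job name, the assumption that `kbatch job delete` takes the job name, and the assumption that `kbatch pod list` prints a text table whose rows include the pod status are all just illustrative on my part.

```python
# Minimal sketch: delete a job, then poll `kbatch pod list` until nothing is
# reported as Running. Assumes `kbatch job delete` accepts the job name and
# that the pod list output is a text table containing a status column.
import subprocess
import time


def count_running_pods() -> int:
    """Count lines of `kbatch pod list` output that mention a Running pod."""
    out = subprocess.run(
        ["kbatch", "pod", "list"], capture_output=True, text=True, check=True
    ).stdout
    return sum(1 for line in out.splitlines() if "Running" in line)


def delete_and_wait(job_name: str, timeout_s: int = 600) -> None:
    """Delete a job, then wait until no pods at all show up as Running."""
    subprocess.run(["kbatch", "job", "delete", job_name], check=True)
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        n = count_running_pods()
        print(f"{n} pod(s) still Running")
        if n == 0:
            return
        time.sleep(30)
    print("Timed out; the pods may need to be cleaned up on the server side.")
```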
Replies: 1 comment

NEVERMIND! The problem causing the job to run indefinitely was entirely my dumb fault (not catching an edge case that caused an infinite while loop). Nevertheless, I'm still curious whether I can/should be closing still-running pods from previous jobs. Thanks!
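For anyone curious about the infinite-loop failure mode: the sketch below is purely illustrative (not the actual job code), but it shows the general shape, a while loop that re-queues a failing item forever unless an edge-case guard bounds the retries.

```python
# Purely illustrative: a while loop that never terminates because a failing
# item keeps getting re-queued. Bounding the retries guarantees the loop exits.
from collections import deque


def do_work(item):
    # Stand-in for the real per-item work; fails on a hypothetical bad input.
    if item is None:
        raise ValueError("cannot process an empty item")
    return item * 2


def process_all(items, max_retries=3):
    """Process items, retrying failures a bounded number of times."""
    queue = deque((item, 0) for item in items)
    results, failed = [], []
    while queue:
        item, attempts = queue.popleft()
        try:
            results.append(do_work(item))
        except ValueError:
            if attempts + 1 < max_retries:
                queue.append((item, attempts + 1))  # bounded re-queue
            else:
                failed.append(item)  # give up instead of spinning forever
    return results, failed


if __name__ == "__main__":
    print(process_all([1, 2, None, 4]))  # -> ([2, 4, 8], [None])
```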