Question: best way to cancel an ongoing generation? #227
Comments
Nothing planned, but this sounds useful.
Yes, that sounds like it would be needed.
So you would need to figure out what to do if it was cancelled. It could throw an error (maybe a well-known one). The TL;DR, I think, is one of two general approaches.
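For the error-throwing option, here is a minimal sketch of what cooperative cancellation between tokens could look like; the iterator shape is an assumption for illustration, not the library's current TokenIterator API:

```swift
import Foundation

// Hypothetical stand-in for an iterator-style generation loop that
// cooperates with Swift task cancellation by throwing a well-known error.
struct CancellableTokenStream {
    var produceNext: () -> Int?   // stand-in for the model's next-token step

    func next() throws -> Int? {
        // Throws CancellationError if the surrounding Task was cancelled,
        // giving callers a single, well-known error type to catch.
        try Task.checkCancellation()
        return produceNext()
    }
}
```

Another plausible handling would be to stop quietly and return whatever tokens were produced so far, rather than throwing.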
See #196 for some discussion. Since the KVCache is a reference type and you can pass it in, you should be able to hold it in the code that calls the iterator without a problem (a rough sketch follows below). For a VLM I wonder if this is sufficient? I think it would want to redo the image/video portion of the input, so you would want to capture that part as well. Also be aware of this:
we are doing asynchronous evaluation to prepare the next token (keeping the GPU busy in the gaps between tokens). I think that is fine, but something to think about.
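Returning to the KVCache point, a rough sketch of the caller holding the cache so that work done before a cancellation is not lost; the cache type and step closure are stand-ins invented for illustration, not the library's real names:

```swift
import Foundation

// Stand-in for the library's KV cache: a reference type the caller owns.
final class TokenCache {
    var entries: [Int] = []
}

// Hypothetical generation loop: the cache is passed in by reference, so if
// the surrounding Task is cancelled mid-loop, the caller still holds the
// partially filled cache and could resume or retry without redoing prefill.
func generate(
    maxTokens: Int,
    cache: TokenCache,
    step: (TokenCache) -> Int
) throws -> [Int] {
    var tokens: [Int] = []
    for _ in 0..<maxTokens {
        try Task.checkCancellation()   // stop cleanly between tokens
        tokens.append(step(cache))
    }
    return tokens
}
```

For a VLM, as noted above, the image/video processing done by the input processor would also need to be captured by the caller if re-running it on a retry is to be avoided.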
I'm looking into how feasible it would be to interrupt an in-progress LLM or VLM generation.

It seems reasonably straightforward to cancel once the output tokens are being generated, since the developer can check for e.g. Task.isCancelled in their didGenerate block and abort. However, significant time can also be spent in earlier phases, like in the TokenIterator before output is generated, and in the UserInputProcessor.

So a few questions: Is checking Task.isCancelled in various places, e.g. in the implementations of UserInputProcessor, the best approach? It seems that for vision models this might be between the processing of each image, or each video frame.
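As a point of reference for the didGenerate-based cancellation described above, a minimal self-contained sketch; the disposition enum and driver loop are modelled on the discussion and are not the library's exact generate(...) signature:

```swift
import Foundation

// Stand-in disposition: the callback tells the loop whether to continue.
enum Disposition { case more, stop }

// Hypothetical driver loop: the caller's didGenerate callback runs after
// each token and can end generation early.
func runGeneration(
    nextToken: () -> Int?,
    didGenerate: ([Int]) -> Disposition
) -> [Int] {
    var tokens: [Int] = []
    while let token = nextToken() {
        tokens.append(token)
        if didGenerate(tokens) == .stop { break }
    }
    return tokens
}

// Caller side: abort as soon as the surrounding Task is cancelled.
// let result = runGeneration(nextToken: produce) { _ in
//     Task.isCancelled ? .stop : .more
// }
```

The open question in this issue is how to get the same cooperative check into the earlier phases (UserInputProcessor and the prompt processing inside TokenIterator), where the didGenerate callback does not apply.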