Question: best way to cancel an ongoing generation? #227

Open
deet opened this issue Mar 7, 2025 · 1 comment

deet commented Mar 7, 2025

I'm looking into how feasible it would be to interrupt an in-progress LLM or VLM generation.

It seems reasonably straightforward to cancel once the output tokens are being generated, since the developer can check for e.g. Task.isCancelled in their didGenerate block and abort.
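For example, a minimal sketch of that (assuming the MLXLMCommon generate(input:parameters:context:didGenerate:) entry point; the exact signature may differ between versions, and lmInput / modelContext are assumed to already be set up):

```swift
import MLXLMCommon

// Sketch only: stop emitting tokens as soon as the surrounding Task is cancelled.
// `lmInput: LMInput` and `modelContext: ModelContext` are assumed to exist already.
let result = try generate(
    input: lmInput,
    parameters: GenerateParameters(),
    context: modelContext
) { tokens in
    if Task.isCancelled { return .stop }   // abort generation early
    return .more                           // keep generating
}
```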

However, significant time can also be spent in earlier phases before any output is generated, such as in the UserInputProcessor and the TokenIterator.

So a few questions:

  1. Are there plans to support cancellation of generation tasks?
  2. If I wanted to implement a version of this now, is checking for Task.isCancelled in various places, e.g. in the implementations of UserInputProcessor, the best approach? For vision models this would presumably be between the processing of each image or each video frame.
  3. How feasible would it be to store the state already computed, such as the KVCache, when interrupted, so that the task could be resumed later from the same position?
@davidkoski
Collaborator

  1. Are there plans to support cancellation of generation tasks?

Nothing planned, but this sounds useful.

  2. If I wanted to implement a version of this now, is checking for Task.isCancelled in various places, e.g. in the implementations of UserInputProcessor, the best approach? For vision models this would presumably be between the processing of each image or each video frame.

Yes, that sounds like it would be needed. The UserInputProcessor has to return an LMInput:

public func prepare(input: UserInput) throws -> LMInput

So you would need to figure out what to do if it is cancelled. It could throw an error (maybe a well-known taskCancelled error that the caller could deal with) or return an empty LMInput. The call that prepares the UserInput -> LMInput is already in user code, so call sites would need to know what to do -- I think this ties in to question 3 below. If the prepare() call is interrupted, the caller needs to know that the LMInput isn't valid, which suggests a throw to me; however, that might be an unexpected error to some callers. Do we need to let the caller specify the behavior here? Pass in some kind of "step observer block" for policy? I think this requires some playing around to see what fits well.
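To illustrate the first option, a minimal sketch that assumes the synchronous prepare(input:) signature quoted above; Task.checkCancellation() throws CancellationError, which the call site can treat as "no valid LMInput was produced":

```swift
import MLXLMCommon

// Sketch only: wrap an existing processor and bail out before the (potentially
// expensive) preparation work if the surrounding Task has been cancelled.
// For VLMs the same check would go inside the per-image / per-frame loop.
struct CancellableProcessor: UserInputProcessor {
    let wrapped: any UserInputProcessor

    func prepare(input: UserInput) throws -> LMInput {
        try Task.checkCancellation()   // throws CancellationError when cancelled
        return try wrapped.prepare(input: input)
    }
}
```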

The other checks on isCancelled could go either in the generate() callback or in generate() itself. generate() already has mechanisms for stopping early, so it seems like either could work well, though it may depend on what is decided for the prepare() call (whether the cancellation handling is explicit or not).

The TL;DR, I think, is one of these two general approaches:

  • prepare() and generate() implicitly handle Task.isCancelled, but callers of prepare() need to know whether the LMInput is valid, so that probably means an Error
  • prepare() and generate() opt in to cancellation behavior via an optional parameter (a bool or a block); same caveat on prepare() wrt the Error (but the caller determines whether this is possible) -- a sketch of this shape follows below
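Purely to illustrate that second, opt-in shape -- a hypothetical helper with a shouldCancel block; none of this exists in the current API:

```swift
import MLXLMCommon

// Hypothetical opt-in variant (sketch only; this parameter is not in the
// current API): the caller supplies the cancellation policy as a block.
func prepareCancellable(
    _ processor: any UserInputProcessor,
    input: UserInput,
    shouldCancel: () -> Bool = { Task.isCancelled }
) throws -> LMInput {
    if shouldCancel() {
        throw CancellationError()   // caller knows no valid LMInput was produced
    }
    return try processor.prepare(input: input)
}
```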

  3. How feasible would it be to store the state already computed, such as the KVCache, when interrupted, so that the task could be resumed later from the same position?

See #196 for some discussion. Since the KVCache is a reference type and you can pass it in, you should be able to hold it in the code that calls the iterator without a problem. For a VLM I wonder if this is sufficient? I think it would otherwise want to redo the image/video portion of the input -- you would want to capture the LMInput rather than the UserInput (which is what is exposed at the generate() level).
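A rough sketch of holding the cache across an interruption. The newCache(parameters:) call and the cache: parameter on TokenIterator are assumptions based on the #196 discussion, so check the current API for the exact names:

```swift
import MLXLMCommon

// Sketch only: hold the KVCache (a reference type) outside the iterator so an
// interrupted generation can be resumed later. Parameter names are assumptions.
final class ResumableGeneration {
    private var cache: [KVCache]?

    func step(input: LMInput, context: ModelContext, parameters: GenerateParameters) throws {
        let cache = self.cache ?? context.model.newCache(parameters: parameters)
        self.cache = cache   // retained even if we bail out below

        var iterator = try TokenIterator(
            input: input, model: context.model, cache: cache, parameters: parameters)
        while let token = iterator.next() {
            if Task.isCancelled { return }   // cache still holds the progress so far
            _ = token                        // handle the token (detokenize, stream, ...)
        }
    }
}
```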

Also be aware of this:

we are doing asynchronous evaluation to prepare the next token (keeping the GPU busy in the gaps between tokens). I think that is fine, but something to think about.
