It's not clear to me from looking at the code if this library supports the following pattern:
mlx_lm.cache_prompt --prompt 'Here are 100 examples of how to produce a desired output: {examples}'
... cachedprompt.safetensors saved to cwd
prompt_template = "Now produce an output from this sentence: {sentence}"
prompts_raw = [prompt_template.format(sentence=sentence) for sentence in sentences]
responses = batch_generate(kv_cache_file="cachedprompt.safetensors", prompts=prompts_raw)
... batched generations created
Is this something the library can do, or could be extended to do? I'm interested in providing multi-shot examples without incurring huge prompt-processing times from re-encoding the same pre-prompt on every request.
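
For reference, here is the closest I can get with the Python API as far as I can tell. This is a minimal sketch, not a confirmed recipe: it assumes mlx_lm's make_prompt_cache / save_prompt_cache / load_prompt_cache helpers and the prompt_cache argument to generate, and it falls back to a sequential loop because I can't find a batch_generate that accepts a cache. The model name and the sentences list are placeholders.

import mlx.core as mx
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache, save_prompt_cache, load_prompt_cache

model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")  # placeholder model

# Prefill the KV cache on the shared multi-shot prefix once and save it
# (mlx_lm.cache_prompt does this in chunks; a single forward pass is shown
# here for brevity).
prefix = "Here are 100 examples of how to produce a desired output: ..."
cache = make_prompt_cache(model)
model(mx.array(tokenizer.encode(prefix))[None], cache=cache)  # fills the cache in place
mx.eval([c.state for c in cache])  # force evaluation before saving
save_prompt_cache("cachedprompt.safetensors", cache)

# Reuse the cached prefix for each suffix prompt. Note this is a sequential
# loop, not a true batched forward pass: each generation mutates its cache,
# so a fresh copy is loaded per prompt.
prompt_template = "Now produce an output from this sentence: {sentence}"
responses = []
for sentence in sentences:  # `sentences` assumed defined elsewhere
    prompt_cache = load_prompt_cache("cachedprompt.safetensors")
    responses.append(
        generate(
            model,
            tokenizer,
            prompt=prompt_template.format(sentence=sentence),
            max_tokens=256,
            prompt_cache=prompt_cache,
        )
    )

This avoids re-encoding the shared prefix, but the generations still run one at a time, which is why a batched variant over a single saved cache would be useful.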