Unleashing the Power of Phi-3-mini's 4K Context Window with LoRA Distillation (and a Trick for Budget GPUs!) #781
Replies: 2 comments
That is very cool, thanks for sharing! It makes intuitive sense to me that you can trade off in-context learning (e.g. from a long prompt) with fine-tuning. There are a lot of very interesting questions there that I think are not yet explored (or at least I don't know the answers to) around which is more efficient, which works better, and in general in what settings you should prefer one over the other.
Oh wow, thanks for the kind words, @awni! I'm a big fan of MLX and stoked to see how this can push things forward. I totally agree, there's so much more to explore here.
Introduction
Phi-3-mini has been a game-changer, packing impressive performance into a tiny model. The 128K context window is especially exciting (although not yet supported by the MLX library), opening up possibilities for the kind of many-shot learning that recent research has shown can be incredibly effective. But let's face it, most of us don't have access to high-end GPUs with enough VRAM to fully leverage such a long context window.
So, here's my take on a potential workaround: distilling the knowledge from those long contexts into LoRA adapters.
Why This Matters
How It Works (The Short Version)
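The gist, as a rough sketch rather than the exact code from the repo: run the base model with the long, many-shot prompt acting as its own "teacher", then save its answers paired with short zero-shot prompts as training data for a LoRA adapter. The snippet below assumes mlx_lm's `load`/`generate` helpers; the demonstrations, questions, prompt format, and file names are placeholders.

```python
import json
from mlx_lm import load, generate

# Base model acts as its own "teacher" when given the long many-shot context.
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct")

# Placeholder demonstrations that would normally be stuffed into a long prompt.
demos = [
    {"q": "What distinguishes migraine with aura from migraine without aura?",
     "a": "The presence of transient focal neurological symptoms preceding the headache."},
]
few_shot_block = "\n\n".join(f"Q: {d['q']}\nA: {d['a']}" for d in demos)

# Placeholder questions we want the adapter to answer zero-shot later.
new_questions = ["How long can a typical aura last?"]

with open("train.jsonl", "w") as f:
    for q in new_questions:
        # Teacher pass: answer with the long, many-shot context in the prompt.
        long_prompt = f"{few_shot_block}\n\nQ: {q}\nA:"
        teacher_answer = generate(model, tokenizer, prompt=long_prompt, max_tokens=256)
        # Student target: the same answer paired with a short zero-shot prompt,
        # so the LoRA adapter learns to reproduce it without the long context.
        f.write(json.dumps({"text": f"Q: {q}\nA: {teacher_answer}"}) + "\n")
```

The resulting `train.jsonl` can then go through the usual mlx_lm LoRA fine-tuning flow, which is cheap enough to run on modest hardware.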
Results (So Far)
I've been experimenting with this on a few different tasks, and the initial results are promising! For example, here's a comparison (admittedly cherry-picked for its impressiveness) between zero-shot, n-shot, and LoRA zero-shot performance on a medical question answering task.
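For reference, the comparison itself is just three generations per question, something like the sketch below. It assumes the `adapter_path` keyword of mlx_lm's `load` (available in recent mlx-lm versions) and uses placeholder prompts; the actual benchmark questions and numbers live in the repo.

```python
from mlx_lm import load, generate

MODEL = "microsoft/Phi-3-mini-4k-instruct"
base_model, tokenizer = load(MODEL)
# Same base weights plus the distilled adapter produced in the previous step.
lora_model, _ = load(MODEL, adapter_path="adapters")

few_shot_block = "Q: ...\nA: ..."                   # the long many-shot context (placeholder)
question = "How long can a typical aura last?"      # placeholder eval question

def answer(model, prompt):
    return generate(model, tokenizer, prompt=prompt, max_tokens=256)

zero_shot = answer(base_model, f"Q: {question}\nA:")                      # no context, no adapter
n_shot    = answer(base_model, f"{few_shot_block}\n\nQ: {question}\nA:")  # long context, no adapter
lora_zero = answer(lora_model, f"Q: {question}\nA:")                      # no context, with adapter

print(zero_shot, n_shot, lora_zero, sep="\n---\n")
```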
Benefits
Potential Use Case
Picture this: a library of LoRA adapters on your hard drive, each fine-tuned to enhance a specific skillset of your base LLM. One adapter turns it into an MLX library guru, while another equips it with expert knowledge of ICHD-3 headache classifications. This modular approach enables efficient, granular updates, so your AI's expertise stays current without retraining the entire model. And with an sLLM like Phi-3-mini, this can run on an everyday cellphone or a Raspberry Pi!
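In code, swapping "experts" would just mean pointing `load` at a different adapter directory. A sketch of the idea, with made-up adapter names and assuming mlx_lm's `adapter_path` keyword:

```python
from mlx_lm import load, generate

# Hypothetical local library of skill-specific adapters for the same base model.
ADAPTER_LIBRARY = {
    "mlx-guru": "adapters/mlx_library",
    "headache-expert": "adapters/ichd3",
}

def load_expert(skill: str):
    """Load the shared Phi-3-mini base together with the adapter for one skill."""
    return load("microsoft/Phi-3-mini-4k-instruct", adapter_path=ADAPTER_LIBRARY[skill])

model, tokenizer = load_expert("headache-expert")
print(generate(model, tokenizer,
               prompt="Q: What distinguishes migraine with aura?\nA:",
               max_tokens=128))
```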
GitHub
I'm sharing my code and initial findings on GitHub, and I would love to hear your thoughts, ideas, and feedback!