System Info
CUDA Version 12.6
2x RTX 3060 12GB
Information
🐛 Describe the bug
I want to run the Llama Stack server through Docker. However, it does not work when I point it at any non-fp16 variant of the model.
I think this may have been resolved in this commit.
When I attempt docker run, I get a ValueError.
What would be the correct INFERENCE_MODEL value or docker run command to successfully use a differently quantized model, such as the q8_0 variant below?
export INFERENCE_MODEL="meta-llama/Llama-3.2-11B-Vision-Instruct"
export OLLAMA_INFERENCE_MODEL="llama3.2-vision:11b-instruct-q8_0"
export LLAMA_STACK_PORT=5001
ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
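One environment note (an assumption on my part, since the host OS isn't stated above): on a Linux host, host.docker.internal does not resolve inside the container by default, so the same command may additionally need an explicit host-gateway mapping, e.g.:

docker run -it \
  --add-host=host.docker.internal:host-gateway \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434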
Error logs
ValueError: Model 'llama3.2-vision:11b-instruct-fp16' is not available in Ollama. Available models:
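For comparison, the tags Ollama is actually serving can be checked directly (a diagnostic sketch, assuming Ollama is running on its default port 11434); the q8_0 tag is present, but the stack appears to look for the fp16 tag:

ollama list                              # models pulled locally
ollama ps                                # models currently loaded/running
curl http://localhost:11434/api/tags     # same information via the Ollama HTTP API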
Expected behavior
The Docker container runs successfully and supports the quick start inference examples.