System Info
CUDA Version 12.6
2x RTX 3060 12GB
Information
🐛 Describe the bug
I want to run the Llama Stack server through Docker. However, it does not work when I point it at any non-fp16 variant of the model.
I think this may have been resolved in this commit.
When I attempt docker run, I get a ValueError.
What would be the correct INFERENCE_MODEL value or docker run command to successfully use a differently quantized model, such as the q8_0 variant below?
export INFERENCE_MODEL="meta-llama/Llama-3.2-11B-Vision-Instruct"
export OLLAMA_INFERENCE_MODEL="llama3.2-vision:11b-instruct-q8_0"
export LLAMA_STACK_PORT=5001
ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
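One environment note (an assumption on my part, since the host OS isn't stated above): on a Linux host, host.docker.internal does not resolve inside the container by default, so the same command may additionally need an explicit host-gateway mapping, e.g.:

docker run -it \
  --add-host=host.docker.internal:host-gateway \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434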
Error logs
ValueError: Model 'llama3.2-vision:11b-instruct-fp16' is not available in Ollama. Available models:
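For comparison, the tags Ollama is actually serving can be checked directly (a diagnostic sketch, assuming Ollama is running on its default port 11434); the q8_0 tag is present, but the stack appears to look for the fp16 tag:

ollama list                              # models pulled locally
ollama ps                                # models currently loaded/running
curl http://localhost:11434/api/tags     # same information via the Ollama HTTP API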
Expected behavior
The Docker container runs successfully and supports the quick start inference examples.