
Determining model reference when deploying Ollama model with docker. #784

Open

WhyPine opened this issue Jan 16, 2025 · 0 comments
WhyPine commented Jan 16, 2025

System Info

CUDA Version 12.6
2x RTX 3060 12GB

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I want to run the Llama Stack server through Docker. However, it does not work with any non-fp16 model variant.
I think this may have been resolved in this commit.
When attempting to use docker run, I get a ValueError; judging from the error below, the stack resolves INFERENCE_MODEL to the fp16 Ollama tag regardless of which variant is actually running.
What would be the correct INFERENCE_MODEL or docker run command to successfully use a differently quantized model, as in the commands below?

export INFERENCE_MODEL="meta-llama/Llama-3.2-11B-Vision-Instruct"
export OLLAMA_INFERENCE_MODEL="llama3.2-vision:11b-instruct-q8_0"
export LLAMA_STACK_PORT=5001

ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
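
As a sanity check (my addition, not part of the original report), it can help to confirm which tags Ollama actually has before starting the stack; ollama list shows the pulled tags and ollama ps shows what is currently loaded:

# List the model tags pulled locally.
ollama list
# List the models currently loaded into memory (what a provider that polls
# running models would see).
ollama ps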

docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
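
One possible workaround (an assumption on my part, based on the fp16 tag appearing in the error below): since ollama cp just creates a second tag pointing at the same weights, the quantized model can be aliased under the name the stack expects.

# Alias the q8_0 weights under the fp16 tag the stack looks up (assumption:
# the stack hard-codes the fp16 tag for this model).
ollama cp llama3.2-vision:11b-instruct-q8_0 llama3.2-vision:11b-instruct-fp16
ollama run llama3.2-vision:11b-instruct-fp16 --keepalive 60m

Separately, on a Linux host host.docker.internal is not resolvable by default, so docker run may also need --add-host=host.docker.internal:host-gateway for the container to reach Ollama.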

Error logs

ValueError: Model 'llama3.2-vision:11b-instruct-fp16' is not available in Ollama. Available models:
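
Since the list of available models in the error is empty, it may also be worth confirming (again my suggestion, not from the report) that the Ollama server is reachable at all:

# From the host: Ollama's HTTP API lists local model tags at /api/tags.
curl http://localhost:11434/api/tags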

Expected behavior

The Docker container runs successfully and supports the quick start inference examples.
