Deployment of local MultiLoRA model using TGI #2564

Open
ashwincv0112 opened this issue Dec 27, 2024 · 7 comments


ashwincv0112 commented Dec 27, 2024

Hi Team,

I was trying to deploy a multi-LoRA adapter model with Starcoder2-3B as the base.

I am referring to the blog below:
https://huggingface.co/blog/multi-lora-serving

Please correct me if I am wrong, but it appears that the Starcoder2 model is not supported for multi-LoRA deployment using TGI. We are getting the error below while deploying.

AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer' rank=0

Also, can you suggest how we can deploy a local model and adapters saved in a local directory using TGI?
Every time I run the docker command below, it downloads the files from HF.

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<local_adapter_path>

Please let me know if any additional information is required.

Thanks,
Ashwin.
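
For a fully local run, the TGI launcher also accepts a local directory for --model-id, provided that directory is mounted into the container. A minimal sketch is below; the directory names are placeholders, and support for local paths in --lora-adapters is an assumption that should be checked against the launcher help of your TGI version:

# assumes $PWD contains starcoder2-3b/ and my-adapter/ saved locally (e.g. via save_pretrained());
# /data is the volume mount point inside the container
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id /data/starcoder2-3b \
    --lora-adapters=/data/my-adapter

Adding -e HF_HUB_OFFLINE=1 to the docker run command should additionally keep huggingface_hub from trying to reach the Hub when everything is already available locally.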

@muhammad-asn

Any update on this?

@ashwincv0112
Author

Still facing the issue.

@muhammad-asn

Still facing the issue.

I think you should open an issue in the https://github.com/huggingface/text-generation-inference repo, @ashwincv0112.

@ashwincv0112
Author

Sure, I will add the issue.
So just to confirm my understanding: we currently don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?

@muhammad-asn

Sure, I will add the issue. So just to confirm my understanding: we currently don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?

Yup, I ran into an issue too when using a custom adapter based on the https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 model.

@ashwincv0112
Author

So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?

@muhammad-asn

So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?

Yupp
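
For reference, the Hub-hosted approach described in the multi-lora-serving blog looks roughly like the command below; the adapter repository names are placeholders and must point to LoRA adapters trained on the same base model:

# base model comes from the Hub; adapters are listed as comma-separated Hub repo ids
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<hf_user>/<adapter_repo_1>,<hf_user>/<adapter_repo_2>

A specific adapter can then be selected per request via the adapter_id parameter, as described in the blog.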
