Deployment of local MultiLoRA model using TGI #2564

Open
ashwincv0112 opened this issue Dec 27, 2024 · 7 comments


ashwincv0112 commented Dec 27, 2024

Hi Team,

I was trying to deploy a multi-LoRA adapter model with Starcoder2-3B as the base.

I am referring to the blog below:
https://huggingface.co/blog/multi-lora-serving

Please correct me if I am wrong, but it appears that the Starcoder2 model is not supported for multi-LoRA deployment using TGI. We are getting the error below while deploying.

AttributeError: 'TensorParallelColumnLinear' object has no attribute 'base_layer' rank=0

Also, can you suggest how we can deploy a local model and adapters saved in a local directory using TGI?
Every time I run the docker command below, it downloads the files from HF.

docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<local_adapter_path>

Please let me know if any additional information is required.

Thanks,
Ashwin.
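
For a fully local run, the TGI launcher also accepts a local directory for --model-id, provided that directory is mounted into the container. A minimal sketch is below; the directory names are placeholders, and support for local paths in --lora-adapters is an assumption that should be checked against the launcher help of your TGI version:

# assumes $PWD contains starcoder2-3b/ and my-adapter/ saved locally (e.g. via save_pretrained());
# /data is the volume mount point inside the container
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id /data/starcoder2-3b \
    --lora-adapters=/data/my-adapter

Adding -e HF_HUB_OFFLINE=1 to the docker run command should additionally keep huggingface_hub from trying to reach the Hub when everything is already available locally.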

@muhammad-asn

Any update on this?

@ashwincv0112
Author

Still facing the issue.

@muhammad-asn

Still facing the issue.

I think you should open an issue in the https://github.com/huggingface/text-generation-inference repo, @ashwincv0112.

@ashwincv0112
Author

Sure, I will add the issue.
So just to confirm my understanding: we currently don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?

@muhammad-asn

Sure, I will add the issue. So just to confirm my understanding: we currently don't have the capability to deploy the multi-LoRA setup when the adapters are saved on the local machine?

Yup, I ran into an issue too when using a custom adapter based on the https://huggingface.co/Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 model.

@ashwincv0112
Author

So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?

@muhammad-asn

So right now the only option is to upload the adapters to a Hugging Face repo and use the respective model-id to deploy the model... right?

Yupp
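
For reference, the Hub-hosted approach described in the multi-lora-serving blog looks roughly like the command below; the adapter repository names are placeholders and must point to LoRA adapters trained on the same base model:

# base model comes from the Hub; adapters are listed as comma-separated Hub repo ids
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.1 \
    --model-id bigcode/starcoder2-3b \
    --lora-adapters=<hf_user>/<adapter_repo_1>,<hf_user>/<adapter_repo_2>

A specific adapter can then be selected per request via the adapter_id parameter, as described in the blog.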
