Welcome to Open-Retrievals, a repository that powers retrieval-augmented generation (RAG) pipelines with state-of-the-art techniques for embedding, reranking, and RAG integration.

🚀 1. Embedding
Model | Original | Finetuned |
---|---|---|
m3e | 0.654 | 0.693 |
bge-base-zh-v1.5 | 0.657 | 0.703 |
Qwen2-1.5B-Instruct | - | 0.695 |
e5-mistral-7b-instruct | 0.651 | 0.699 |
Data Format
- Text pair: used for in-batch negative fine-tuning
{'query': TEXT_TYPE, 'positive': List[TEXT_TYPE]}
...
- Text triplet: hard-negative fine-tuning, optionally mixed with in-batch negatives (see the sketch after this list)
{'query': TEXT_TYPE, 'positive': List[TEXT_TYPE], 'negative': List[TEXT_TYPE]}
...
- Text scored pair: fine-tuning with explicit relevance labels
{(query, positive, label), (query, negative, label), ...}
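As a concrete illustration, here is a minimal sketch that writes training samples in the triplet format above to a JSONL file; the file name and the example texts are placeholders.

```python
import json

# Placeholder rows in the triplet format described above;
# each line of the JSONL file is one training sample.
samples = [
    {
        "query": "how to reset a forgotten password",
        "positive": ["Go to the login page and click 'Forgot password'."],
        "negative": ["Our office is open from 9am to 5pm on weekdays."],
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```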
📊 2. Reranking
Model | Original | Finetuned |
---|---|---|
bge-reranker-base | 0.666 | 0.706 |
bge-m3 | 0.657 | 0.695 |
Qwen2-1.5B-Instruct | - | 0.699 |
bge-reranker-v2-gemma | 0.637 | 0.706 |
chinese-roberta-wwm-ext (ColBERT) | - | 0.687 |
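Most of the rerankers above are cross-encoders. As a reference point, here is a minimal sketch of scoring query-document pairs with `BAAI/bge-reranker-base` via plain `transformers`; this is the generic usage pattern for that model family, not the open-retrievals API, and the example pairs are placeholders.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-base")
model.eval()

pairs = [
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
    # The model outputs one logit per pair; higher means more relevant.
    scores = model(**inputs).logits.view(-1)
print(scores)
```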
📚 3. RAG
For a basic RAG application, refer to rag_langchain_demo.py; a minimal retrieval sketch follows.
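The sketch below shows the retrieval step only, using `AutoModelForEmbedding` (mentioned in the FAQ below). The `encode` call, its return type, and the `pooling_method` value are assumptions about the open-retrievals API; check the demo script for the exact interface.

```python
import numpy as np
from retrievals import AutoModelForEmbedding  # import path assumed

# pooling_method must match how the checkpoint was trained (see FAQ).
model = AutoModelForEmbedding.from_pretrained("BAAI/bge-base-zh-v1.5", pooling_method="cls")

corpus = ["The giant panda is a bear native to China.", "Paris is the capital of France."]
query = "Where do pandas live?"

doc_emb = np.asarray(model.encode(corpus))     # assumed to return array-like embeddings
query_emb = np.asarray(model.encode([query]))

# Dot product equals cosine similarity if the embeddings are L2-normalized.
scores = query_emb @ doc_emb.T
print(corpus[int(scores.argmax())])
```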
Inference speed: NVIDIA TensorRT + NVIDIA Triton Inference Server > Microsoft ONNX Runtime + NVIDIA Triton Inference Server > PyTorch + FastAPI
Prerequisites
pip install optimum
pip install onnxruntime
python embed2onnx.py --model_name BAAI/bge-small-en-v1.5 --output_path ./onnx_model
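Alternatively, `optimum` can export and load the ONNX model in one step; here is a minimal sketch (the CLS pooling plus L2 normalization shown is the usual pattern for bge models, and the input text is a placeholder):

```python
import torch
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForFeatureExtraction.from_pretrained("BAAI/bge-small-en-v1.5", export=True)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")

inputs = tokenizer(["hello world"], padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs)
# bge models use CLS pooling followed by L2 normalization.
embeddings = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], dim=-1)
print(embeddings.shape)
```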
FAQ
- The grad_norm during training is always zero?
  - Consider switching between fp16 and bf16: during training, set `bf16` or `fp16` in `TrainingArguments`; during inference, set `use_fp16=True` in `AutoModelForEmbedding` or `LLMRanker` (as sketched below).
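A minimal sketch of the precision settings described above; `output_dir`, the learning rate, and the commented model name are placeholders.

```python
from transformers import TrainingArguments

# Training: enable bf16 (or fp16) in TrainingArguments.
args = TrainingArguments(
    output_dir="./outputs",  # placeholder
    bf16=True,               # or fp16=True, depending on your GPU
    learning_rate=3e-5,
    num_train_epochs=3,
)

# Inference: pass use_fp16=True to the model wrapper instead.
# from retrievals import AutoModelForEmbedding  # import path assumed
# model = AutoModelForEmbedding.from_pretrained("BAAI/bge-base-zh-v1.5", use_fp16=True)
```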
- The fine-tuned embedding performance at inference is worse than the original?
  - Check whether the `pooling_method` is correct (see the sketch below).
  - For LLM-based models, check whether the prompt or instruction is exactly the same as during training.
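A sketch of the two checks above; the model path and instruction string are placeholders, and passing `pooling_method` to `from_pretrained` is an assumption about the open-retrievals API.

```python
from retrievals import AutoModelForEmbedding  # import path assumed

# 1) pooling_method must match what was used during fine-tuning
#    (e.g. "cls", "mean", or last-token pooling for LLM-based embedders).
model = AutoModelForEmbedding.from_pretrained("./my-finetuned-model", pooling_method="mean")

# 2) For LLM-based embedders, prepend exactly the same instruction used in training.
instruction = "Given a web search query, retrieve relevant passages that answer the query: "
query_emb = model.encode([instruction + "how do pandas reproduce"])
```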
- How can we fine-tune the `BAAI/bge-m3` ColBERT model?
  - open-retrievals supports fine-tuning `BAAI/bge-m3` ColBERT directly; just don't set `use_fp16=True` during fine-tuning, and use a smaller learning rate (see the sketch below).
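A sketch of the precision and learning-rate settings from this answer, expressed via `TrainingArguments`; the output directory and the exact values are illustrative.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./colbert-outputs",  # placeholder
    fp16=False,          # per the note above: keep fp16 off for bge-m3 ColBERT
    bf16=False,
    learning_rate=1e-5,  # smaller than a typical embedding fine-tune (illustrative)
    num_train_epochs=1,
)
```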
- The performance is worse?
  - The collator and the loss function should be aligned, especially for triplet training with negative embeddings. The collator provided by open-retrievals yields `{query: value, positive: value, negative: value}`. Another common collator yields `{query: value, document: positive + negative}`, so the loss function must be adapted accordingly (see the sketch below).
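A minimal sketch of an InfoNCE-style loss aligned with the `{query, positive, negative}` collator; with the alternative `{query, document}` collator the positives and negatives arrive concatenated, so the index/label handling below would have to change accordingly. The function name and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def triplet_infonce(query_emb, pos_emb, neg_emb, temperature=0.05):
    """InfoNCE for a {query, positive, negative} batch.

    query_emb, pos_emb, neg_emb: (batch, dim) tensors.
    Candidates for each query are all in-batch positives plus all hard
    negatives; the correct candidate is the query's own positive, whose
    index equals the query's row index.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    candidates = F.normalize(torch.cat([pos_emb, neg_emb], dim=0), dim=-1)
    logits = query_emb @ candidates.T / temperature   # (batch, 2 * batch)
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)

# Usage with random tensors standing in for real embeddings:
q, p, n = torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768)
print(triplet_infonce(q, p, n))
```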