This is a simple implementation of the popular Retrieval-Augmented Generation (RAG) technique with differential privacy guarantees.
DP-RAG addresses privacy concerns in RAG systems by using differential privacy (DP) to aggregate information from multiple documents, thereby preventing the inadvertent disclosure of sensitive data. Its core innovations are a novel token-by-token aggregation technique and a DP-based document retrieval method.
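Both DP ingredients can be illustrated with the textbook exponential mechanism. The sketch below is a hypothetical illustration, not the code in this repository or the exact method of the report. The function names, the data-independent threshold grid, and the use of summed per-document next-token probabilities as the utility are all assumptions. The privacy unit is taken to be one document, and every call spends its own epsilon, so the losses of retrieval and of each generated token add up under sequential composition.

```python
import math
import random
from typing import List, Sequence


def exponential_mechanism(utilities: Sequence[float], sensitivity: float,
                          epsilon: float, rng: random.Random) -> int:
    """Sample index i with probability proportional to
    exp(epsilon * u_i / (2 * sensitivity)), the standard exponential mechanism."""
    scale = epsilon / (2.0 * sensitivity)
    best = max(utilities)
    # Subtract the max utility before exponentiating for numerical stability.
    weights = [math.exp(scale * (u - best)) for u in utilities]
    return rng.choices(range(len(utilities)), weights=weights, k=1)[0]


def dp_retrieval_threshold(scores: Sequence[float], candidates: Sequence[float],
                           k: int, epsilon: float, rng: random.Random) -> float:
    """Hypothetical DP retrieval: pick a similarity threshold so that roughly
    k documents score above it. The candidate grid must not depend on the data.
    Adding or removing one document changes each count by at most 1, so the
    utility -|count - k| has sensitivity 1."""
    utilities = [-abs(sum(s >= t for s in scores) - k) for t in candidates]
    return candidates[exponential_mechanism(utilities, 1.0, epsilon, rng)]


def dp_next_token(per_doc_probs: Sequence[Sequence[float]], epsilon: float,
                  rng: random.Random) -> int:
    """Hypothetical token-by-token aggregation: each retained document yields a
    next-token probability vector (entries in [0, 1]). A token's summed score
    changes by at most 1 when one document is added or removed, so the
    utility has sensitivity 1."""
    vocab_size = len(per_doc_probs[0])
    totals: List[float] = [sum(p[i] for p in per_doc_probs) for i in range(vocab_size)]
    return exponential_mechanism(totals, 1.0, epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [0.91, 0.85, 0.40, 0.88, 0.12]  # made-up query/document similarities
    grid = [i / 20 for i in range(21)]       # fixed, data-independent thresholds
    t = dp_retrieval_threshold(scores, grid, k=3, epsilon=1.0, rng=rng)
    selected = [i for i, s in enumerate(scores) if s >= t]
    # Made-up next-token distributions over a 3-token vocabulary, one per document.
    per_doc = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
    print(t, selected, dp_next_token(per_doc, epsilon=1.0, rng=rng))
```

Summing bounded per-document contributions keeps the sensitivity at 1 regardless of how many documents are retrieved, which is why a single document cannot dominate the chosen token.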
The technical report presents empirical results demonstrating DP-RAG's effectiveness, particularly when enough documents contain the information needed to answer a question. The repository also contains the code to evaluate the system on synthetic medical data.
On a computer with a GPU and CUDA installed, clone this repository:
git clone git@github.com:sarus-tech/dp-rag.git
Then cd into the cloned folder and create a virtual environment:
cd dp-rag
uv venv
Activate the virtualenv:
source .venv/bin/activate
You can then install the packages:
uv sync
and run the test script:
python test_dp_rag.py
A report with the technical details and benchmark results is available on arXiv: RAG with Differential Privacy (arXiv:2412.19291).
@misc{grislain2024ragdifferentialprivacy,
title={RAG with Differential Privacy},
author={Nicolas Grislain},
year={2024},
eprint={2412.19291},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.19291},
}