sarus-tech/dp-rag

What is Sarus DP-RAG?

This repository is a simple implementation of retrieval-augmented generation (RAG) with differential privacy (DP) guarantees.

DP-RAG addresses privacy concerns in RAG systems by using DP to aggregate information from multiple documents, thereby preventing the inadvertent disclosure of sensitive data. The core innovation involves a novel token-by-token aggregation technique and a DP-based document retrieval method.
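To make the idea of aggregating over documents one token at a time more concrete, here is a minimal toy sketch. It is not the algorithm from the technical report or the code in this repository: it assumes each retrieved document yields its own next-token probabilities, and it picks the next token with the standard exponential mechanism. The names `dp_next_token` and `per_doc_probs` are hypothetical, and privacy accounting across the generated tokens (each sampled token spends its own epsilon) is left out.

```python
import math
import random
from typing import Dict, List, Optional


def dp_next_token(
    per_doc_probs: List[Dict[str, float]],
    vocab: List[str],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> str:
    """Sample one token with the exponential mechanism (toy illustration).

    Each document votes with its next-token probabilities. A vote is clipped
    to [0, 1], so adding or removing one document changes the utility of any
    token by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()

    # Utility of a token = sum of the clipped per-document probabilities.
    utilities = [
        sum(min(max(probs.get(tok, 0.0), 0.0), 1.0) for probs in per_doc_probs)
        for tok in vocab
    ]

    # Exponential mechanism: P(tok) is proportional to
    # exp(epsilon * utility / (2 * sensitivity)), with sensitivity = 1.
    logits = [epsilon * u / 2.0 for u in utilities]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - top) for logit in logits]
    return rng.choices(vocab, weights=weights, k=1)[0]


# Hypothetical example: 50 documents that mostly agree on the same token.
docs = [{"flu": 0.9, "cold": 0.1}] * 50
print(dp_next_token(docs, ["flu", "cold", "asthma"], epsilon=1.0))
```

When many documents agree, the shared token dominates the sampling weights; when only a few documents carry the information, the noise in the mechanism can mask it, which matches the observation that DP-RAG works best when sufficient documents provide the answer.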

The technical report presents empirical results showing that DP-RAG is effective, particularly when enough documents contain the information needed to answer. The repo also contains the code to evaluate the system on synthetic medical data.

Quick Start

On a computer with a GPU and CUDA installed, clone this repository:

git clone git@github.com:sarus-tech/dp-rag.git

Then cd into the cloned folder, create a virtual environment with uv venv, and activate it with source .venv/bin/activate.

Install the packages with uv sync, then run the test script: python test_dp_rag.py.

Technical Report

A report with the technical details and benchmark results is available on arXiv: RAG with Differential Privacy (https://arxiv.org/abs/2412.19291).

@misc{grislain2024ragdifferentialprivacy,
      title={RAG with Differential Privacy}, 
      author={Nicolas Grislain},
      year={2024},
      eprint={2412.19291},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.19291}, 
}
