Use ColBERT as a search engine for the ACL Anthology and OpenReview conferences, or any .bib file. Check out the live demo.
# (optional): create a fresh environment
conda create -y -n aclsearch python=3.10
conda activate aclsearch
git clone https://github.com/davidheineman/acl-search
cd acl-search
pip install -r requirements.txt
python src/server.py # (this will download a pre-built index!)
Common fixes:
# getting pip errors? (install sentencepiece deps)
sudo apt-get update
sudo apt-get install -y pkg-config libsentencepiece-dev
# running on CUDA? (fix broken package path)
INSTALL_PATH=PATH_TO_YOUR_PYTHON_INSTALL # e.g., /root/ai2/miniconda3/envs/acl_search/lib/python3.10
cp ./src/extras/segmented_maxsim.cpp $INSTALL_PATH/site-packages/colbert/modeling/segmented_maxsim.cpp
cp ./src/extras/decompress_residuals.cpp $INSTALL_PATH/site-packages/colbert/search/decompress_residuals.cpp
cp ./src/extras/filter_pids.cpp $INSTALL_PATH/site-packages/colbert/search/filter_pids.cpp
cp ./src/extras/segmented_lookup.cpp $INSTALL_PATH/site-packages/colbert/search/segmented_lookup.cpp
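If you are unsure of your install path, you can ask Python where the colbert package lives. This small sketch prints the directory the commands above copy into:

# prints the installed colbert package directory, i.e.
# $INSTALL_PATH/site-packages/colbert in the commands above
import os
import colbert

print(os.path.dirname(colbert.__file__))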
(Optional) Parse & Index the Anthology
This step builds the index manually. You can skip it, since the parsed and indexed anthology will be downloaded from huggingface.co/davidheineman/colbert-acl. You can also include your own papers by adding them to the anthology.bib file!
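As a sketch, a standard BibTeX entry appended to anthology.bib looks something like the following (the fields shown are illustrative; in particular, the abstract field is assumed to be what gets indexed for search):

@inproceedings{your-paper-2024,
    title = "Your Paper Title",
    author = "Lastname, Firstname",
    booktitle = "Proceedings of ...",
    year = "2024",
    url = "https://aclanthology.org/...",
    abstract = "A short abstract describing the paper.",
}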
# pull from openreview
echo -e "[email]\n[password]" > .openreview
python src/scrape/openrev.py
# pull from acl anthology
python src/scrape/acl.py
# create unified dataset
python src/parse.py
# index with ColBERT
# (note: indexing can fail silently if the C++ extensions are missing; see the CUDA fix above)
python src/index.py
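For reference, here are hedged sketches of what the two steps above do. The parse step is roughly bib-to-JSON; the field names and file paths here are illustrative, not the repo's actual schema:

# rough sketch of the parse step: read anthology.bib and dump a
# unified JSON list of records (illustrative field names)
import json
import bibtexparser  # pip install bibtexparser

with open("anthology.bib") as f:
    bib = bibtexparser.load(f)

records = [
    {"title": e.get("title", ""), "abstract": e.get("abstract", "")}
    for e in bib.entries
]

with open("dataset.json", "w") as f:
    json.dump(records, f, indent=2)

And ColBERTv2 indexing via the colbert-ai Python API looks roughly like this (the checkpoint, experiment, and index names are illustrative; src/index.py may configure things differently):

from colbert import Indexer
from colbert.infra import ColBERTConfig, Run, RunConfig

# illustrative collection: one passage (e.g., an abstract) per paper
abstracts = ["Paper abstract one ...", "Paper abstract two ..."]

with Run().context(RunConfig(nranks=1, experiment="acl")):
    config = ColBERTConfig(nbits=2)  # 2-bit residual compression
    indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
    indexer.index(name="acl.index", collection=abstracts, overwrite=True)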
Deploy Web Server
# Start an API endpoint
gunicorn -w 1 --threads 100 --worker-class gthread -b 0.0.0.0:8080 src.server:app
# Then visit:
# http://localhost:8080
# or use the API:
# http://localhost:8080/api/search?query=Information retrieval with BERT
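To hit the API programmatically, a minimal sketch (the response schema is whatever src/server.py returns, so the json() handling here is an assumption):

# query the running server via the endpoint documented above
import requests

resp = requests.get(
    "http://localhost:8080/api/search",
    params={"query": "Information retrieval with BERT"},
)
resp.raise_for_status()
print(resp.json())  # response schema depends on src/server.py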
Deploy as a Docker App
# Build and run locally
docker build . -t acl-search:main
docker run -p 8080:8080 acl-search:main
# Or pull the hosted container
docker pull ghcr.io/davidheineman/acl-search:main # on macOS, add: --platform linux/arm64
docker run -p 8080:8080 ghcr.io/davidheineman/acl-search:main
# Launch it as a web service on Fly.io!
brew install flyctl
fly launch
fly scale vm shared-cpu-2x # scale up cpu!
fly scale memory 4096 # scale up memory!
Update Index on HF
# Download a fresh set of papers, index them, and push to HF:
chmod +x src/scrape/beaker/index.sh
./src/scrape/beaker/index.sh
# Build and deploy container for auto-updating:
docker build -t acl-search -f src/scrape/beaker/Dockerfile .
docker run -it -e HF_TOKEN=$HF_TOKEN acl-search # (Optional) test it out!
# Run on beaker
beaker image delete davidh/acl-search
beaker image create --name acl-search acl-search
beaker experiment create src/scrape/beaker/beaker-conf.yml
Paper Table
# add OpenAI API key
echo -e "[OPENAI_API_KEY]" > .openai-api-key
# Run paper table as a local service
pm2 start paper-table.config.js
pm2 logs paper-table-backend --lines 10
pm2 startup # To have it run on startup
pm2 save
# To shut down the server + flush logs
pm2 stop paper-table-backend && pm2 flush paper-table-backend
# To restart
pm2 stop paper-table-backend && pm2 flush paper-table-backend && pm2 start paper-table.config.js
To see an example of search, visit: colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs
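Programmatic search with the ColBERT API looks roughly like the sketch below (the index and experiment names match the illustrative indexing sketch above, not necessarily the notebook's):

from colbert import Searcher
from colbert.infra import Run, RunConfig

with Run().context(RunConfig(nranks=1, experiment="acl")):
    searcher = Searcher(index="acl.index")
    pids, ranks, scores = searcher.search("information retrieval with BERT", k=5)
    for pid, rank, score in zip(pids, ranks, scores):
        print(f"[{rank}] ({score:.1f}) {searcher.collection[pid]}")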