Skip to content

gks-anvil/vrsix

Repository files navigation

vrsix: Indexing VRS-Annotated VCFs

Overview

vrsix provides a file-based indexing strategy to support fast lookup of AnVIL-hosted VCFs using IDs and annotations drawn from the GA4GH Variation Representation Specification.

See the vrsix Terra workflow for a readymade Terra implementation.

Usage

From a VCF, ingest a VRS ID and the corresponding VCF-called location (i.e. sufficient inputs for a tabix lookup), and store them in a sqlite database.

vrsix load chr1.vcf

All instances of variations are stored with an associated file URI to support later retrieval. By default, this URI is simply the input VCF's location in the file system, but you may declare a custom URI instead as an optional argument:

vrsix load chr1.vcf gs://my_stuff/chr1.vcf

By default, all records are ingested into a sqlite file located at ~/.local/share/vrsix.db. This can be overridden with either the environment variable VRS_VCF_INDEX, or with an optional flag to the CLI:

vrsix load --db-location=./vrsix.db input.vcf

Development

Ensure that a recent version of the Rust toolchain is available.

Create a virtual environment and install developer dependencies:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -e '.[dev,tests]'

This installs Python code as editable, but after any changes to Rust code, run maturin develop to rebuild the Rust binary:

maturin develop

Be sure to install pre-commit hooks:

pre-commit install

Check Python style with ruff:

python3 -m ruff format . && python3 -m ruff check --fix .

Use cargo fmt to check Rust style (must be run from within the rust/ subdirectory):

cd rust/
cargo fmt

Run tests from the project root with pytest:

pytest

Some granular tests are written directly into the Rust backend as well:

cd rust/
cargo test