Two-pass non-Dask VCF conversion #1185

Closed
45 commits
d644071
Initial copy of prototype
jeromekelleher Jan 23, 2024
7641481
Start pulling in existing infra
jeromekelleher Jan 23, 2024
0ee7356
Add some required fields
jeromekelleher Jan 23, 2024
3a530c7
More-or-less full minimal VCF parsing
jeromekelleher Jan 23, 2024
c4432d0
Fix buffering bug
jeromekelleher Jan 24, 2024
a140f72
Make "vcf_converter" file
jeromekelleher Jan 24, 2024
1cded23
Simple test passes
jeromekelleher Jan 24, 2024
cad2eb8
Getting same results as existing converter
jeromekelleher Jan 24, 2024
5c7aaad
Some tests
jeromekelleher Jan 24, 2024
c7c5ab9
Experimentation with extra fields
jeromekelleher Jan 24, 2024
a63b74d
Support for fixed-size INFO fields
jeromekelleher Jan 25, 2024
a845da4
Add fixed genotype fields
jeromekelleher Jan 25, 2024
b61150c
Add tests for FORMAT fields
jeromekelleher Jan 26, 2024
8d9fe76
Do some groundwork for input specification
jeromekelleher Jan 26, 2024
bab86d0
Flush alleles out as pickles
jeromekelleher Jan 26, 2024
7d0e398
Seems to be working
jeromekelleher Jan 26, 2024
36c6a72
Cleanups
jeromekelleher Jan 27, 2024
6c0d35f
Move to explicit two-pass form
jeromekelleher Jan 29, 2024
35d0b13
Different direction
jeromekelleher Jan 29, 2024
44488f1
Much better implementation of "explode"
jeromekelleher Jan 30, 2024
96b122f
Add worker thread option
jeromekelleher Jan 30, 2024
cee4ffd
Add column_chunk_size option
jeromekelleher Jan 30, 2024
425ad14
Tidy up explode format
jeromekelleher Jan 30, 2024
ba26792
Basic column reading
jeromekelleher Jan 30, 2024
ea29986
Various experimentation with bounds and schema generation
jeromekelleher Jan 31, 2024
c91c4c2
working summaries computed and stored
jeromekelleher Jan 31, 2024
2c74001
Fix bug in bounds
jeromekelleher Feb 1, 2024
8a97bd6
Support for summary
jeromekelleher Feb 1, 2024
0f17a29
Tidy up
jeromekelleher Feb 1, 2024
9de0913
Stuff
jeromekelleher Feb 1, 2024
45b26df
Some initial benchmarking on final encoding perf - looks good!
jeromekelleher Feb 1, 2024
028ee11
Column sanitisation seems to be mostly working
jeromekelleher Feb 2, 2024
1bf2d80
Cleanup + working progress bar
jeromekelleher Feb 2, 2024
362fd7d
Fix string problem
jeromekelleher Feb 5, 2024
9485893
Make compressor configurable
jeromekelleher Feb 5, 2024
b523498
Add genotypes
jeromekelleher Feb 5, 2024
766106f
Add top-level function
jeromekelleher Feb 6, 2024
9f4d26e
Almost passing tests
jeromekelleher Feb 6, 2024
f45089e
Basic prototype for plink
jeromekelleher Feb 7, 2024
e89e3fd
Fix various bugs
jeromekelleher Feb 7, 2024
2638cc3
Fix bug in string column handling
jeromekelleher Feb 7, 2024
2f0be81
start on validation tool
jeromekelleher Feb 12, 2024
d42a167
Tweak
jeromekelleher Feb 12, 2024
284ee6e
Basically working validation.
jeromekelleher Feb 14, 2024
97bb893
fix for floats
jeromekelleher Feb 14, 2024
2 changes: 1 addition & 1 deletion sgkit/accelerate.py
--- a/sgkit/accelerate.py
+++ b/sgkit/accelerate.py
@@ -3,7 +3,7 @@

 from numba import guvectorize, jit

-_DISABLE_CACHE = os.environ.get("SGKIT_DISABLE_NUMBA_CACHE", "0")
+_DISABLE_CACHE = os.environ.get("SGKIT_DISABLE_NUMBA_CACHE", "1")

 try:
     CACHE_NUMBA = {"0": True, "1": False}[_DISABLE_CACHE]
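
The hunk above changes only the fallback value of `SGKIT_DISABLE_NUMBA_CACHE` from "0" to "1", so Numba caching ends up disabled unless the variable is explicitly set to "0". A minimal sketch of that lookup follows. The helper name `resolve_cache_flag`, its `env` and `default` parameters, and the `ValueError` raised for unrecognised values are assumptions made for illustration; the diff is truncated before the `except` clause, so the actual error handling in `sgkit/accelerate.py` is not shown here.

```python
import os
from typing import Mapping


def resolve_cache_flag(env: Mapping[str, str], default: str = "1") -> bool:
    """Return True when Numba caching should be enabled.

    Hypothetical helper: mirrors the dictionary lookup in the hunk, but
    takes the environment as an argument so it can be tested in isolation.
    """
    raw = env.get("SGKIT_DISABLE_NUMBA_CACHE", default)
    try:
        # "0" means "do not disable the cache", "1" means "disable it".
        return {"0": True, "1": False}[raw]
    except KeyError:
        # Assumed behaviour: reject anything other than "0" or "1".
        raise ValueError(
            f"SGKIT_DISABLE_NUMBA_CACHE must be '0' or '1', got {raw!r}"
        ) from None


# With the new default of "1", an unset variable disables caching.
cache_numba = resolve_cache_flag(os.environ)
```

Passing `default="0"` reproduces the behaviour before this change, where caching stays on unless the variable is set to "1".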