Functions to facilitate VCF comparison using sgkit #95
cb56acf to 4274f54
Nice!
Great that we have this working for small datasets, but this isn't going to scale: because the return type is a list of lists, all the genotypes are brought into RAM.
For this to work on large datasets, I think you'll need to return copies of ds1 and ds2, where they are filtered down to the common sites and ds2 is remapped. Does that make sense for what you are trying to do?
Let's talk on Monday in the office about how we can modify this code to achieve that.
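To make the suggestion above concrete, here is a minimal sketch of finding the common sites as index arrays rather than materialising genotypes. It assumes both datasets cover a single contig and that each has a 1-D array of site positions; `common_site_indices` is a hypothetical helper name, not a function from this PR.

```python
import numpy as np


def common_site_indices(pos1, pos2):
    """Return indices into pos1 and pos2 of the positions present in both.

    Hypothetical helper: assumes a single contig and unique positions
    within each array.
    """
    _, idx1, idx2 = np.intersect1d(pos1, pos2, return_indices=True)
    return idx1, idx2


# Toy positions standing in for the two datasets' site positions.
pos1 = np.array([100, 200, 300, 400])
pos2 = np.array([200, 250, 400, 500])
idx1, idx2 = common_site_indices(pos1, pos2)
print(idx1, idx2)  # [1 3] [0 2]
```

With xarray-backed datasets, the index arrays could then be passed to something like `ds1.isel(variants=idx1)` and `ds2.isel(variants=idx2)`, so only the positions, and not the genotype arrays, need to be loaded to do the filtering.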
Thanks, @benjeffery! Just for my understanding, the function should return I'd just like to compare |
cc18bcf to 81d6090
Should we just do that all within |
Since |
1b7393d to 621f15b
I'm thinking that |
3b1545a to d576782
I've modified |
The "compatible" |
What I'd like to do is multi-way VCF comparisons, i.e., imputed genotypes from lshmm ( |
bd4d670 to 752fe39
For now, let's assume that only ACGT are allowed, to keep things simple. |
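As an illustration of what remapping under the ACGT-only assumption could look like for a single site, here is a rough sketch. It is not the code in this PR: `remap_genotypes` is a hypothetical name, the allele lists are assumed to contain no padding entries, and -1 is assumed to mark a missing genotype call.

```python
import numpy as np

ACGT = ("A", "C", "G", "T")


def remap_genotypes(alleles_ref, alleles_other, genotypes_other):
    """Re-express genotype allele indices from one dataset in terms of
    the allele ordering of a reference dataset (hypothetical helper).

    Assumes unpadded allele lists and -1 for missing calls.
    """
    for allele in list(alleles_ref) + list(alleles_other):
        if allele not in ACGT:
            raise ValueError(f"Unsupported allele: {allele!r}")
    # Reference alleles keep their indices; alleles seen only in the
    # other dataset are appended after them.
    merged = list(alleles_ref)
    merged += [a for a in alleles_other if a not in merged]
    index_map = np.array([merged.index(a) for a in alleles_other])
    g = np.asarray(genotypes_other)
    # Missing calls (-1) stay missing; all other indices are translated.
    remapped = np.where(g >= 0, index_map[g], -1)
    return merged, remapped


merged, remapped = remap_genotypes(
    ["A", "C"], ["C", "A", "G"], [[0, 1], [2, -1]]
)
print(merged)             # ['A', 'C', 'G']
print(remapped.tolist())  # [[1, 0], [2, -1]]
```

Rejecting anything outside ACGT up front keeps the mapping simple, at the cost of failing on indels and symbolic alleles.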
Also, I still need to add tests for |
4071537 to 0cc96df
There are still more tests that could be added, but the current tests are good enough, I think. Will continue adding more tests in a separate issue. |
Addresses #94