-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
packaging dep and IcfMetadata.__eq__ #322
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks @ACEnglish!
I think the solution is slightly over cooking it, it it's easy enough to just compare on the fields we care about here. We'd like to add your test case to the vcf examples. Can you:
- Create a new directory, tests/data/vcf/out_of_order_fields and put your test1.vcf and test2.vcf in there (in bgzip format). Also add the csi or tabix indexes, whichever suits best. Add the directory to git.
- Add a new test class to
tests/test_vcf_examples.py
TestOutOfOrderFields``, and add a few simple tests to it (following, say, this one:bio2zarr/tests/test_vcf_examples.py
Line 453 in d851fc3
class TestSmallExampleLocalAlleles: - Update the
TestDu
class here to account for the new files (sorry, this is a stupid test, I really need to think of a better way to do this!)
bio2zarr/vcf2zarr/icf.py
Outdated
def __eq__(self, other): | ||
if not isinstance(other, IcfMetadata): | ||
return NotImplemented | ||
return (sorted(self.samples) == sorted(other.samples) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting the samples will lead to corruption I think, as we assume that the samples are all in the same order currently (that could be fixed, easily enough, though). How about we just add the sorted on the self.fields
attribute, and keep all the others as plain ==
?
bio2zarr/vcf2zarr/icf.py
Outdated
class Filter: | ||
id: str | ||
description: str = "" | ||
|
||
|
||
@dataclasses.dataclass | ||
@dataclasses.dataclass(order=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need order=True
here, do we? We're just defining the eq method, and don't need to sort the header definitions.
Hello, Thanks for the feedback. I've added tests and made some corrections. I was unsure about if Filters/Format fields needed sorting, so I made them very out of order in the input.bcf files and found that indeed they didn't need I also figured out this pre-commit which is new to me so hopefully there won't be any CI/CD fails. Let me know if there's more to be done. I'm happy to help. Have a great day, |
Looks great, thanks @ACEnglish ! Good to merge. It would be nice to squash the commits down to two, if you don't mind (but no big deal if you don't want to/can't be bothered etc) |
adding packing dependency and allowing header INFO fields to be in different orders between multiple VCFs
Commits squished. I'm not sure if the |
Thanks @ACEnglish, this is great. I'll merge it next week and follow up with a release (this week is deadline heavy, I'm afraid!) |
Change 1
After creating a clean
python3 -m venv b2z
andpython3 -m pip install .
in bio2zarr, runningvcf2zarr
raised theerror:
After manually installing
python3 -m pip install packaging
, the error was no longer raised. Therefore,adding
packaging
to the requirements inpyproject.toml
, making a new venv, reinstalling, and successfullyran
vcf2zarr
.Change 2
When exploding VCFs with different Header field order, the error was raised
To fix this, add order to relevant dataclasses, add
__eq__
onIcfMetadata
and callsorted
for field comparisonExample data:
test1.vcf
test2.vcf