
dencode-partition fails with AssertionError #144

Closed
benjeffery opened this issue Apr 25, 2024 · 32 comments
Labels
bug Something isn't working

Comments

@benjeffery
Contributor

[Screenshot from 2024-04-25 12-14-47]

@benjeffery
Contributor Author

This looks like empty partitions, similar to the issue we had on explode?

@benjeffery
Contributor Author

I can confirm that in IntermediateColumnarFormatField.chunks, chunk_num_records[start_chunk:] is [] and chunk_cumulative_records[start_chunk + 1 :] is also [].
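To spell out the symptom (a minimal sketch with made-up values, not the actual bio2zarr code): once start_chunk runs past the end of the chunk index, both slices come back empty, so chunks() has nothing left to yield.

```python
# Illustrative values only; the real arrays live in IntermediateColumnarFormatField.
chunk_num_records = [1000]            # a partition with a single chunk
chunk_cumulative_records = [0, 1000]  # cumulative record counts

start_chunk = 1  # should be 0 for a single-chunk partition, but comes out as 1
print(chunk_num_records[start_chunk:])             # [] -> nothing to read
print(chunk_cumulative_records[start_chunk + 1:])  # [] -> no chunk boundaries either
```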

@jeromekelleher
Contributor

Can you run that again with -vv please to get debug output?

Can you have a look at the ICF metadata file and see if there are any partitions with "num_records": 0?
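Something along these lines would flag empty partitions (a rough sketch; the metadata filename and field names are assumptions here, so adjust them to the actual ICF layout):

```python
import json

# Assumed path and structure of the ICF metadata -- adjust to the real layout.
with open("[ICF_PATH]/metadata.json") as f:
    meta = json.load(f)

for i, partition in enumerate(meta.get("partitions", [])):
    if partition.get("num_records") == 0:
        print(f"partition {i} has num_records == 0: {partition}")
```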

@benjeffery
Contributor Author

[Screenshot from 2024-04-25 16-01-01]

There are no zeros in the ICF metadata.

@jeromekelleher
Contributor

What version of bio2zarr have you here? Line 605 is nowhere near iter_values in main

@jeromekelleher
Contributor

Can you view the icf metadata please, and show the last few partitions. Must be something odd to do with tabix indexing

@benjeffery
Contributor Author

> What version of bio2zarr have you here? Line 605 is nowhere near iter_values in main

This was on e6b9a after #136 was merged.

@benjeffery
Contributor Author

> Can you view the icf metadata please, and show the last few partitions. Must be something odd to do with tabix indexing

[Screenshot from 2024-04-26 10-01-14]

@jeromekelleher
Contributor

> This was on e6b9a after #136 was merged.

I don't think it can be - this is line 605 on that commit:

# A map of partition id to the cumulative number of records

@jeromekelleher
Contributor

Also it looks like you're pulling the bio2zarr code from a local directory - it would be better to install it into the environment. There are regular releases for just this.

@jeromekelleher
Contributor

Very puzzling... Can you give me the output of these commands please:

$ ls [ICF_PATH]/REF/p14761

and

python3 -c 'import pickle; print(pickle.load(open("[ICF_PATH]/REF/p14761/chunk_index", "rb")))'

(Regretting not making the chunk indexes a simple JSON file now!)

@benjeffery
Contributor Author

> This was on e6b9a after #136 was merged.
>
> I don't think it can be - this is line 605 on that commit:
>
> # A map of partition id to the cumulative number of records

It's 685, not 605. Sorry I can't cut and paste things!

@benjeffery
Contributor Author

> 'import pickle; print(pickle.load(open("[ICF_PATH]/REF/p14761/chunk_index", "rb")))'

[Screenshot from 2024-04-26 11-47-08]

@jeromekelleher
Contributor

> It's 685, not 605. Sorry I can't cut and paste things!

Ahhhh - sorry, I should have zoomed in! 🤦

@jeromekelleher
Contributor

OK, looks like it's a problem with the indexing code. We should have start_chunk=0 (there's only one chunk) but for some reason it's coming out as 1. Would you mind dropping into the debugger please and seeing if you can track this down @benjeffery?

@jeromekelleher
Contributor

I'm going to see if I can replicate on the chr2 1000G data - we'll see how long it all takes when there are only 40 processes.

@jeromekelleher
Contributor

I can't reproduce this on 1000G - it seems to work fine with maximal partitioning. There are quite a lot fewer variants though.

Is it just this particular partition that the error occurs on or are there others?

@benjeffery
Contributor Author

There were a few hundred that failed with this error - digging into this today.

@benjeffery
Contributor Author

benjeffery commented Apr 30, 2024

Trying to understand what is going on here:
In iter_values we end up with a chunk_offset of 29838. This is greater than the largest value (23135) in chunk_record_index, so we end up with a start_chunk that is greater than the number of chunks, and chunks then returns an empty list.
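Roughly, the lookup amounts to something like this (an illustrative sketch using numpy.searchsorted as a stand-in for the actual ICF code, with the numbers from this comment):

```python
import numpy as np

chunk_record_index = np.array([0, 23135])  # cumulative record counts: one chunk
chunk_offset = 29838                       # offset requested for this partition

# Because chunk_offset exceeds the last cumulative value, the lookup lands past
# the final chunk...
start_chunk = int(np.searchsorted(chunk_record_index, chunk_offset, side="right")) - 1
print(start_chunk)  # 1, but the only valid chunk index is 0

# ...so the slices taken in chunks() are empty and the AssertionError follows.
chunk_num_records = np.diff(chunk_record_index)
print(chunk_num_records[start_chunk:])  # []
```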

@benjeffery
Contributor Author

benjeffery commented Apr 30, 2024

I've also checked, and there are 550 failing partitions spread non-contiguously from 1708-5985.

@benjeffery
Contributor Author

Ah-ha, I wonder if this has something to do with it. Running dexplode-init again, check_overlap finds an overlap. That check was added after I exploded these VCFs.
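For context, an overlap check like this boils down to comparing consecutive partition regions once they are sorted; a hedged sketch (not bio2zarr's actual check_overlap, and the Region tuple here is hypothetical):

```python
from typing import NamedTuple

class Region(NamedTuple):
    start: int
    end: int

def find_overlaps(regions):
    """Return adjacent pairs where the next region starts before the previous one ends."""
    regions = sorted(regions)
    return [(a, b) for a, b in zip(regions, regions[1:]) if b.start < a.end]

# A partition whose region starts at 1 in the middle of the contig will overlap
# whatever comes before it:
print(find_overlaps([Region(100, 200), Region(200, 300), Region(1, 400)]))
```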

@jeromekelleher
Contributor

Ahhhh

@jeromekelleher
Contributor

We shouldn't have overlaps here though, right? Can you give some details?

@benjeffery
Contributor Author

The list of filename, partition.region.start and partition.region.end looks like this:
[Screenshot from 2024-04-30 14-59-05]

Really odd that a VCF in the middle contains 1. Checking the VCF to be sure.

@jeromekelleher
Contributor

A bunch of weird stuff here... are you sure the files are all part of the same set?

@benjeffery
Contributor Author

They are all in the same folder and have the same naming convention. The 58219159 file contains variants that start at that position.
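One way to double-check the first variant position straight from the data rather than the index (pysam is assumed to be available here; any VCF reader would do):

```python
import pysam

# "[VCF_PATH]" stands in for the actual file from the set.
vcf = pysam.VariantFile("[VCF_PATH]")
first = next(iter(vcf))
print(first.chrom, first.pos)  # 1-based position of the first record
```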

@benjeffery
Contributor Author

benjeffery commented May 1, 2024

So... for that one VCF file, the CSIIndex has the first indexed position as 1. No other VCFs in the set have this.
Checking this is in fact the case in the CSI and not an issue in the index loader.

@benjeffery
Contributor Author

benjeffery commented May 1, 2024

So this is the only index file that has a bin (bin 9) which is the first bin on its level (level 2). Tracing this all the way back to the bytes in the CSI file confirms this.
I'm re-indexing the file to see if this is the expected behavior.
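For reference, the standard CSI binning arithmetic does put bin 9 at the very start of the contig (a sketch assuming the usual min_shift=14, depth=5; the real values come from the index header). If the partition start is being derived from that bin's interval, a 1-based start of 1 would follow.

```python
min_shift, depth = 14, 5  # assumed defaults; read these from the CSI header in practice

def first_bin(level):
    # First bin index on each level: (8**level - 1) // 7 -> 0, 1, 9, 73, 585, ...
    return ((1 << (3 * level)) - 1) // 7

def bin_interval(b, level):
    # Genomic interval covered by bin b on the given level (0-based, half-open).
    width = 1 << (min_shift + 3 * (depth - level))
    start = (b - first_bin(level)) * width
    return start, start + width

print(first_bin(2))        # 9 -> bin 9 is indeed the first bin on level 2
print(bin_interval(9, 2))  # (0, 8388608) -> an interval starting at the contig start
```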

@jeromekelleher
Contributor

What's the position of the first variant in that file?

@benjeffery
Contributor Author

58720256

@jeromekelleher
Contributor

OK, I'm rejigging a bunch of things to avoid using the record counts from the indexes. They're really not reliable, and not present in a lot of cases.

@jeromekelleher
Contributor

Hopefully closed in #164. I'm going to push out a release in a few minutes so we can test @benjeffery
