fastqindex #57

Closed
Koeng101 opened this issue Feb 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Koeng101
Owner

I want a binary fastqindex similar to https://hasindu2008.github.io/slow5specs/slow5-v1.0.0.pdf

This would mainly be used when writing a large FASTQ file to an object store such as S3 while still being able to seek to specific reads in that file. There would be two modifications, one of which is standardization of field sizes. Each index record would contain:

- (2 byte) uint16: length of read ID
- (var byte) read ID (a UUID can be used directly, or a hash of the identifier can be used instead); typically 16 bytes for a UUID
- (8 byte) uint64: start position
- (4 byte) uint32: length

30 bytes in total for a typical record with a 16-byte read ID. If a PromethION flow cell returns 10,000,000 reads, the index file will be approximately 300 MB (about 286 MiB).
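
To make the layout above concrete, here is a minimal Python sketch of writing and reading one such record. The byte order (little-endian) and the names `encode_record` and `decode_records` are assumptions for illustration; the issue does not specify either.

```python
import struct
import uuid

# Assumption: little-endian byte order; the issue does not specify one.
ID_LEN = struct.Struct("<H")   # uint16: length of read ID
TAIL = struct.Struct("<QI")    # uint64 start position, uint32 length


def encode_record(read_id: bytes, start: int, length: int) -> bytes:
    """Serialize one variable-length index record (hypothetical helper)."""
    return ID_LEN.pack(len(read_id)) + read_id + TAIL.pack(start, length)


def decode_records(buf: bytes):
    """Yield (read_id, start, length) tuples from a buffer of records."""
    offset = 0
    while offset < len(buf):
        (id_len,) = ID_LEN.unpack_from(buf, offset)
        offset += ID_LEN.size
        read_id = buf[offset:offset + id_len]
        offset += id_len
        start, length = TAIL.unpack_from(buf, offset)
        offset += TAIL.size
        yield read_id, start, length


# A 16-byte UUID as the read ID gives the 30-byte record from the estimate.
example_id = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes
record = encode_record(example_id, 0, 1024)
print(len(record))               # 30
print(len(record) * 10_000_000)  # 300000000 bytes for 10,000,000 reads
```

Because the read ID is variable-length, a reader has to walk the file record by record to find where each one starts, which is what `decode_records` does.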

@Koeng101 Koeng101 added the enhancement New feature or request label Feb 12, 2024
@Koeng101
Owner Author

Hmm, I think static allocation of bytes might be interesting here.

- (16 byte) read ID (UUIDs can be used directly or a hash of the identifier can be used)
- (8 byte) uint64: start position
- (4 byte) uint32: length

With fixed 28-byte records, you could statically allocate the whole index in memory: the exact number of reads is the byte length of the file divided by 28, and other buffers can be sized up front in the same way.
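
As a rough sketch of what the fixed-size layout makes possible, the Python below derives the read count from the index length and looks up a read by ID. Several things here are assumptions not stated in the issue: little-endian byte order, records sorted by read ID so that binary search works, and the helper names `build_index`, `record_count`, `lookup`, and `http_range`.

```python
import struct

# Assumptions: little-endian byte order, and records sorted by read ID.
RECORD = struct.Struct("<16sQI")  # 16-byte ID, uint64 start, uint32 length


def build_index(entries) -> bytes:
    """Pack (read_id, start, length) tuples into a sorted fixed-width index."""
    return b"".join(RECORD.pack(*entry) for entry in sorted(entries))


def record_count(index: bytes) -> int:
    """Derive the number of reads from the byte length alone."""
    if len(index) % RECORD.size:
        raise ValueError("index length is not a multiple of the record size")
    return len(index) // RECORD.size


def lookup(index: bytes, read_id: bytes):
    """Binary-search the index; return (start, length) or None."""
    lo, hi = 0, record_count(index)
    while lo < hi:
        mid = (lo + hi) // 2
        rid, start, length = RECORD.unpack_from(index, mid * RECORD.size)
        if rid == read_id:
            return start, length
        if rid < read_id:
            lo = mid + 1
        else:
            hi = mid
    return None


def http_range(start: int, length: int) -> str:
    """HTTP Range header value for one read (the end offset is inclusive)."""
    return f"bytes={start}-{start + length - 1}"


# Three made-up 16-byte IDs, inserted out of order on purpose.
ids = [bytes([i]) * 16 for i in (3, 1, 2)]
index = build_index([(ids[0], 500, 200), (ids[1], 0, 250), (ids[2], 250, 250)])
print(record_count(index))               # 3
print(http_range(*lookup(index, ids[0])))  # bytes=500-699
```

The resulting range string can be sent as the `Range` header of a GET request to fetch just that read from the stored FASTQ file. Read IDs must be exactly 16 bytes here, since the `16s` format pads shorter values with zero bytes and truncates longer ones.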

@Koeng101 Koeng101 mentioned this issue Feb 13, 2024