fastqindex #57

Koeng101 · 2024-02-12T22:48:41Z

I want a binary fastqindex similar to https://hasindu2008.github.io/slow5specs/slow5-v1.0.0.pdf

This would mainly be used when writing a large FASTQ file to a data store such as S3, while still wanting to seek to specific reads within that file. There would be two modifications: standardization of size,

- (2 byte) uint16: length of read ID 
- (var byte) read ID (a UUID can be used directly, or a hash of the identifier can be used). Typically 16 bytes for a UUID
- (8 byte) uint64: start position
- (4 byte) uint32: length

30 bytes per read for a typical run (2 + 16 + 8 + 4, assuming 16-byte UUIDs). If a PromethION flow cell returns 10,000,000 reads, the index file will be approx 286 MiB (300,000,000 bytes).
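
To make the layout above concrete, here is a minimal sketch of packing and reading these variable-length records, along with the size arithmetic. The function names (`pack_record`, `iter_records`) are hypothetical, and little-endian byte order is an assumption, since nothing above specifies one.

```python
import struct
import uuid

# Assumption: little-endian byte order; the proposal does not specify one.
_TAIL = struct.Struct("<QI")  # uint64 start position + uint32 length = 12 bytes


def pack_record(read_id: bytes, start: int, length: int) -> bytes:
    """Encode one record: uint16 ID length, the ID bytes, then start and length."""
    return struct.pack("<H", len(read_id)) + read_id + _TAIL.pack(start, length)


def iter_records(buf: bytes):
    """Yield (read_id, start, length) tuples from concatenated records."""
    pos = 0
    while pos < len(buf):
        (id_len,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        read_id = buf[pos:pos + id_len]
        pos += id_len
        start, length = _TAIL.unpack_from(buf, pos)
        pos += _TAIL.size
        yield read_id, start, length


# Size check for a 16-byte UUID read ID.
record = pack_record(uuid.uuid4().bytes, 0, 512)
record_size = len(record)                     # 2 + 16 + 8 + 4
index_mib = 10_000_000 * record_size / 2**20  # index size in MiB for 10M reads
```

Because the ID length varies, finding the n-th record means walking every record before it, which is the cost the length prefix imposes.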

Koeng101 · 2024-02-13T08:43:52Z

Hmm, I think static allocation of bytes might be interesting here.

- (16 byte) read ID (UUIDs can be used directly or a hash of the identifier can be used)
- (8 byte) uint64: start position
- (4 byte) uint32: length

This would allow you to statically allocate the whole index into memory - every record is exactly 28 bytes (16 + 8 + 4), so you can derive the exact number of reads from the byte length of the file (file size / 28) and allocate the buffers for them up front.
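
A rough sketch of the fixed-size variant, showing how the read count falls out of the file length and how a record can be addressed directly by position. All names here (`RECORD`, `id16`, `read_count`, `record_at`, `find`) are hypothetical. Little-endian byte order and BLAKE2b as the hash for non-UUID identifiers are assumptions, not part of the proposal.

```python
import hashlib
import struct
import uuid

# Assumption: little-endian fields with no padding; neither is specified above.
RECORD = struct.Struct("<16sQI")  # 16-byte read ID, uint64 start, uint32 length


def id16(identifier: str) -> bytes:
    """Map a read identifier to 16 bytes: UUID bytes if it parses, else a hash."""
    try:
        return uuid.UUID(identifier).bytes
    except ValueError:
        # Assumption: BLAKE2b truncated to 16 bytes; any 128-bit hash would do.
        return hashlib.blake2b(identifier.encode(), digest_size=16).digest()


def read_count(index: bytes) -> int:
    """Derive the number of reads from the byte length of the index."""
    if len(index) % RECORD.size:
        raise ValueError("index length is not a multiple of the record size")
    return len(index) // RECORD.size


def record_at(index: bytes, i: int):
    """Return (read_id, start, length) for the i-th record without scanning."""
    return RECORD.unpack_from(index, i * RECORD.size)


def find(index: bytes, identifier: str):
    """Linear scan for a read; returns (start, length) or None."""
    key = id16(identifier)
    for read_id, start, length in RECORD.iter_unpack(index):
        if read_id == key:
            return start, length
    return None


# Build a tiny three-read index and query it.
reads = [("read-1", 0, 300), ("read-2", 300, 250), ("read-3", 550, 410)]
index = b"".join(RECORD.pack(id16(n), s, l) for n, s, l in reads)
count = read_count(index)
hit = find(index, "read-2")
```

The `(start, length)` pair returned by `find` corresponds to the byte range `start` through `start + length - 1` in the FASTQ file, which is the kind of range a ranged GET against an object store would request.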

Koeng101 added the enhancement New feature or request label Feb 12, 2024

Koeng101 mentioned this issue Feb 13, 2024

indexing #60

Merged

Koeng101 closed this as completed Mar 28, 2024
