Megamash file #53

Closed
Koeng101 opened this issue Jan 10, 2024 · 1 comment
Labels
enhancement New feature or request

Koeng101 commented Jan 10, 2024

It is useful for me to keep records of megamash matches. I think this should be a file format.

@VN 0.0.1
@KmerSize 16
@MinimialKmerMatches 10
@Threshold 0.2
@Separator |
### START SUBHEADER ###
identifier    sequence
identifier2   sequence
### END SUBHEADER ###
289a197e-4c05-4143-80e6-488e23044378    2    identifier|identifier2    78/150|51/53

The subheader lists the reference sequences, with fasta_identifier and sequence as its columns. Each record in the generated section then has four columns: query_name (the fastq read name), the number of matches, the matched identifiers joined by the separator, and the coverage of each match, also joined by the separator. Coverage is expressed as int/int rather than as a single fraction because the k-mer counts themselves are relevant information: a high number on both sides means a high-confidence match, while a low total k-mer count means a lower-confidence match.
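To make the layout concrete, here is a minimal parsing sketch in Python. It is not part of megamash; the names `parse_megamash`, `MegamashFile`, `Record` and `Match` are hypothetical, and it assumes that columns are separated by whitespace and that identifiers and sequences contain no whitespace. Header keys are stored exactly as spelled in the example above.

```python
from dataclasses import dataclass, field

# The example file from this issue, reproduced verbatim.
SAMPLE = """@VN 0.0.1
@KmerSize 16
@MinimialKmerMatches 10
@Threshold 0.2
@Separator |
### START SUBHEADER ###
identifier    sequence
identifier2   sequence
### END SUBHEADER ###
289a197e-4c05-4143-80e6-488e23044378    2    identifier|identifier2    78/150|51/53
"""


@dataclass
class Match:
    identifier: str
    matched_kmers: int  # numerator of the coverage column
    total_kmers: int    # denominator of the coverage column


@dataclass
class Record:
    query_name: str
    matches: list


@dataclass
class MegamashFile:
    header: dict = field(default_factory=dict)
    references: dict = field(default_factory=dict)  # identifier -> sequence
    records: list = field(default_factory=list)


def parse_megamash(text: str) -> MegamashFile:
    """Parse the proposed layout (assumption: whitespace-separated columns)."""
    out = MegamashFile()
    in_subheader = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "### START SUBHEADER ###":
            in_subheader = True
        elif line == "### END SUBHEADER ###":
            in_subheader = False
        elif in_subheader:
            identifier, sequence = line.split(None, 1)
            out.references[identifier] = sequence
        elif line.startswith("@"):
            key, value = line[1:].split(None, 1)
            out.header[key] = value
        else:
            query_name, count, names, coverage = line.split()
            sep = out.header.get("Separator", "|")
            ids = names.split(sep)
            covs = coverage.split(sep)
            # The match count must agree with both separated columns.
            if not (int(count) == len(ids) == len(covs)):
                raise ValueError(f"inconsistent match count in record: {line!r}")
            matches = []
            for ident, cov in zip(ids, covs):
                hit, total = cov.split("/")
                matches.append(Match(ident, int(hit), int(total)))
            out.records.append(Record(query_name, matches))
    return out
```

Keeping `matched_kmers` and `total_kmers` as separate integers preserves the confidence information described above; a consumer can still compute the fraction when it needs one.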

There could also be a complementary JSON representation, for easy reading after generation. I prefer a TSV-like format as the primary one because the most common use of the matches will be streaming them to other systems.
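As a rough sketch of what the JSON complement could look like, the Python function below converts one record line into one JSON object, so a stream of records becomes a stream of JSON lines. The function name `record_to_json` and the JSON field names are hypothetical, since the issue does not define a JSON schema, and columns are again assumed to be whitespace-separated.

```python
import json


def record_to_json(line: str, separator: str = "|") -> str:
    """Convert one record line of the proposed format to a JSON string.

    The field names used here are illustrative only.
    """
    query_name, count, names, coverage = line.split()
    matches = []
    for ident, cov in zip(names.split(separator), coverage.split(separator)):
        hit, total = cov.split("/")
        matches.append(
            {
                "identifier": ident,
                "matched_kmers": int(hit),  # k-mers that matched
                "total_kmers": int(total),  # k-mers considered
            }
        )
    return json.dumps(
        {"query_name": query_name, "match_count": int(count), "matches": matches}
    )


EXAMPLE_LINE = (
    "289a197e-4c05-4143-80e6-488e23044378    2    "
    "identifier|identifier2    78/150|51/53"
)
print(record_to_json(EXAMPLE_LINE))
```

Emitting one object per record keeps the JSON form streamable in the same way as the TSV-like form.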

Koeng101 added the enhancement label Jan 10, 2024

Koeng101 commented

Closing, because of a megamash rewrite.
