Anchor barcodes during demultiplexing #58
Hello @vaamb, Thanks for the contribution. I want to make sure I understand the use case here. Does the following represent the type of situation you're trying to cover?
Hey @vaamb - we'll need to bump this to the next release if we don't hear back from you by EOD tomorrow. Thanks!
Hello @colinvwood, This is indeed the use case I have. For the record, barcodes were anchored by default in
- Allow anchoring all the barcodes with a single option, without having to add '^' to each barcode in the metadata file
- It is possible to anchor forward and reverse barcodes individually
- Test anchored forward barcodes
- Test anchored forward barcodes with sequences cut
…ired` with `mixed_orientation`
2acfc02 to 04eeaf6
Hello @colinvwood, Sorry for the time it took. I completely rewrote my PR in order to rebase it on main and to take into account everything that was said for PR #57.
I would also like to see a test proving that anchoring behaves as expected when applied to the reverse reads.
Co-authored-by: colinvwood <68213641+colinvwood@users.noreply.github.com>
Hey @vaamb, just FYI, we have a release planned for Dec. 11 in case you were trying to get this merged in this cycle.
Hi @colinvwood
Thanks for all your work on this @vaamb, the tests look good to go.
71177fb to 29ba89e
Follow-up to pull request #57
Allow anchoring barcodes for demultiplexing, which can be useful in combination with the `--cut` parameter.
Overview of the kind of problems it can solve:
In metabarcoding, it is sometimes advised to add a short poly-N stretch (around 5 bases) at the beginning of the sequence to increase Illumina sequence discrimination in the early cycles of sequencing.
However, cutadapt does not anchor barcodes by default, so a partial barcode at the start of a read still counts as a match: running cutadapt on the sequence "RCODEacgt" with the barcode "BARCODE" results in the demultiplexed sequence "acgt".
Given enough sequences and numerous short barcodes (such as 8-base Illumina indexes), this can lead to many sequences being assigned to the wrong sample, hindering downstream processing.
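To make the difference concrete, here is a minimal Python sketch of the two matching modes. It is a simplified model, not cutadapt's actual alignment algorithm: it uses exact matching only (no error tolerance), and the function names and the `min_overlap` default of 3 are assumptions made for illustration.

```python
def match_unanchored_5prime(read, barcode, min_overlap=3):
    """Simplified model of a regular (non-anchored) 5' barcode.

    Returns the read with the barcode removed, or None if nothing matches.
    A full barcode may occur anywhere in the read (everything before it is
    removed as well), or a partial barcode (a suffix of it) may sit at the
    very start of the read.
    """
    idx = read.find(barcode)
    if idx != -1:
        return read[idx + len(barcode):]
    # Try the longest partial barcode first, down to min_overlap bases.
    for k in range(len(barcode) - 1, min_overlap - 1, -1):
        if read.startswith(barcode[-k:]):
            return read[k:]
    return None


def match_anchored_5prime(read, barcode):
    """Simplified model of an anchored 5' barcode ('^' prefix).

    The full barcode must start at the first base of the read.
    """
    if read.startswith(barcode):
        return read[len(barcode):]
    return None


# The example from the description: a truncated barcode at the read start.
print(match_unanchored_5prime("RCODEacgt", "BARCODE"))  # partial match is accepted
print(match_anchored_5prime("RCODEacgt", "BARCODE"))    # no match when anchored
```

In this model the non-anchored matcher accepts the truncated barcode "RCODE", while the anchored matcher rejects the read, which is the behaviour that avoids spurious assignments with short barcodes.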
One simple solution is to first remove the poly-N (using the `--cut` option) and then use anchored barcodes.
While it is possible to anchor each barcode individually by adding a `^` before it, this can be cumbersome. This pull request uses cutadapt's `^file:` option to anchor all the barcodes at the same time.
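As a rough illustration of how the two options combine on the cutadapt command line, the sketch below builds an argument list for a single-end demultiplexing run. It is not the plugin's code: the function name, its parameters (`cut`, `anchor`), and the file names are hypothetical, and only the cutadapt options themselves (`--cut`, `-g`, `file:` / `^file:`, `-o` with `{name}`) are taken from cutadapt.

```python
def build_demux_command(reads, barcodes_fasta, out_pattern, cut=0, anchor=False):
    """Build a cutadapt argument list for single-end demultiplexing.

    Hypothetical helper for illustration only.
    cut    -- number of leading bases to remove before barcode matching
    anchor -- if True, anchor every barcode in the FASTA file at once
    """
    # '^file:' anchors all barcodes in the file; 'file:' leaves them unanchored.
    prefix = "^file:" if anchor else "file:"
    cmd = ["cutadapt"]
    if cut:
        cmd += ["--cut", str(cut)]
    cmd += ["-g", prefix + barcodes_fasta, "-o", out_pattern, reads]
    return cmd


# Remove a 5-base poly-N, then demultiplex with anchored barcodes.
cmd = build_demux_command(
    "reads.fastq.gz", "barcodes.fasta", "{name}.fastq.gz", cut=5, anchor=True
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run`; whether the plugin assembles its command this way is an assumption.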