Anchor barcodes during demultiplexing #58

Merged
merged 17 commits into from
Dec 5, 2023


vaamb
Contributor

@vaamb vaamb commented Jul 13, 2023

Follow-up to pull request #57.

Allows barcodes to be anchored during demultiplexing, which is useful in combination with the --cut parameter.

Overview of the kind of problem this solves:
When doing metabarcoding, it is sometimes advised to add a short poly-N stretch (around 5 bases) at the beginning of the sequence to increase Illumina sequence discrimination in the early phases of sequencing.
However, cutadapt does not anchor barcodes by default, so a partial match at the start of a read is accepted: running cutadapt on the sequence "RCODEacgt" with the barcode "BARCODE" yields the demultiplexed sequence "acgt".
Given enough sequences and many short barcodes (such as 8-base Illumina indexes), this can assign many sequences to the wrong sample, hindering downstream processing.
One simple solution is to first remove the poly-N (using the --cut option) and then use anchored barcodes.
While it is possible to anchor each barcode individually by adding a ^ before it, this can be cumbersome. This pull request uses cutadapt's ^file:* option to anchor all the barcodes at once.
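
To make the difference concrete, here is a toy Python model of the two matching modes. It is only a sketch, not cutadapt's algorithm: it uses exact matching (no error tolerance), the function names are made up for illustration, and the minimum overlap of 3 is an assumption chosen to mirror cutadapt's documented default.

```python
from typing import Optional


def trim_unanchored_5p(read: str, barcode: str, min_overlap: int = 3) -> Optional[str]:
    """Toy non-anchored 5' match (exact matching only, hypothetical helper)."""
    # Full barcode anywhere in the read: drop it and everything before it.
    pos = read.find(barcode)
    if pos != -1:
        return read[pos + len(barcode):]
    # Otherwise a suffix of the barcode may overlap the start of the read.
    for k in range(len(barcode) - 1, min_overlap - 1, -1):
        if read.startswith(barcode[-k:]):
            return read[k:]
    return None  # no match: the read would stay unassigned


def trim_anchored_5p(read: str, barcode: str) -> Optional[str]:
    """Toy anchored 5' match: the full barcode must start the read."""
    if read.startswith(barcode):
        return read[len(barcode):]
    return None


def cut_then_anchor(read: str, barcode: str, cut: int) -> Optional[str]:
    """Remove a fixed number of leading bases, then require an anchored match."""
    return trim_anchored_5p(read[cut:], barcode)
```

With these definitions, the unanchored matcher accepts the partial match from the example above ("RCODEacgt" becomes "acgt"), while the anchored matcher rejects it. A read such as "NNNNNBARCODEacgt" is only accepted by the anchored matcher once its first 5 bases have been cut.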

@lizgehret lizgehret requested a review from colinvwood July 20, 2023 17:24
@colinvwood
Contributor

colinvwood commented Jul 31, 2023

Hello @vaamb,

Thanks for the contribution. I want to make sure I understand the use case here. Does the following represent the type of situation you're trying to cover?

  1. barcodes are placed (e.g.) at the 5' end of a sequence
  2. a poly-N sequence is placed upstream of the barcode: 5' NNN BARCODE SEQUENCE 3'
  3. trimming the barcode using cutadapt's standard 5' end trim is too risky because barcodes are short and may be only a couple of mismatches away from a subsequence of the true biological read, possibly resulting in mis-trimming or mis-demultiplexing
  4. anchoring is thus desirable (this PR)
  5. to anchor, however, we need --cut to get the N's out of the way so that the barcode actually begins the sequence (other PR)
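
The steps above can be sketched as a cutadapt invocation. The helper below is hypothetical and is not the q2-cutadapt implementation; it only assembles an argument list using cutadapt's documented `--cut`, `-g file:` / `-g ^file:`, and `{name}` output-template syntax. The function name, parameter names, and file names are assumptions for illustration.

```python
from typing import List


def build_demux_command(barcode_fasta: str, reads_fastq: str, output_dir: str,
                        cut: int = 0, anchor: bool = False) -> List[str]:
    """Assemble a single-end cutadapt demultiplexing command (illustrative only)."""
    cmd = ["cutadapt"]
    if cut:
        # --cut removes a fixed number of leading bases before adapter matching,
        # which gets the poly-N out of the way.
        cmd += ["--cut", str(cut)]
    # A leading ^ on the file: specifier anchors every barcode in the FASTA file.
    prefix = "^file:" if anchor else "file:"
    cmd += ["-g", prefix + barcode_fasta]
    # {name} is replaced by cutadapt with the name of the matching barcode.
    cmd += ["-o", f"{output_dir}/{{name}}.fastq.gz", reads_fastq]
    return cmd
```

For example, `build_demux_command("barcodes.fasta", "reads.fastq.gz", "out", cut=5, anchor=True)` produces a list that could be passed to `subprocess.run`, assuming cutadapt is installed and the input files exist.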

@lizgehret
Member

Hey @vaamb - we'll need to bump this to next release if we don't hear back from you by EOD tomorrow. Thanks!

@vaamb
Contributor Author

vaamb commented Oct 11, 2023

Hello @colinvwood,

This is indeed the use case I have.
The other PR (the cut one) fixed most of my mis-demultiplexing issues, as the "NNNNN" at the beginning of the sequences was interpreted by cutadapt as possible barcodes. This PR further reduced mis-demultiplexing (although to a lesser extent) and improved processing speed.

For the record, barcodes were anchored by default in cutadapt prior to version 2.0

@vaamb vaamb marked this pull request as draft October 23, 2023 09:31
- Allow anchoring all the barcodes with a single option, without having to add '^' to each barcode in the metadata file

- Forward and reverse barcodes can be anchored independently
- Test anchored forward barcodes
- Test anchored forward barcodes with sequences cut
@vaamb vaamb marked this pull request as ready for review November 24, 2023 23:42
@vaamb
Contributor Author

vaamb commented Nov 24, 2023

Hello @colinvwood,

Sorry for the time it took. I completely rewrote my PR in order to rebase it on main and take into account everything that was said in PR #57.
I think I added enough tests, but should you need more, don't hesitate to tell me.

@colinvwood
Contributor

I would also like to see a test proving that anchoring behaves as expected when applied to the reverse reads.

vaamb and others added 2 commits December 1, 2023 07:36
Co-authored-by: colinvwood <68213641+colinvwood@users.noreply.github.com>
@colinvwood
Contributor

Hey @vaamb, just FYI, we have a release planned for Dec. 11 in case you were trying to get this merged in this cycle.

@vaamb
Contributor Author

vaamb commented Dec 5, 2023

Hi @colinvwood
I added a test for forward and reverse anchoring, and another one to make sure that a reverse barcode is required when anchoring in the non-mixed-orientation case.

@colinvwood
Contributor

Thanks for all your work on this @vaamb, the tests look good to go.

@colinvwood colinvwood merged commit c78541c into qiime2:dev Dec 5, 2023
4 checks passed
@vaamb vaamb deleted the anchor_adapters branch December 5, 2023 21:43
@colinvwood colinvwood self-assigned this Dec 5, 2023