orient-seqs: accept FASTQ data as input #159

Open
nbokulich opened this issue Jul 13, 2023 · 4 comments

@nbokulich
Collaborator

vsearch --orient can accept FASTQ as input (and also output via the --fastqout option). Ideally, orient-seqs (which is just thinly wrapping vsearch --orient) could do the same.

This would require modifying the inputs/outputs here: https://github.com/bokulich-lab/RESCRIPt/blob/master/rescript/orient.py#L55

HOWEVER, the main issue I see is that the current inputs and outputs are DNAFASTAFormat objects. A FASTQ-formatted input (e.g., coming from some of the SampleData[.*Sequence.*] types) could not have DNAFASTAFormat as a view type. I suppose we need something like a Union[SingleLanePerSamplePairedEndFastqDirFmt | DNAFASTAFormat | ... ] as input and output, and a TypeMap in the plugin registration to accept and output the corresponding types.
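To illustrate only the vsearch side of this (not the QIIME 2 type registration), here is a minimal sketch of how a wrapper might choose between `--fastaout` and `--fastqout` based on the input file. The helper names `sniff_format` and `build_orient_cmd` are hypothetical and are not part of RESCRIPt; the sketch only builds the command list and does not run vsearch.

```python
def sniff_format(path):
    """Guess 'fasta' or 'fastq' from the first non-empty line (hypothetical helper)."""
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            break
    raise ValueError(f"{path} does not look like FASTA or FASTQ")


def build_orient_cmd(query, reference, oriented):
    """Build a `vsearch --orient` command whose output format matches the input.

    Assumption: FASTA input is written with --fastaout and FASTQ input
    with --fastqout, so the output keeps the format of the input.
    """
    out_flag = {"fasta": "--fastaout", "fastq": "--fastqout"}[sniff_format(query)]
    return ["vsearch", "--orient", str(query),
            "--db", str(reference),
            out_flag, str(oriented)]
```

The returned list could then be handed to something like `subprocess.run(cmd, check=True)`. How the two cases would map onto QIIME 2 semantic types is the open question discussed above.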

@colinvwood

Links to motivating forum posts here and here for reference.

@mirand863
Contributor

mirand863 commented Jul 24, 2023

Hi,

Thank you very much for opening this issue! I made some progress today. It is working with a single-end FASTQ file using the type MultiplexedSingleEndBarcodeInSequence, and I believe it would not be too much trouble to support other types, i.e., paired-end reads and multiple samples inside a folder. However, I am currently running into an "is not complete type expression" error with the TypeMap. I will try to debug this further another day.

Best regards,
Fabio

@VinzentRisch
Contributor

Hi @nbokulich
I do not understand how this would work.
As I understand the vsearch functionality, if FASTA files are given as input, FASTA files are returned as output, and the same holds for FASTQ files.
It is not possible in QIIME 2 to have different directory formats for the same output. So the only way I see is to have one output of format DNAFASTAFormat and one of CasavaOneEightSingleLanePerSampleDirFmt, and depending on the input, one of the outputs would be empty.
Or am I missing something here?

@mikerobeson
Collaborator

mikerobeson commented Dec 23, 2024

Hi @VinzentRisch, I think the easiest solution would be to create a separate action specific for handling fastq files, and rename the current action. That is, have two actions, something like orient-seqs-fasta & orient-seqs-fastq. This would bypass the i/o format issues.

I guess my question would be how to deal with paired-end FASTQ inputs. Do we just map the forward reads, keep track of which of these are reoriented, and then reverse complement the corresponding reverse reads? 🤔
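As a rough sketch of the idea floated above, assuming the set of read IDs whose forward read was reoriented is already known (for example, collected while orienting the forward reads), the matching reverse reads could be reverse complemented like this. The function names and the `(read_id, sequence, quality)` tuple layout are hypothetical, and whether this is the right way to treat paired-end data is exactly the open question.

```python
# Translation table for DNA complements; N is left as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient_mates(reverse_reads, flipped_ids):
    """Reverse complement the mates of forward reads that were reoriented.

    reverse_reads: iterable of (read_id, sequence, quality) tuples.
    flipped_ids:   set of read IDs whose forward read was reoriented
                   (assumed to be tracked by the caller).
    Quality strings are reversed so they stay aligned with the bases.
    """
    for read_id, seq, qual in reverse_reads:
        if read_id in flipped_ids:
            yield read_id, reverse_complement(seq), qual[::-1]
        else:
            yield read_id, seq, qual
```

Reads whose IDs are not in `flipped_ids` pass through unchanged, so read order and pairing with the forward file are preserved.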
