Cat fastq feature #179

Open
wants to merge 11 commits into
base: dev
from
11 changes: 6 additions & 5 deletions README.md
@@ -31,11 +31,12 @@ On release, automated continuous integration tests run the pipeline on a full-si
<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->

-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Generate reference indices ([`yara`](https://www.seqan.de/apps/yara.html))
-3. Map reads to reference ([`yara`](https://www.seqan.de/apps/yara.html))
-4. Run HLA typing ([`OptiType`](https://github.com/FRED-2/OptiType))
-5. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
+2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+3. Generate reference indices ([`yara`](https://www.seqan.de/apps/yara.html))
+4. Map reads to reference ([`yara`](https://www.seqan.de/apps/yara.html))
+5. Run HLA typing ([`OptiType`](https://github.com/FRED-2/OptiType))
+6. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

## Usage

12 changes: 11 additions & 1 deletion assets/schema_input.json
@@ -18,7 +18,17 @@
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
-"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
+"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
+"anyOf": [
+{
+"type": "string",
+"pattern": "^\\S+\\.f(ast)?q\\.gz$"
+},
+{
+"type": "string",
+"maxLength": 0
+}
+]
},
"fastq_2": {
"type": "string",
Expand Down
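The effect of the new `anyOf` clause on `fastq_1` can be checked with a small sketch. This is not the pipeline's validator: the function names are hypothetical, and it assumes standard JSON Schema semantics, where `pattern` is an unanchored regex search and all sibling keywords of a property must hold at once.

```python
import re

# Pattern copied from the hunk above; the JSON escapes "\\S" and "\\."
# become \S and \. once the JSON string is decoded.
FASTQ_PATTERN = re.compile(r"^\S+\.f(ast)?q\.gz$")


def satisfies_any_of(value: str) -> bool:
    """Check only the two `anyOf` branches: a FastQ path or an empty string."""
    is_fastq_path = FASTQ_PATTERN.search(value) is not None
    is_empty = len(value) == 0  # the `"maxLength": 0` branch
    return is_fastq_path or is_empty


def satisfies_property(value: str) -> bool:
    """Check `anyOf` together with the sibling top-level `pattern` keyword.

    Assumption: standard JSON Schema semantics, where sibling keywords
    must all hold for the value to validate.
    """
    top_level_pattern_ok = FASTQ_PATTERN.search(value) is not None
    return top_level_pattern_ok and satisfies_any_of(value)
```

Under these assumptions the empty-string branch of `anyOf` is shadowed by the top-level `pattern` that remains in the hunk, so an empty `fastq_1` would still be rejected. Whether that is intended is worth confirming in review.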
9 changes: 8 additions & 1 deletion conf/modules.config
@@ -44,6 +44,14 @@ process {
ext.args2 = "solver=${params.solver}"
}

withName: 'CAT_FASTQ' {
publishDir = [
path: { params.save_merged_fastq ? "${params.outdir}/fastq" : params.outdir },
mode: params.publish_dir_mode,
saveAs: { filename -> (filename.endsWith('.fastq.gz') && params.save_merged_fastq) ? filename : null }
]
}

withName: 'MULTIQC' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
@@ -52,5 +60,4 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

}
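The two closures in the `CAT_FASTQ` block can be read as a pair of small pure functions. The sketch below restates them in Python for illustration only; the function names are hypothetical, and it relies on the documented Nextflow behaviour that a `saveAs` closure returning `null` prevents the file from being published.

```python
from typing import Optional


def cat_fastq_publish_path(outdir: str, save_merged_fastq: bool) -> str:
    """Mirror of the `path` closure: use a `fastq/` subdirectory only when saving."""
    return f"{outdir}/fastq" if save_merged_fastq else outdir


def cat_fastq_save_as(filename: str, save_merged_fastq: bool) -> Optional[str]:
    """Mirror of the `saveAs` closure: None stands in for Groovy's null (not published)."""
    if filename.endswith(".fastq.gz") and save_merged_fastq:
        return filename
    return None
```

With `save_merged_fastq` unset, every output of the process maps to `None`, so nothing from `CAT_FASTQ` (including `versions.yml`) is published.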
22 changes: 22 additions & 0 deletions conf/test_fastq_cat.config
@@ -0,0 +1,22 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running fastq tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines bundled input files and everything required to run a fast and simple test.

Use as follows:
nextflow run nf-core/hlatyping -profile test,<docker/singularity> --outdir <OUTDIR>
----------------------------------------------------------------------------------------
*/

params {
config_profile_name = 'fastq Test Profile'
config_profile_description = 'Test dataset with multiple fastqs from same sample.'

max_cpus = 2
max_memory = '6.GB'
max_time = '48.h'

input = 'https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/samplesheets/samplesheet_fastq_cat.csv'
solver = 'glpk'
}
13 changes: 13 additions & 0 deletions docs/output.md
@@ -10,11 +10,24 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [cat](#cat) - Merge re-sequenced FastQ files
- [FastQC](#fastqc) - Raw read QC
- [OptiType](#optitype) - HLA genotyping based on integer linear programming
- [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### cat

<details markdown="1">
<summary>Output files</summary>

- `fastq/`
- `*.merged.fastq.gz`: If `--save_merged_fastq` is specified, concatenated FastQ files will be placed in this directory.

</details>

If multiple libraries/runs have been provided for the same sample in the input samplesheet (e.g. to increase sequencing depth), they are merged at the very beginning of the pipeline so that sample naming stays consistent throughout. Please refer to the [usage documentation](https://nf-co.re/rnaseq/usage#samplesheet-input) to see how to specify these samples in the input samplesheet.
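Merging with plain `cat` works on compressed input because a gzip file may consist of several members back to back. The sketch below illustrates that property with made-up reads; it assumes the module concatenates the `.fastq.gz` files byte for byte, which the diff does not show because `main.nf` is not rendered.

```python
import gzip
from typing import Iterable


def concatenate_gzip_files(chunks: Iterable[bytes]) -> bytes:
    """Join gzip-compressed byte strings the way `cat a.gz b.gz > merged.gz` would."""
    return b"".join(chunks)


# Two hypothetical sequencing runs of the same sample, each compressed separately.
run1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
run2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

merged = concatenate_gzip_files([run1, run2])

# gzip.decompress reads multi-member data, so the merged bytes decode to
# the records of run1 followed by the records of run2.
merged_reads = gzip.decompress(merged)
```

No decompression or recompression is needed for the merge itself, which keeps the step cheap even for large inputs.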

### FastQC

<details markdown="1">
10 changes: 5 additions & 5 deletions docs/usage.md
@@ -28,15 +28,15 @@ You will need to create a samplesheet with information about the samples you wou
--input '[path to samplesheet file]'
```

-### Multiple runs
+### Multiple runs of the same sample

-The `sample` identifiers have to be specified with the `fastq` files and the sequencing type:
+The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes. Concatenation is only supported for `fastq` files, not `BAM` files.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,seq_type
-CONTROL_PE,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,dna
-CONTROL_SE,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,dna
-CONTROL_PE,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
```
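To make the grouping rule concrete, here is a minimal sketch of how rows sharing a `sample` identifier could be collected before concatenation. It is an illustration of the idea only, not the pipeline's Nextflow channel logic; the function name is hypothetical and the samplesheet is the three-lane example above.

```python
import csv
import io
from collections import defaultdict

SAMPLESHEET = """\
sample,fastq_1,fastq_2,seq_type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,rna
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,rna
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
"""


def group_runs_by_sample(samplesheet_text: str) -> dict:
    """Map each sample identifier to its (fastq_1, fastq_2) pairs, in file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        groups[row["sample"]].append((row["fastq_1"], row["fastq_2"]))
    return dict(groups)


runs = group_runs_by_sample(SAMPLESHEET)
```

Each key in `runs` then corresponds to one merged sample, and the list under it holds the per-lane read pairs that would be concatenated.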

### Full samplesheet
10 changes: 10 additions & 0 deletions modules.json
@@ -5,6 +5,16 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"cat/fastq": {
"branch": "master",
"git_sha": "0e9cb409c32d3ec4f0d3804588e4778971c09b7e",
"installed_by": ["modules"]
},
"custom/dumpsoftwareversions": {
"branch": "master",
"git_sha": "8022c68e7403eecbd8ba9c49496f69f8c49d50f0",
"installed_by": ["modules"]
},
"fastqc": {
"branch": "master",
"git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd",
10 changes: 10 additions & 0 deletions modules/nf-core/cat/fastq/environment.yml

81 changes: 81 additions & 0 deletions modules/nf-core/cat/fastq/main.nf

45 changes: 45 additions & 0 deletions modules/nf-core/cat/fastq/meta.yml
