nf-core · SpikyClip · Oct 18, 2022 · Jan 23, 2025 · Jan 25, 2025 · Jan 25, 2025
diff --git a/README.md b/README.md
@@ -31,11 +31,12 @@ On release, automated continuous integration tests run the pipeline on a full-si
 <!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
      workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   -->
 
-1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
-2. Generate reference indices ([`yara`](https://www.seqan.de/apps/yara.html))
-3. Map reads to reference ([`yara`](https://www.seqan.de/apps/yara.html))
-4. Run HLA typing ([`OptiType`](https://github.com/FRED-2/OptiType))
-5. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
+1. Merge re-sequenced FastQ files ([`cat`](http://www.linfo.org/cat.html))
+2. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+3. Generate reference indices ([`yara`](https://www.seqan.de/apps/yara.html))
+4. Map reads to reference ([`yara`](https://www.seqan.de/apps/yara.html))
+5. Run HLA typing ([`OptiType`](https://github.com/FRED-2/OptiType))
+6. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
 
 ## Usage
 

diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -18,7 +18,17 @@
                 "format": "file-path",
                 "exists": true,
                 "pattern": "^\\S+\\.f(ast)?q\\.gz$",
-                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
+                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
+                "anyOf": [
+                    {
+                        "type": "string",
+                        "pattern": "^\\S+\\.f(ast)?q\\.gz$"
+                    },
+                    {
+                        "type": "string",
+                        "maxLength": 0
+                    }
+                ]
             },
             "fastq_2": {
                 "type": "string",
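Review note on the `fastq_1` change above: in JSON Schema, sibling keywords on the same property are all applied. The `pattern` that this hunk keeps therefore still has to match, even though the new `anyOf` branch with `maxLength: 0` would accept an empty string on its own. The sketch below emulates that evaluation with Python's `re` module only. `accepts_fastq_1` is a hypothetical helper, not part of the pipeline or of any validator, and it assumes the samplesheet validator passes an empty cell to the schema as an empty string.

```python
import re

# Regex copied from the schema (JSON string escaping removed).
FASTQ_PATTERN = re.compile(r"^\S+\.f(ast)?q\.gz$")


def accepts_fastq_1(value: str) -> bool:
    """Emulate the combined keywords on `fastq_1` (hypothetical helper)."""
    # Outer `pattern` keyword, left unchanged by this diff.
    outer_pattern_ok = FASTQ_PATTERN.search(value) is not None
    # New `anyOf`: either the same pattern matches or the string is empty.
    any_of_ok = FASTQ_PATTERN.search(value) is not None or len(value) == 0
    # Sibling keywords are conjunctive: both must hold.
    return outer_pattern_ok and any_of_ok


print(accepts_fastq_1("sample_R1.fastq.gz"))  # True
print(accepts_fastq_1(""))                    # False
```

If an empty value is meant to validate, the outer `pattern` would probably need to live only inside the `anyOf`.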

diff --git a/conf/modules.config b/conf/modules.config
@@ -44,6 +44,14 @@ process {
         ext.args2 = "solver=${params.solver}"
     }
 
+    withName: 'CAT_FASTQ' {
+        publishDir = [
+            path: { params.save_merged_fastq ? "${params.outdir}/fastq" : params.outdir },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> (filename.endsWith('.fastq.gz') && params.save_merged_fastq) ? filename : null }
+        ]
+    }
+
     withName: 'MULTIQC' {
         ext.args   = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
         publishDir = [
@@ -52,5 +60,4 @@ process {
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
     }
-
 }
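Review note on the `CAT_FASTQ` block above: the `saveAs` closure decides per file whether anything is published. A minimal Python sketch of that decision follows. `save_as` is a hypothetical stand-in for the Groovy closure, with `save_merged_fastq` passed explicitly instead of read from `params`.

```python
from typing import Optional


def save_as(filename: str, save_merged_fastq: bool) -> Optional[str]:
    """Mirror the `saveAs` closure: return the name to publish, or None to skip."""
    if filename.endswith(".fastq.gz") and save_merged_fastq:
        return filename
    return None


print(save_as("S1.merged.fastq.gz", True))   # S1.merged.fastq.gz
print(save_as("S1.merged.fastq.gz", False))  # None
print(save_as("versions.yml", True))         # None
```

Because the closure returns `null` whenever `save_merged_fastq` is off, the fallback `params.outdir` in the `path` closure should never receive any files.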
diff --git a/conf/test_fastq_cat.config b/conf/test_fastq_cat.config
@@ -0,0 +1,22 @@
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Nextflow config file for running fastq tests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    Defines bundled input files and everything required to run a fast and simple test.
+
+    Use as follows:
+        nextflow run nf-core/hlatyping -profile test_fastq_cat,<docker/singularity> --outdir <OUTDIR>
+----------------------------------------------------------------------------------------
+ */
+
+params {
+    config_profile_name        = 'fastq Test Profile'
+    config_profile_description = 'Test dataset with multiple fastqs from same sample.'
+
+    max_cpus = 2
+    max_memory = '6.GB'
+    max_time = '48.h'
+
+    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/hlatyping/samplesheets/samplesheet_fastq_cat.csv'
+    solver = 'glpk'
+}
diff --git a/docs/output.md b/docs/output.md
@@ -10,11 +10,24 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
+- [cat](#cat) - Merge re-sequenced FastQ files
 - [FastQC](#fastqc) - Raw read QC
 - [OptiType](#optitype) - HLA genotyping based on integer linear programming
 - [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
+### cat
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastq/`
+  - `*.merged.fastq.gz`: If `--save_merged_fastq` is specified, concatenated FastQ files will be placed in this directory.
+
+</details>
+
+If multiple libraries or runs are provided for the same sample in the input samplesheet (e.g. to increase sequencing depth), they are merged at the very beginning of the pipeline so that sample naming stays consistent throughout. Please refer to the [usage documentation](https://nf-co.re/hlatyping/usage#samplesheet-input) to see how to specify these samples in the input samplesheet.
+
 ### FastQC
 
 <details markdown="1">
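Review note on the `cat` section above: merging `.fastq.gz` files by byte-level concatenation works because a gzip stream may consist of several members, which decompress back to back. The sketch below demonstrates this with the Python standard library only. The read records are made up for illustration, and it assumes the merge step is a plain `cat` of the gzipped inputs, as the README step suggests.

```python
import gzip

# Two tiny, made-up FASTQ records, compressed separately like two sequencing runs.
run1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
run2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")

# Equivalent of `cat run1.fastq.gz run2.fastq.gz > merged.fastq.gz`.
merged = run1 + run2

# The multi-member stream decompresses to both records in order.
text = gzip.decompress(merged).decode()
print(text.count("@read"))  # 2
```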

diff --git a/docs/usage.md b/docs/usage.md
@@ -28,15 +28,15 @@ You will need to create a samplesheet with information about the samples you wou
 --input '[path to samplesheet file]'
 ```
 
-### Multiple runs
+### Multiple runs of the same sample
 
-The `sample` identifiers have to be specified with the `fastq` files and the sequencing type:
+Use the same `sample` identifier on every row when a sample has been sequenced more than once, e.g. to increase sequencing depth. The pipeline concatenates the raw reads before performing any downstream analysis. Below is an example of one sample sequenced across 3 lanes. Concatenation is only supported for `fastq` files, not `BAM` files.
 
 ```csv title="samplesheet.csv"
 sample,fastq_1,fastq_2,seq_type
-CONTROL_PE,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,dna
-CONTROL_SE,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,dna
-CONTROL_PE,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,rna
+CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
 ```
 
 ### Full samplesheet
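Review note on the samplesheet example above: rows that share a `sample` value are the ones that get merged. A minimal Python sketch of that grouping follows. `group_by_sample` is a hypothetical helper for illustration and is not how the pipeline itself implements the step.

```python
import csv
import io
from collections import defaultdict

# Samplesheet rows taken from the example in docs/usage.md.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,seq_type
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,rna
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,rna
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,rna
"""


def group_by_sample(text: str) -> dict:
    """Collect the R1 and R2 files of every row that shares a `sample` id."""
    groups = defaultdict(lambda: {"fastq_1": [], "fastq_2": []})
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["sample"]]["fastq_1"].append(row["fastq_1"])
        groups[row["sample"]]["fastq_2"].append(row["fastq_2"])
    return dict(groups)


groups = group_by_sample(SAMPLESHEET)
print(len(groups["CONTROL_REP1"]["fastq_1"]))  # 3
```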

diff --git a/modules.json b/modules.json
@@ -5,6 +5,16 @@
         "https://github.com/nf-core/modules.git": {
             "modules": {
                 "nf-core": {
+                    "cat/fastq": {
+                        "branch": "master",
+                        "git_sha": "0e9cb409c32d3ec4f0d3804588e4778971c09b7e",
+                        "installed_by": ["modules"]
+                    },
+                    "custom/dumpsoftwareversions": {
+                        "branch": "master",
+                        "git_sha": "8022c68e7403eecbd8ba9c49496f69f8c49d50f0",
+                        "installed_by": ["modules"]
+                    },
                     "fastqc": {
                         "branch": "master",
                         "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd",

diff --git a/modules/nf-core/cat/fastq/environment.yml b/modules/nf-core/cat/fastq/environment.yml
diff --git a/modules/nf-core/cat/fastq/main.nf b/modules/nf-core/cat/fastq/main.nf
diff --git a/modules/nf-core/cat/fastq/meta.yml b/modules/nf-core/cat/fastq/meta.yml