02_Tutorial_Pipeline

Overview

Important directories and files are stored in two locations:

/data/RBL_NCI/Pipelines/Talon_Flair/
- Includes the directory for current pipeline versions
/data/CCBR_Pipeliner/Talon_Flair/testing
- Includes the following files:
  
  ├── fastq_files
  │   ├── barcode01.fastq
  │   ├── barcode02.fastq
  │   └── barcode03.fastq
  ├── manifests
  │   ├── deg_example.tsv
  │   └── sample_example.tsv
  └── snakemake_configs
      ├── snakemake_config_1.yaml
      ├── snakemake_config_2.yaml
      └── snakemake_config_3.yaml

Preparation

You'll use a tagged version of the pipeline stored on RBL_NCI to run this tutorial, and will load Snakemake to run the pipeline.

Get the pipeline

Use the tagged version on RBL_NCI

#review available versions, select the version you want to run
ls /data/RBL_NCI/Pipelines/Talon_Flair/v*

#Move to the directory
cd /data/RBL_NCI/Pipelines/Talon_Flair/[version selected]/RBL_RBL3/build/create_masked_refs/
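If you simply want the newest tagged release, the listing above can be narrowed down automatically. The following is a minimal sketch of one way to do that, not part of the pipeline itself: `latest_version_dir` is a hypothetical helper, and it assumes that version directories are named `v<number>` and that GNU `sort -V` (version sort) is available on your system.

```shell
# Hypothetical helper (not shipped with the pipeline): print the
# highest-versioned v* directory found directly under the directory in $1.
# Assumes GNU sort with -V (version sort) support.
latest_version_dir() {
    for d in "$1"/v*/; do
        # The trailing slash in the glob restricts matches to directories;
        # the -d test also skips the unexpanded pattern when nothing matches.
        [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | sort -V | tail -n 1
}

# Show the newest version only if the pipeline directory is reachable.
PIPELINE_BASE="/data/RBL_NCI/Pipelines/Talon_Flair"
if [ -d "$PIPELINE_BASE" ]; then
    latest_version_dir "$PIPELINE_BASE"
fi
```

Version sort is used rather than plain alphabetical order so that, for example, `v1.10` ranks above `v1.2`.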

Load Snakemake

Load Snakemake to your environment.

# Recommend running snakemake>=5.19
module load snakemake
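To confirm that the loaded module meets the recommended minimum, you can compare the reported version against 5.19. This is a sketch only: `version_ge` is a hypothetical helper name, and the comparison assumes GNU `sort -V` is available.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU `sort -V`: the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Check the loaded Snakemake, if one is on the PATH.
if command -v snakemake >/dev/null 2>&1; then
    found="$(snakemake --version)"
    if version_ge "$found" "5.19"; then
        echo "snakemake $found meets the recommended minimum"
    else
        echo "snakemake $found is older than 5.19" >&2
    fi
fi
```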

Initialize the pipeline and output directory

This step creates the output directory and copies the required config files into it for editing. The pipeline takes two parameters:

Usage: run_pipeline.sh -p pipeline
	-p options: initialize, cluster, local, dry-run, unlock, report
Usage:  -o output_dir
	-o path to output directory

You must first initialize your directory:

sh run_pipeline.sh -p initialize -o /path/to/output/dir/

Prepare the required inputs

There are three required inputs for each run: 1) snakemake_config.yaml 2) samples.tsv manifest 3) deg_manifest.tsv. Files are already configured for testing as follows:

Test1 - Three samples, no cleanup, no masking
- This dataset will run three samples without cleaning up the sample (clean_transcripts="N") or masking the reference (masked_reference="N").
Test2 - Small dataset, no cleanup, masking
- This dataset will run three samples without cleaning up the sample (clean_transcripts="N") but with masking the reference (masked_reference="Y").
Test3 - Small dataset, cleanup, masking
- This dataset will run three samples with cleaning up the sample (clean_transcripts="Y") and masking the reference (masked_reference="Y").

Copy the snakemake config

Decide which test you want to run, and execute the corresponding command. The only file you must copy is the snakemake_config file; the sample manifest and DEG manifest are added to your workflow automatically.

#if running Test1
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_1.yaml /path/to/output/dir/snakemake_config.yaml

#if running Test2
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_2.yaml /path/to/output/dir/snakemake_config.yaml

#if running Test3
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_3.yaml /path/to/output/dir/snakemake_config.yaml
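The three copy commands above differ only in the test number, so they can be wrapped in a small function that also checks its inputs before copying. This is an illustrative sketch; `copy_test_config` is a hypothetical name and is not provided by the pipeline.

```shell
# Hypothetical helper: copy the config for test 1, 2, or 3 into an output
# directory as snakemake_config.yaml.
#   $1 = test number
#   $2 = directory holding snakemake_config_<N>.yaml
#   $3 = initialized output directory
copy_test_config() {
    case "$1" in
        1|2|3) ;;
        *) echo "test number must be 1, 2, or 3" >&2; return 1 ;;
    esac
    src="$2/snakemake_config_$1.yaml"
    [ -f "$src" ] || { echo "missing config: $src" >&2; return 1; }
    [ -d "$3" ] || { echo "output directory not found: $3" >&2; return 1; }
    cp "$src" "$3/snakemake_config.yaml"
}

# Example call (paths as used in the commands above):
# copy_test_config 2 /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs /path/to/output/dir
```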

Edit the snakemake config

Open the snakemake_config.yaml file stored in your /path/to/output/dir/ and edit the following parameter:

#path to output directory
- output_dir: "/path/to/output/dir/"
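If you prefer not to open an editor, the same change can be scripted. The sketch below rewrites the `output_dir` line in place; `set_output_dir` is a hypothetical helper, and it assumes the key sits on a single line of its own and that the new path contains no `|`, `&`, or backslash characters.

```shell
# Hypothetical helper: point output_dir in a snakemake config at a new path.
#   $1 = path to snakemake_config.yaml
#   $2 = output directory to write into the config
# Assumption: $2 contains no '|', '&', or backslash, since it is substituted
# directly into a sed expression.
set_output_dir() {
    tmp_cfg="$1.tmp"
    # Keep any leading whitespace or list dash before the key, replace the value.
    sed -E "s|^([[:space:]-]*output_dir:).*|\1 \"$2\"|" "$1" > "$tmp_cfg" \
        && mv "$tmp_cfg" "$1"
}

# Example call:
# set_output_dir /path/to/output/dir/snakemake_config.yaml /path/to/output/dir/
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.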

Run the Pipeline

Dry-Run

sh run_pipeline.sh -p dry-run -o /path/to/output/dir/

Execute pipeline: A) on the cluster

sh run_pipeline.sh -p cluster -o /path/to/output/dir/

B) locally

sh run_pipeline.sh -p local -o /path/to/output/dir/

Unlock directory (after failed partial run)

sh run_pipeline.sh -p unlock -o /path/to/output/dir/
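A common pattern is to submit to the cluster only after a dry-run succeeds. The wrapper below sketches that idea; `dry_then_cluster` is a hypothetical name, and the mode names `dry-run` and `cluster` are taken from the usage message shown earlier, so adjust them if your version of run_pipeline.sh expects different values.

```shell
# Hypothetical wrapper: run a dry-run first and submit to the cluster only
# if the dry-run exits successfully.
#   $1 = path to run_pipeline.sh
#   $2 = output directory
dry_then_cluster() {
    sh "$1" -p dry-run -o "$2" || {
        echo "dry-run failed; not submitting to the cluster" >&2
        return 1
    }
    sh "$1" -p cluster -o "$2"
}

# Example call, from the directory containing run_pipeline.sh:
# dry_then_cluster run_pipeline.sh /path/to/output/dir/
```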
