
02_Tutorial_Pipeline

Samantha edited this page Jul 28, 2021 · 1 revision

Overview

Important directories and files are stored in two locations:

  1. /data/RBL_NCI/Pipelines/Talon_Flair/

    • Includes the directory for current pipeline versions
  2. /data/CCBR_Pipeliner/Talon_Flair/testing

    • Includes the following files:

      ├── fastq_files
      │   ├── barcode01.fastq
      │   ├── barcode02.fastq
      │   └── barcode03.fastq
      ├── manifests
      │   ├── deg_example.tsv
      │   └── sample_example.tsv
      └── snakemake_configs
          ├── snakemake_config_1.yaml
          ├── snakemake_config_2.yaml
          └── snakemake_config_3.yaml
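Before starting, you may want to confirm that the testing files listed above are present. The function below is a minimal sketch and not part of the pipeline; `check_test_files` is a hypothetical helper name, and the testing root is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper: report any expected tutorial test file that is missing.
# Usage: check_test_files /path/to/testing
check_test_files() {
  local root="$1" missing=0 f
  for f in \
    fastq_files/barcode01.fastq fastq_files/barcode02.fastq fastq_files/barcode03.fastq \
    manifests/deg_example.tsv manifests/sample_example.tsv \
    snakemake_configs/snakemake_config_1.yaml snakemake_configs/snakemake_config_2.yaml \
    snakemake_configs/snakemake_config_3.yaml; do
    # -f is true only for an existing regular file
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 means every expected file was found
  return "$missing"
}
```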

Preparation

This tutorial uses a tagged version of the pipeline stored on RBL_NCI. You will load Snakemake to run it.

Get the pipeline

  1. Use the tagged version on RBL_NCI
# Review the available versions and select the one you want to run
ls -d /data/RBL_NCI/Pipelines/Talon_Flair/v*

# Move to the selected version's directory
cd /data/RBL_NCI/Pipelines/Talon_Flair/[version selected]/RBL_RBL3/build/create_masked_refs/
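If you want to pick the newest tagged version automatically instead of reading the `ls` output, a sketch like the following sorts the `v*` directories with GNU `sort -V`. It assumes version directories are named with numeric components (for example `v1.2`, `v1.10`); `latest_version` is a hypothetical helper, and `find -printf` requires GNU findutils.

```shell
# Hypothetical helper: print the highest-numbered v* directory name under a root.
# Assumes names like v1.0, v1.2, v1.10 (numeric version components).
latest_version() {
  local root="$1"
  # List only directories directly under root, print their base names,
  # order them by version number, and keep the last (highest) one.
  find "$root" -mindepth 1 -maxdepth 1 -type d -name 'v*' -printf '%f\n' \
    | sort -V \
    | tail -n 1
}
```

For example, `latest_version /data/RBL_NCI/Pipelines/Talon_Flair` prints a directory name that can stand in for `[version selected]`.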

Load Snakemake

Load Snakemake into your environment.

# Snakemake >= 5.19 is recommended
module load snakemake
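To check that the loaded module meets the recommended minimum, you can compare the output of `snakemake --version` against 5.19. This is a sketch; `version_at_least` is a hypothetical helper that relies on GNU `sort -V` for version ordering.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_at_least() {
  local have="$1" need="$2"
  # Version-sort both values; the smaller one comes first.
  # If the smaller one is "need", then "have" is at least "need".
  [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

For example, after `module load snakemake`, run `version_at_least "$(snakemake --version)" 5.19 || echo "Snakemake is older than 5.19" >&2`.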

Initialize the pipeline and output directory

This step creates the output directory and places the required config files in it for editing. The run_pipeline.sh script takes two parameters:

Usage: sh run_pipeline.sh -p <pipeline> -o <output_dir>
	-p options: initialize, cluster, local, dry-run, unlock, report
	-o path to output directory

You must first initialize your directory:

sh run_pipeline.sh -p initialize -o /path/to/output/dir/
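The two parameters follow a conventional `-p`/`-o` flag layout. As an illustration only (this is not the contents of run_pipeline.sh), a wrapper could read them with the shell builtin `getopts`; `parse_args`, `PIPELINE`, and `OUTPUT_DIR` are hypothetical names.

```shell
# Hypothetical sketch: read -p <pipeline> and -o <output_dir> into two variables.
parse_args() {
  local OPTIND=1 opt   # reset getopts state so the function can be called repeatedly
  PIPELINE="" OUTPUT_DIR=""
  while getopts "p:o:" opt; do
    case "$opt" in
      p) PIPELINE="$OPTARG" ;;
      o) OUTPUT_DIR="$OPTARG" ;;
      *) return 1 ;;   # unknown flag or missing argument
    esac
  done
  # Both -p and -o are required
  [ -n "$PIPELINE" ] && [ -n "$OUTPUT_DIR" ]
}
```

With this sketch, `parse_args -p initialize -o /path/to/output/dir/` sets `PIPELINE=initialize` and `OUTPUT_DIR=/path/to/output/dir/`, and the call fails if either flag is missing.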

Prepare the required inputs

Each run requires three inputs: 1) snakemake_config.yaml, 2) the samples.tsv manifest, and 3) deg_manifest.tsv. The testing files are already configured for three test runs:

  • Test1 - Three samples, no cleanup, no masking
    • Runs three samples without cleaning up the samples (clean_transcripts="N") or masking the reference (masked_reference="N").
  • Test2 - Small dataset, no cleanup, masking
    • Runs three samples without cleaning up the samples (clean_transcripts="N") but masks the reference (masked_reference="Y").
  • Test3 - Small dataset, cleanup, masking
    • Runs three samples, cleaning up the samples (clean_transcripts="Y") and masking the reference (masked_reference="Y").
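The flag values above can be read back out of any of the test configs to confirm which test it sets up. This sketch assumes `clean_transcripts` and `masked_reference` appear as top-level `key: "value"` lines in the YAML file; `config_flag` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of a top-level "key: value" line, without quotes.
# Assumes the key starts the line and the value is on the same line.
config_flag() {
  local file="$1" key="$2"
  # -n suppresses normal output; the trailing p prints only lines that matched.
  sed -n -E 's/^'"$key"':[[:space:]]*"?([^"]*)"?[[:space:]]*$/\1/p' "$file"
}
```

For example, `config_flag snakemake_config_2.yaml masked_reference` should print `Y` if the file uses that layout.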

Copy the snakemake config

Decide which test to run and execute the corresponding command. Only the snakemake_config file needs to be copied; the sample manifest and DEG manifest are added to the workflow automatically.

#if running Test1
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_1.yaml /path/to/output/dir/snakemake_config.yaml

#if running Test2
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_2.yaml /path/to/output/dir/snakemake_config.yaml

#if running Test3
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_3.yaml /path/to/output/dir/snakemake_config.yaml
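If you script this step, a small wrapper can select the config by test number and reject anything other than 1, 2, or 3. `copy_test_config` is a hypothetical helper; it takes the testing directory as an argument rather than hard-coding a path.

```shell
# Hypothetical helper: copy snakemake_config_<n>.yaml into the output directory
# under the name the pipeline expects (snakemake_config.yaml).
copy_test_config() {
  local n="$1" testing_dir="$2" out_dir="$3"
  case "$n" in
    1|2|3) ;;
    *) echo "test number must be 1, 2, or 3 (got: $n)" >&2; return 1 ;;
  esac
  cp "$testing_dir/snakemake_configs/snakemake_config_${n}.yaml" "$out_dir/snakemake_config.yaml"
}
```

For example, `copy_test_config 2 /data/RBL_NCI/Pipelines/Talon_Flair/testing /path/to/output/dir` performs the same copy as the Test2 command.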

Edit the snakemake config

Open the snakemake_config.yaml file in /path/to/output/dir/ and set the following parameter:

    # path to output directory
    output_dir: "/path/to/output/dir/"
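The edit can also be made non-interactively. This sketch assumes `output_dir` sits on its own top-level line beginning with `output_dir:`; `set_output_dir` is a hypothetical helper that rewrites only that line with `awk` and leaves the rest of the file untouched.

```shell
# Hypothetical helper: set output_dir in a snakemake config file in place.
set_output_dir() {
  local config="$1" dir="$2" tmp status=0
  tmp="$(mktemp)" || return 1
  # Replace lines that start with "output_dir:"; print every other line unchanged.
  # Writing back with cat (instead of mv) keeps the config's original permissions.
  awk -v d="$dir" '/^output_dir:/ { print "output_dir: \"" d "\""; next } { print }' "$config" > "$tmp" \
    && cat "$tmp" > "$config" || status=1
  rm -f "$tmp"
  return "$status"
}
```

For example, `set_output_dir /path/to/output/dir/snakemake_config.yaml /path/to/output/dir/` sets the parameter shown above.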

Run the Pipeline

  1. Dry-run
sh run_pipeline.sh -p dry -o /path/to/output/dir/

  2. Execute the pipeline
    A) on the cluster
sh run_pipeline.sh -p cluster -o /path/to/output/dir/

    B) locally
sh run_pipeline.sh -p local -o /path/to/output/dir/

  3. Unlock the directory (after a run fails partway through)
sh run_pipeline.sh -p unlock -o /path/to/output/dir/
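If you usually submit right after a clean dry-run, the two steps can be chained so that the cluster run starts only when the dry-run succeeds. This sketch assumes run_pipeline.sh is in the current directory and exits with a non-zero status when the dry-run fails, which this tutorial does not state; `dry_then_submit` is a hypothetical helper, and the mode names follow the commands in this section.

```shell
# Hypothetical helper: run a dry-run, then submit to the cluster only if it succeeded.
# Assumes run_pipeline.sh returns a non-zero exit status on failure.
dry_then_submit() {
  local out_dir="$1"
  if sh run_pipeline.sh -p dry -o "$out_dir"; then
    sh run_pipeline.sh -p cluster -o "$out_dir"
  else
    echo "Dry-run failed; not submitting to the cluster" >&2
    return 1
  fi
}
```

For example, `dry_then_submit /path/to/output/dir/` replaces the separate dry-run and cluster commands.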