02_Tutorial_Pipeline
There are two locations where important directories/files are located:
- /data/RBL_NCI/Pipelines/Talon_Flair/
  - Includes the directory for current pipeline versions
- /data/CCBR_Pipeliner/Talon_Flair/testing
  - Includes the following files:
    ├── fastq_files
    │   ├── barcode01.fastq
    │   ├── barcode02.fastq
    │   └── barcode03.fastq
    ├── manifests
    │   ├── deg_example.tsv
    │   └── sample_example.tsv
    └── snakemake_configs
        ├── snakemake_config_1.yaml
        ├── snakemake_config_2.yaml
        └── snakemake_config_3.yaml
You'll use a tagged version of the pipeline stored on RBL_NCI to run this tutorial, and will load Snakemake to run the pipeline.
- Use the tagged version on RBL_NCI
#review available versions, select the version you want to run
ls /data/RBL_NCI/Pipelines/Talon_Flair/v*
#Move to the directory
cd /data/RBL_NCI/Pipelines/Talon_Flair/[version selected]/RBL_RBL3/build/create_masked_refs/
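If you want to keep the chosen version consistent across later commands, you can store it in a shell variable; the version string below is a hypothetical placeholder, so substitute whichever version the ls above reported.
#hypothetical placeholder; substitute the version reported by ls above
PIPELINE_VERSION=v1.0
cd /data/RBL_NCI/Pipelines/Talon_Flair/${PIPELINE_VERSION}/RBL_RBL3/build/create_masked_refs/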
Load Snakemake into your environment.
# Recommend running snakemake>=5.19
module load snakemake
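To confirm the loaded module meets the >=5.19 recommendation, you can print the version; this check is optional.
#optional: confirm the loaded Snakemake version (should be >=5.19)
snakemake --version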
This step will create an output directory and move the required config files into it for editing. The pipeline takes two parameters:
Usage: run_pipeline.sh -p pipeline -o output_dir
-p options: initialize, cluster, local, dry-run, unlock, report
-o path to output directory
You must first initialize your directory:
sh run_pipeline.sh -p initialize -o /path/to/output/
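After initializing, you can confirm the config files were placed in the output directory; the exact file list depends on the pipeline version, so treat this as an optional sanity check.
#optional: list the initialized output directory
ls /path/to/output/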
There are three required inputs for each run: 1) snakemake_config.yaml, 2) samples.tsv manifest, 3) deg_manifest.tsv. Files are already configured for testing as follows (a minimal config sketch follows this list):
- Test1 - Three samples, no cleanup, no masking
- This dataset will run three samples without cleaning up the sample (clean_transcripts="N") or masking the reference (masked_reference="N").
- Test2 - Small dataset, no cleanup, masking
- This dataset will run three samples without cleaning up the sample (clean_transcripts="N") but with masking the reference (masked_reference="Y").
- Test3 - Small dataset, cleanup, masking
- This dataset will run three samples with cleaning up the sample (clean_transcripts="Y") and masking the reference (masked_reference="Y").
Determine which test you want to run and execute the corresponding command. Note that the only file that must be copied is the snakemake_config file; the sample manifest and deg manifest will be added to your workflow automatically.
#if running Test1
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_1.yaml /path/to/output/dir/snakemake_config.yaml
#if running Test2
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_2.yaml /path/to/output/dir/snakemake_config.yaml
#if running Test3
cp /data/RBL_NCI/Pipelines/Talon_Flair/testing/snakemake_configs/snakemake_config_3.yaml /path/to/output/dir/snakemake_config.yaml
Open the snakemake_config.yaml file stored in your /path/to/output/dir/ and edit the following parameter (or use the one-liner shown after this list):
- #path to output directory
- output_dir: "/path/to/output/dir/"
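If you prefer to set the path from the command line instead of opening an editor, a sed one-liner like the one below can update the key in place; this assumes the output_dir key starts a line in the config file, which may not hold if your file's layout differs.
#assumes output_dir appears at the start of a line in the config
sed -i 's|^output_dir:.*|output_dir: "/path/to/output/dir/"|' /path/to/output/dir/snakemake_config.yaml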
- Dry-Run
sh run_pipeline.sh -p dry -o /path/to/output/dir/
- Execute pipeline on the cluster (a job-status check is sketched after this list)
sh run_pipeline.sh -p cluster -o /path/to/output/dir/
- Execute pipeline locally
sh run_pipeline.sh -p local -o /path/to/output/dir/
- Unlock directory (after a failed partial run)
sh run_pipeline.sh -p unlock -o /path/to/output/dir/
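After submitting a cluster run, you can monitor job status with your scheduler's standard tools; the example below assumes a SLURM scheduler, which may not match your environment.
#assumes a SLURM scheduler; use your scheduler's equivalent otherwise
squeue -u $USER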