scripts_and_code

Bash and R scripts to analyze Next-Gen sequence data.

Multi_FASTQC.sh

A simple script that creates sequence quality reports (FastQC and MultiQC) in parallel with a single command on a remote HPC cluster.

Usage: place the script in the same directory as the files to be processed and run it, passing the file extension in quotation marks. For example:

sbatch Multi_FASTQC.sh "fq.gz"

Multi_FASTQC.sh has been tested on "fq", "fq.gz", and "bam" files.

Open script for more details

base_calculator.sh

This script counts the occurrences of each base in DNA fragments from single-digest SbfI RADseq, paired-end fq.gz files, according to the window size and region set by the user.

Usage:

1.- Place script in the same directory with files to be processed. Open script and:

2.- Set slurm options according to your system

3.- Set "User Variables"

4.- Check that the ls statement in line 69 will list your input files. Modify regex if necessary

5.- Execute from the command line with: sbatch <script name> "<readDir>". For example:

sbatch base_calculator.sh "F"

Output:

TSV table with file names, base counts, and read information.

Open script for more details
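The per-position counting described above can be sketched as follows. This is only an illustration of the idea, not the code in base_calculator.sh: the function name `count_bases_in_window`, its `start`/`end` arguments, and the output columns are assumptions made for the example, and it expects standard 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the code in base_calculator.sh.
# count_bases_in_window and its start/end arguments are hypothetical names.

count_bases_in_window() {
    local file="$1" start="$2" end="$3"
    local reader=(cat)
    if [[ "$file" == *.gz ]]; then
        reader=(gzip -dc)                 # read compressed input transparently
    fi

    "${reader[@]}" "$file" | awk -v first="$start" -v last="$end" '
        NR % 4 == 2 {                     # line 2 of each 4-line record is the sequence
            for (p = first; p <= last && p <= length($0); p++)
                tally[p, substr($0, p, 1)]++
        }
        END {
            print "position\tA\tC\tG\tT\tN"
            for (p = first; p <= last; p++)
                printf "%d\t%d\t%d\t%d\t%d\t%d\n", p,
                    tally[p, "A"], tally[p, "C"], tally[p, "G"], tally[p, "T"], tally[p, "N"]
        }'
}

# Demo on a tiny three-read file (made-up data)
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIII\n@r3\nTTGA\n+\nIIII\n' |
    gzip -c > "$demo_dir/sample.F.fq.gz"
count_bases_in_window "$demo_dir/sample.F.fq.gz" 1 2
```

The demo prints one TSV row per position in the window (here positions 1 and 2), which is the kind of table base_proportions.R could turn into proportions.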

base_proportions.R

R script to calculate and plot base pair proportions, and the mean base pair proportion, of DNA fragments position by position.

It takes as input the output of base_calculator.sh, or TSV files with base pair counts from single-digest RADseq, paired-end sequencing data.

Open script for more details

read_calculator.sh

read_calculator.sh counts the number of reads in compressed (default) or uncompressed FQ files (open script for details).

Usage:

1.- Place script in the same directory with FQ files to be processed. Open script and:

2.- Set slurm options according to your system

3.- Set "User Variables"

4.- Execute

sbatch read_calculator.sh

Output:

CSV table with file names and total number of reads
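
Counting reads in a FASTQ file usually comes down to dividing the line count by four. A minimal sketch of that idea, assuming standard 4-line records (no wrapped sequence lines); the function name `count_reads` and the CSV header are hypothetical and are not taken from read_calculator.sh:

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the code in read_calculator.sh.
# count_reads is a hypothetical helper name.

count_reads() {
    local file="$1" lines
    if [[ "$file" == *.gz ]]; then
        lines=$(gzip -dc "$file" | wc -l)   # compressed input
    else
        lines=$(wc -l < "$file")            # uncompressed input
    fi
    echo $(( lines / 4 ))                   # 4 lines per FASTQ record
}

# Demo on a tiny two-read file (made-up data), compressed and uncompressed
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$demo_dir/sample.fq"
gzip -c "$demo_dir/sample.fq" > "$demo_dir/sample.fq.gz"

# CSV table: file name, number of reads
echo "file,reads"
for f in "$demo_dir"/sample.fq "$demo_dir"/sample.fq.gz; do
    printf '%s,%s\n' "$(basename "$f")" "$(count_reads "$f")"
done
```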

motif_calculator.sh

motif_calculator.sh identifies and counts repeated motifs in compressed or uncompressed FQ files

Usage:

1.- Place script in the same directory with files to be processed. Open script and:

2.- Set file and read info as well as the maximum length (bp) of motifs to be counted.

motif_calculator.sh then lists and reports the frequencies of all motifs that start at position 1 (the first bp) of each read, for every length up to the specified maximum. Only the beginning of the reads is examined.

User variables options:

FILE_DIRECTION ("forward" or "reverse")

DIRECTION_SUFFIX ("F","R","R1","R2", etc)

FILE_EXTENSION ("fq" or "fq.gz" for uncompressed and compressed files, respectively)

MAX_motif_length (integer; maximum motif size, in bp, to search for repeats)

THREADS (number of threads according to your system)

Open script for details
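
The counting of leading motifs described above can be sketched as follows. This is a hedged illustration, not the code in motif_calculator.sh: the function name `count_start_motifs`, its arguments, and the three-column output are assumptions for the example, and it expects standard 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the code in motif_calculator.sh.
# count_start_motifs is a hypothetical helper name.
# Prints: motif length <TAB> motif <TAB> number of reads starting with it.

count_start_motifs() {
    local file="$1" max_len="$2"
    local reader=(cat)
    if [[ "$file" == *.gz ]]; then
        reader=(gzip -dc)
    fi

    "${reader[@]}" "$file" | awk -v max="$max_len" '
        NR % 4 == 2 {                     # sequence line of each record
            for (n = 1; n <= max && n <= length($0); n++)
                count[n "\t" substr($0, 1, n)]++
        }
        END { for (key in count) print key "\t" count[key] }
    ' | sort -t "$(printf '\t')" -k1,1n -k3,3nr -k2,2
}

# Demo on a tiny three-read file (made-up data)
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIII\n@r3\nTTGA\n+\nIIII\n' > "$demo_dir/sample.F.fq"
count_start_motifs "$demo_dir/sample.F.fq" 2
```

In the demo, two of the three reads start with "AC", so that motif is reported with a count of 2 at length 2; counts above 1 are the repeated motifs.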

fq_repeat_cleaner.sh

fq_repeat_cleaner.sh removes sequences with repeated motifs at the beginning of the read in compressed or uncompressed FQ files

Usage:

1.- Place script in the same directory with files to be processed. Open script and:

2.- Set file and read info, maximum length (bp) of motifs to be counted, and output base name.

3.- Execute and, once the motif frequencies are printed in the terminal, enter the motif length on which to base the read removal.

Output:

A single concatenated FQ file (from all input files) containing only the reads whose starting motif of the chosen length does not occur at the start of any other read

Open script for details
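
The filtering step can be sketched as a two-pass scan: first count every starting motif across all input files, then keep only the records whose motif was seen once. This is an illustration under assumptions, not the code in fq_repeat_cleaner.sh: the function name `remove_repeated_starts` is hypothetical, the motif length is passed as an argument instead of being entered interactively, and standard 4-line FASTQ records are assumed.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the code in fq_repeat_cleaner.sh.
# remove_repeated_starts is a hypothetical helper name.
# Usage: remove_repeated_starts <motif_length> <file>...

remove_repeated_starts() {
    local motif_len="$1"; shift
    local merged f
    merged=$(mktemp)

    # Concatenate all inputs (compressed or not) into one temporary file
    for f in "$@"; do
        if [[ "$f" == *.gz ]]; then gzip -dc "$f"; else cat "$f"; fi
    done > "$merged"

    # Pass 1 counts starting motifs; pass 2 prints records whose motif is unique
    awk -v n="$motif_len" '
        FNR == NR    { if (FNR % 4 == 2) seen[substr($0, 1, n)]++; next }
        FNR % 4 == 1 { header = $0; next }
        FNR % 4 == 2 { keep = (seen[substr($0, 1, n)] == 1); if (keep) print header }
        keep
    ' "$merged" "$merged"

    rm -f "$merged"
}

# Demo on two tiny files (made-up data): r1 and r2 both start with "AC"
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nIIII\n' > "$demo_dir/a.F.fq"
printf '@r3\nTTGA\n+\nIIII\n' | gzip -c > "$demo_dir/b.F.fq.gz"
remove_repeated_starts 2 "$demo_dir/a.F.fq" "$demo_dir/b.F.fq.gz" > "$demo_dir/cleaned.fq"
cat "$demo_dir/cleaned.fq"
```

With a motif length of 2, only read r3 survives in the demo, because both reads that start with "AC" are dropped.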

ericgarciaresearch/Scrips_and_code

Folders and files

Latest commit

History

Repository files navigation

scripts_and_code

About

Resources

Stars

Watchers

Forks

Languages