Table of Contents
echo $(cat <file.fastq>|wc -l)/4|bc
awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' <file.fastq>
Compute average read length from FASTQ or FASTA file. Python script here.
Use:
python average_read_length.py <file_name>
Convert2PacBioBAM converts FASTA and FASTQ files to PacBio BAM files. To use the IsoSeq v3 pipeline, a file with the reads must be provided in PacBio BAM format, which carries quality information in a specific way. Users cannot always access the IsoSeq3 workflows because the scripts do not accept reads in FASTA or FASTQ format as input. To put a temporary patch to this problem, you can use the convert_to_pacbio_bam.py
script. Given your reads in FASTA or FASTQ format and a real PacBio BAM file generated with css
(also supports BAM files after using lima
and isoseq refine
), it will "convert" your sequences into a PacBio BAM file using the input file as a template.
- Python (3.7)
- samtools
With the --help option you can display a complete description of the arguments:
usage: convert_to_pacbio_bam.py [-h] [--keep_files] reads bam output
Reformats FASTA and FASTQ files to PacBio BAM files
positional arguments:
reads Read sequences in FASTA or FASTQ format
bam PacBio real BAM to use as template
output Output path and prefix for converted BAM file
(".converted.bam" suffix will be added")
optional arguments:
-h, --help show this help message and exit
--keep_files If used the program will not delete intermediate files
Running this tool will look as follows:
python convert_to_pacbio_bam.py reads.fasta PacBio.flnc.bam reads.out
cat <file.gtf> | awk '$3 == "transcript" { print $12 }' | sed 's/"//' | sed 's/";//'
or
cat <file.gtf> | awk '$3 == "transcript" { gsub(/"/, "", $12); gsub(/;/, "", $12); print $12 }'
To use the trans_to_chrom
R function to get transcript IDs and corresponding chromosomes, follow these steps:
-
Download the R script trans_to_chrom.R from this repository.
-
Load the script in your R environment:
source("path_to_trans_to_chrom.R") result <- trans_to_chrom("file.gtf")
The result
variable will contain the mapping of transcripts to chromosomes.
Make sure you have the required R packages installed before using the function.
Get transcript length and GC content of transcripts from GTF file.
-
Download the R script gc_content_and_length.R from this repository.
-
Load the script in your R environment:
source("gc_content_and_length.R") result <- gc_content_and_length("reference.fa", "file.gtf")
Make sure you have the required R packages installed before using the function.