This repository includes all the code, results and explanations used to identify the differentially expressed cathepsin B and L genes shown in figures 5 and 6 of the article "Evolved transcriptional responses and their trade-offs after long-term adaptation of Bemisia tabaci to a marginally-suitable host". These genes are likely to be expressed in the salivary glands of the whitefly Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae) MEAM1.
The following datasets were obtained from NCBI for the analysis:
- RNAseq of salivary glands
  - Identifiers: BioSample: SAMEA4682676; SRA: ERS2502869
  - Organism: Bemisia tabaci
  - Accession: SAMEA4682676; ID: 12215746
- Identification of Saliva Proteins of the Whitefly Bemisia tabaci by Transcriptome
  - Identifiers: BioSample: SAMN13381603; SRA: SRS5714533
  - Organism: Bemisia tabaci
  - Accession: SAMN13381603
All the steps are listed and described below.
1. A FASTA file of the 13 cathepsin B and L genes was created.
2. The similarity between the genes was assessed.
3. The genes were BLASTed against the 8,610 B. tabaci salivary gland genes available at NCBI.
4. Salivary gland transcriptomic data of B. tabaci were BLASTed against the genes on the NCBI website.
5. Salivary gland transcriptomic data of B. tabaci were assembled with Trinity, and genes were predicted with TransDecoder.
6. Salivary gland transcriptomic data of B. tabaci were mapped against the genes.
7. The genes were analysed for signal peptides (SP) with the SignalP web server.
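The second step (gene similarity) has no command listed. One possible way to obtain pairwise percentage identity, assuming all 13 transcripts sit in a single FASTA file (the name cathepsin.fa is borrowed from the BLAST commands further down), is an all-against-all blastn run with -subject, keeping the percent identity of the best-scoring hit for each gene pair. This is a sketch under those assumptions, not the method behind the similarity file in this repository; pairwise_identity is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise an all-against-all BLAST table into one line per gene pair.
# Input (stdin): tab-separated "qseqid sseqid pident bitscore", for example from
#   blastn -query cathepsin.fa -subject cathepsin.fa -outfmt "6 qseqid sseqid pident bitscore"
# Output: geneA <TAB> geneB <TAB> percent identity of the highest-bitscore hit for that pair
pairwise_identity() {
  awk -F'\t' -v OFS='\t' '
    $1 == $2 { next }                                   # drop self hits
    {
      # order the two names so that A-vs-B and B-vs-A share one key
      key = ($1 < $2) ? ($1 OFS $2) : ($2 OFS $1)
      if (!(key in best) || $4 > best[key]) { best[key] = $4; pid[key] = $3 }
    }
    END { for (k in pid) print k, pid[k] }
  ' | LC_ALL=C sort -t $'\t' -k1,1 -k2,2
}

# Example:
#   blastn -query cathepsin.fa -subject cathepsin.fa \
#     -outfmt "6 qseqid sseqid pident bitscore" | pairwise_identity
```

Collapsing both search directions onto one key keeps the table symmetric; taking the best-scoring hit rather than averaging all hits is a simplification.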
Software used: UGENE, IGV, BLAST+, bowtie2, samtools, Trinity and TransDecoder.
Commands used to search the Trinity assemblies of the transcriptomic data for the cathepsin genes:
______________________________________________________________________________
makeblastdb -in Trinity_single.fasta -dbtype nucl -out Trinity_single.fasta
______________________________________________________________________________
______________________________________________________________________________
makeblastdb -in Trinity_paired.fasta -dbtype nucl -out Trinity_paired.fasta
______________________________________________________________________________
______________________________________________________________________________
for i in Trinity_paired.fasta Trinity_single.fasta; do blastn -query cathepsin.fa -db "$i" -out "$i.results.txt" -outfmt 6; done
______________________________________________________________________________
The result files are in the Results folder:
BLASTN_for_Trinity_paired.fasta.results.txt
BLASTN_for_Trinity_single.fasta.results.txt
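When reading these -outfmt 6 tables it can help to keep only the best hit per query. A minimal sketch, assuming the standard 12-column layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the default e-value cut-off of 1e-10 and the name best_hits are hypothetical choices, not values used in the original analysis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the highest-bitscore hit per query from a BLAST
# tabular (-outfmt 6) file, after discarding hits above an e-value cut-off.
# Usage: best_hits BLASTN_for_Trinity_paired.fasta.results.txt [max_evalue]
best_hits() {
  local file=$1 max_evalue=${2:-1e-10}
  awk -F'\t' -v max="$max_evalue" '$11 + 0 <= max + 0' "$file" \
    | LC_ALL=C sort -t $'\t' -k1,1 -k12,12gr \
    | awk -F'\t' '!seen[$1]++'   # first line per query is its best bitscore
}
```

The general-numeric sort key (g) is used because BLAST can print large bit scores in exponent notation.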
The reads were mapped and manually inspected using the following commands:
- Indexing each reference ($i is the FASTA file of one cathepsin B or L transcript):
______________________________________________________________________________
for i in XM_*.fa; do bowtie2-build $i $i; done
______________________________________________________________________________
- Mapping all the obtained reads to each reference index built in the previous step:
______________________________________________________________________________
for i in XM_*.fa; do bowtie2 --fast -p 16 -x "$i" -1 *_R1.fastq.gz -2 *_R2.fastq.gz -U Single_R.fastq.gz -S "$i.sam"; done
______________________________________________________________________________
- Converting from SAM to BAM and indexing the BAM for data visualization in IGV:
______________________________________________________________________________
for i in XM_*.sam ; do samtools view -bS $i | samtools sort -o $i.sorted.bam ; done
______________________________________________________________________________
______________________________________________________________________________
for i in XM_*.sorted.bam; do samtools index $i $i.bai; done
______________________________________________________________________________
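bowtie2 expects several files passed to -1, -2 or -U as a single comma-separated list, so a bare glob such as *_R1.fastq.gz in the mapping command only works when it matches exactly one file. A sketch, assuming the paired files follow the *_R1.fastq.gz / *_R2.fastq.gz naming above; join_fastqs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files matching a glob as one comma-separated
# list, the form bowtie2 expects for multiple inputs to -1, -2 or -U.
join_fastqs() {
  local pattern=$1 old_nullglob
  local -a files
  old_nullglob=$(shopt -p nullglob) || true   # remember the caller's setting
  shopt -s nullglob                           # an unmatched glob yields an empty array
  files=( $pattern )                          # unquoted on purpose so the glob expands
  eval "$old_nullglob"
  if (( ${#files[@]} == 0 )); then
    echo "join_fastqs: no files match $pattern" >&2
    return 1
  fi
  local IFS=,
  echo "${files[*]}"                          # elements joined by the first IFS character
}

# Example (mates are paired by position, so R1 and R2 files must sort in the same order):
#   bowtie2 --fast -p 16 -x "$i" -1 "$(join_fastqs '*_R1.fastq.gz')" \
#     -2 "$(join_fastqs '*_R2.fastq.gz')" -U Single_R.fastq.gz -S "$i.sam"
```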
The output was visualized in IGV, and a PNG image of each gene with its read coverage was exported.
Identification of the genes as salivary gland genes using transcriptomic data
BLAST searches against the reads on the NCBI platform, and against the genes predicted from the assembled reads, were not sensitive enough.
Therefore, we mapped the reads obtained from NCBI against the 13 cathepsin genes, with the following results:
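The IGV images show coverage visually; the same "high and even" versus "poor and uneven" distinction can also be put into numbers. A sketch, assuming the sorted BAMs from the previous step and the three tab-separated columns printed by samtools depth -a (reference, position, depth, including zero-depth positions). For each reference it reports the mean depth, the breadth (fraction of positions covered by at least one read) and the coefficient of variation of depth as a rough evenness measure. coverage_summary is a hypothetical name, and these metrics were not part of the original analysis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-reference coverage summary from `samtools depth -a`.
# Input (stdin): reference <TAB> position <TAB> depth
# Output: reference, mean depth, breadth (fraction of positions with depth > 0),
#         coefficient of variation of depth (lower means more even coverage)
coverage_summary() {
  awk -F'\t' '
    {
      n[$1]++; sum[$1] += $3; sumsq[$1] += $3 * $3
      if ($3 > 0) covered[$1]++
    }
    END {
      for (ref in n) {
        mean = sum[ref] / n[ref]
        var = sumsq[ref] / n[ref] - mean * mean
        if (var < 0) var = 0                  # guard against rounding error
        cv = (mean > 0) ? sqrt(var) / mean : 0
        printf "%s\t%.2f\t%.3f\t%.3f\n", ref, mean, covered[ref] / n[ref], cv
      }
    }
  ' | LC_ALL=C sort -t $'\t' -k1,1
}

# Example:
#   for b in XM_*.sorted.bam; do samtools depth -a "$b"; done | coverage_summary
```

Because each BAM was mapped against a single-gene index, grouping by the reference column gives one line per cathepsin gene.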
1. Seven genes had high and even read coverage.
2. One of the reference cathepsin genes had few reads mapping to it, but they were evenly distributed along the whole gene.

Gene | mRNA | Protein | Reads Coverage |
---|---|---|---|
LOC109042327 | XM_019059005.1 | XP_018914550.1 | |

3. Five genes had poor and uneven read mapping.
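The read counts behind these three groups can be tabulated from the indexed BAMs. A sketch, assuming samtools idxstats output (reference name, sequence length, mapped read segments, unmapped read segments, with a final * line for unplaced unmapped reads that is skipped here). Normalising mapped segments per kilobase of transcript is only a hypothetical way to compare genes of different lengths, and reads_per_kb is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mapped read segments per gene and per kilobase of transcript,
# sorted from the most to the least covered gene.
# Input (stdin): samtools idxstats lines: name <TAB> length <TAB> mapped <TAB> unmapped
reads_per_kb() {
  awk -F'\t' '
    $1 == "*" || $2 == 0 { next }            # skip the unplaced-reads line
    { printf "%s\t%d\t%.1f\n", $1, $3, $3 / ($2 / 1000) }
  ' | LC_ALL=C sort -t $'\t' -k3,3gr
}

# Example:
#   for b in XM_*.sorted.bam; do samtools idxstats "$b"; done | reads_per_kb
```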
Identification of the genes as salivary gland genes using signal peptides
Gene similarity analysis: percentage identity between the cathepsin B and L genes.
BLAST output file for step 3 of the general workflow.
BLAST output file for step 5 of the general workflow.