Assembly and analysis pipeline for twenty-two metagenomic samples from the Southern and Atlantic Ocean.

LeonDlugosch/Atlantic-Ocean-Metagenomes

Atlantic Ocean Metagenomes Reference Gene Catalogue

Assembly and analysis pipeline

The assembly pipeline is written for the server environment of the Institute for Chemistry and Biology of the Marine Environment (ICBM) of the Carl-von-Ossietzky University Oldenburg, Germany, as it uses Institute servers as well as the HPC facility of the University of Oldenburg. Third-party software is required to run this assembly pipeline (see below).

Downstream analyses, such as statistics, are included in the analysis directory.

This page is under construction; proper documentation will be added in the near future (by the end of 2021).

About the dataset

Fractionated CTD samples from 20 m depth were taken during the RV Polarstern cruises ANT28-4 and ANT28-5 from March to May 2012, and metagenomes from 22 stations were generated using an Illumina HiSeq 2500 (see Dlugosch et al. for details).

Stations from RV Polarstern cruises ANT28-4 and ANT28-5. Colour indicates average chlorophyll a concentration in 2012.



In short, Illumina sequences were quality-trimmed and residual adapter sequences were removed using Trimmomatic; the trimmed reads were then assembled using metaSPAdes (HPC). Genes were predicted from contigs using Prodigal, then dereplicated and clustered at 95% sequence identity using usearch. The resulting sequences were classified taxonomically using Kaiju with the ProGenomes and RefSeq databases (sequence taxonomy is integrated during dataset generation). Sequences were functionally classified using GhostKOALA and the CAZyme database (diamond blastx --more-sensitive). High-quality reads were mapped to the AOM-RGC using bowtie2.
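The steps above can be sketched as a list of command lines. This is a minimal illustration for a single sample, not the pipeline shipped in this repository. The function name `build_commands`, all file names, the `trimmomatic` wrapper name, the adapter file, the CAZyme database file name and the trimming thresholds are assumptions made for the example. Only the 95% identity threshold and the `--more-sensitive` option come from the description above. Dereplication, Kaiju and GhostKOALA are left out, and the real catalogue is built from all 22 samples rather than one.

```python
from pathlib import Path


def build_commands(sample, r1, r2, outdir,
                   adapters="adapters.fa", cazy_db="cazy.dmnd"):
    """Return command lines for one sample as argument lists (nothing is run)."""
    out = Path(outdir)
    # Hypothetical file layout, chosen for this example only.
    p1 = out / f"{sample}_R1.paired.fq.gz"
    u1 = out / f"{sample}_R1.unpaired.fq.gz"
    p2 = out / f"{sample}_R2.paired.fq.gz"
    u2 = out / f"{sample}_R2.unpaired.fq.gz"
    assembly = out / "assembly"
    contigs = assembly / "contigs.fasta"  # SPAdes writes contigs to this file
    genes = out / "genes.fna"
    proteins = out / "genes.faa"
    catalogue = out / "catalogue.fna"
    index = out / "catalogue_index"

    return [
        # 1. Quality and adapter trimming (thresholds are assumptions).
        ["trimmomatic", "PE", r1, r2, str(p1), str(u1), str(p2), str(u2),
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 2. Metagenomic assembly of the paired, trimmed reads.
        ["metaspades.py", "-1", str(p1), "-2", str(p2), "-o", str(assembly)],
        # 3. Gene prediction on contigs in metagenome mode.
        ["prodigal", "-i", str(contigs), "-d", str(genes),
         "-a", str(proteins), "-p", "meta"],
        # 4. Clustering at 95% sequence identity.
        ["usearch", "-cluster_fast", str(genes), "-id", "0.95",
         "-centroids", str(catalogue)],
        # 5. Functional annotation against a CAZyme DIAMOND database.
        ["diamond", "blastx", "--more-sensitive", "-d", cazy_db,
         "-q", str(catalogue), "-o", str(out / "cazy_hits.tsv")],
        # 6. Index the catalogue and map the trimmed reads back to it.
        ["bowtie2-build", str(catalogue), str(index)],
        ["bowtie2", "-x", str(index), "-1", str(p1), "-2", str(p2),
         "-S", str(out / f"{sample}.sam")],
    ]


if __name__ == "__main__":
    for cmd in build_commands("S01", "S01_R1.fq.gz", "S01_R2.fq.gz", "out"):
        print(" ".join(cmd))
```

Building the commands as argument lists keeps the sketch inspectable without any of the tools installed; each list could be passed to `subprocess.run(cmd, check=True)` once the paths match a real setup.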

See here for a more detailed manual of the assembly-pipeline (will follow soon).

For reproducibility and transparency, all analysis scripts are provided. If you choose to run them, execute them in sequence from 01 to 06, because some scripts generate data that later scripts use. Custom functions are provided at the start of each script. Installation of additional R packages might be required. Note that most figures were edited to comply with formatting standards.
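Running the scripts in order can be automated with a small helper. The sketch below assumes that each script's file name starts with its number (for example `01_...R`) and ends in `.R`, and that the scripts live in a directory called `analysis`; the function name `ordered_scripts` is made up for this example. Adjust these to the actual file names.

```python
import re
import subprocess
from pathlib import Path


def ordered_scripts(directory):
    """Return R scripts whose names start with a number, sorted by that number."""
    numbered = []
    for path in Path(directory).glob("*.R"):
        match = re.match(r"(\d+)", path.name)
        if match:  # skip files without a leading number
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


if __name__ == "__main__":
    for script in ordered_scripts("analysis"):
        # check=True stops at the first failing script, so later scripts
        # never run on missing intermediate data.
        subprocess.run(["Rscript", str(script)], check=True)
```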

Data availability

Illumina sequencing data, assemblies, the Atlantic Ocean Metagenome gene catalogue (AOM-RGC), gene abundance tables and environmental data are available here:

Illumina sequencing data
Environmental data
AOM-RGC, gene abundance tables, assemblies and predicted genes
