getiria-onsongo/hadoop-cnvrf-public

Hadoop-CNV-RF: A Scalable Copy Number Variation Detection Tool for Next-Generation Sequencing Data

This is a Hadoop implementation of CNV-RF, a copy number variation (CNV) detection method capable of detecting clinically relevant CNVs at scale. This Hadoop-based implementation can rapidly scale to analyze large datasets such as whole-exome data.

Prerequisites

These instructions assume you have a Hadoop cluster up and running with dependency software installed. If you have your own cluster, see installing dependency software for instructions on how to install the required software. If you are using Amazon's Elastic MapReduce (EMR) framework, we provide an Amazon Machine Image (AMI) with dependency software installed. See launching hadoop on Amazon using EMR for instructions on launching a Hadoop cluster on Amazon with dependency software installed.
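Before continuing, it can help to confirm that the expected tools are on the PATH. The snippet below is a minimal sketch, not part of Hadoop-CNV-RF: `missing_tools` is a hypothetical helper, and the tool names in the example (`hadoop`, `bwa`, `bowtie2`) are assumptions based on the software mentioned in this README. The full dependency list may be longer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the name of every tool that is NOT on the PATH.
# Prints nothing when all of the named tools are found.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example (tool names are assumptions, adjust to your cluster):
# missing_tools hadoop bwa bowtie2
```

An empty result means every named tool was found; any name printed still needs to be installed on that node.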

Prepare reference genome

If you have used BWA and Bowtie2 before and already have genome indices, create a folder on your master node and save the indices in that folder. Compress the folder and note the location of the compressed file. If using an EMR cluster, we recommend saving it under /mnt, e.g., /mnt/hg19/hg19_index.tar.gz.
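Compressing the index folder could look something like the sketch below. `compress_index_dir` is a hypothetical helper, and the folder name in the example is an assumption modelled on the path above; use whatever location you chose for your indices.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: compress an index folder into <folder>.tar.gz,
# placed next to the folder, and print the path of the archive.
compress_index_dir() {
  local index_dir="${1%/}"          # drop a trailing slash, if any
  local parent name
  parent="$(dirname "$index_dir")"
  name="$(basename "$index_dir")"
  # -C keeps paths inside the archive relative to the parent directory.
  tar -czf "${index_dir}.tar.gz" -C "$parent" "$name"
  echo "${index_dir}.tar.gz"
}

# Example (folder name is an assumption):
# compress_index_dir /mnt/hg19/hg19_index
```

The printed path is the location to note for later steps.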

If you have no experience with BWA and Bowtie2, see preparing reference genome for instructions on indexing the hg19 human genome for BWA and Bowtie2.

Installing

Once you have a Hadoop cluster up and running with dependency software installed, get a copy of Hadoop-CNV-RF. If using an EMR cluster, navigate to the mounted drive first. The root drive has limited disk space.

  • SSH into your machine.
$ ssh -i myPrivateKey.pem hadoop@xxx.us-west-2.compute.amazonaws.com 
  • Navigate to mounted drive.
$ cd /mnt
  • Get a copy of Hadoop-CNV-RF.
$ git clone https://github.com/getiria-onsongo/hadoop-cnvrf-public.git 
  • Navigate into program folder.
$ cd hadoop-cnvrf-public
  • Navigate into the folder containing the test data and uncompress the data.
$ cd data/sample/
$ gunzip *.gz
$ cd ../control/
$ gunzip *.gz
  • Navigate back to program directory.
$ cd /mnt/hadoop-cnvrf-public/
  • Launch the analysis.
$ ./run_script.sh
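
If the steps above are repeated, `gunzip *.gz` reports an error once the test data is already uncompressed, because the pattern no longer matches any file. A minimal sketch of a re-runnable alternative follows; `gunzip_all` is a hypothetical helper and not part of Hadoop-CNV-RF.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: decompress every .gz file in a directory.
# Does nothing (and succeeds) when the directory holds no .gz files.
gunzip_all() {
  local dir="$1"
  local f
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue   # the pattern matched nothing
    gunzip "$f"
  done
}

# Example, run from the program folder:
# gunzip_all data/sample
# gunzip_all data/control
```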
