
2. Installation

MikiSchikora edited this page Aug 10, 2023 · 27 revisions

Below are the options for installing perSVade. Depending on the option you choose, the installation should work on Linux, Mac or Windows. If you are running perSVade at the BSC (internal use), you can skip this section (see Running in BSC clusters).

Option 1: Singularity image (recommended)

We created a docker image that generates a container with perSVade installed. You can use singularity to run this docker image without root permissions (e.g. in a cluster). You can find more info about singularity here. If singularity is not installed, you can get it with:

conda install -c conda-forge singularity

Build a singularity image (stored in a file called mikischikora_persvade_<tag>.sif) obtained from Dockerhub (check this to choose the latest <tag>):

singularity build --docker-login ./mikischikora_persvade_<tag>.sif docker://mikischikora/persvade:<tag>

Check this to troubleshoot the singularity build command.
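If you script the build, it can be convenient to skip it when the .sif file already exists. The snippet below is a minimal bash sketch of that idea. build_persvade_sif is a hypothetical helper name (not part of perSVade), and <tag> must still be replaced with a real tag from Dockerhub.

```bash
# Minimal sketch: build the perSVade singularity image only if it is not there yet.
# build_persvade_sif is a hypothetical helper, not part of perSVade.
build_persvade_sif() {
  local tag="${1:?usage: build_persvade_sif <tag>}"
  local sif="./mikischikora_persvade_${tag}.sif"

  # Fail early with a hint if singularity is not on the PATH
  if ! command -v singularity >/dev/null 2>&1; then
    echo "singularity not found; try: conda install -c conda-forge singularity" >&2
    return 1
  fi

  # Reuse an existing image instead of rebuilding it
  if [ -f "$sif" ]; then
    echo "image already exists: $sif"
    return 0
  fi

  # Same command as above; --docker-login asks for Dockerhub credentials
  singularity build --docker-login "$sif" "docker://mikischikora/persvade:${tag}"
}
```

For example, build_persvade_sif <tag> builds the image in the current directory, and running it a second time only reports that the file is already there.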

Once the image file is created, it can be moved to other directories or systems (e.g. HPC clusters). Running this image (e.g. singularity exec -e mikischikora_persvade_<tag>.sif <commands>) creates a container (like a virtual machine inside your computer) that holds the content of this github repository (under /perSVade/, which is the working directory) and an environment ready to run perSVade. For example, if you type singularity exec -e mikischikora_persvade_<tag>.sif ls /perSVade you'll see that the container's working directory has the same structure as this github repository.

As another example, you can run a command to print the arguments of perSVade with:

singularity exec -e mikischikora_persvade_<tag>.sif bash -c 'source /opt/conda/etc/profile.d/conda.sh && conda activate perSVade_env && python /perSVade/scripts/perSVade --help'

Note that source /opt/conda/etc/profile.d/conda.sh && conda activate perSVade_env is required to activate the environment in which perSVade runs.
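Typing the activation prefix for every command is error-prone, so you may want a small wrapper. The bash sketch below assumes the image path is stored in a variable called PERSVADE_SIF; both that variable and the persvade_sif_run function are hypothetical names chosen for illustration. It escapes each argument with printf %q so that paths containing spaces survive the bash -c call.

```bash
# Path to the image built above. The default keeps the <tag> placeholder,
# so set PERSVADE_SIF to your actual .sif file (hypothetical variable name).
PERSVADE_SIF="${PERSVADE_SIF:-./mikischikora_persvade_<tag>.sif}"

# Hypothetical wrapper: activate perSVade_env inside the container, then run perSVade
persvade_sif_run() {
  local args=""
  # %q escapes each argument so that it survives being re-parsed by bash -c
  if [ "$#" -gt 0 ]; then
    args=$(printf '%q ' "$@")
  fi
  singularity exec -e "$PERSVADE_SIF" bash -c \
    "source /opt/conda/etc/profile.d/conda.sh && conda activate perSVade_env && python /perSVade/scripts/perSVade ${args}"
}
```

With this, persvade_sif_run --help should be equivalent to the full command above.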

You may also enter the container interactively with singularity shell ./mikischikora_persvade_<tag>.sif for testing purposes. We have tested this image with singularity versions 3.7.1 and 3.7.3, and other users have also used 3.8.6.

PROS:

  • Singularity images can be run in any Linux or Mac OS in a reproducible way.
  • Reading and writing files in the host filesystem works better with singularity than with docker.
  • Singularity images can be run on any Linux system without root permission.
  • The singularity image is a file that can be transferred between machines.

CAVEATS:

  • Singularity images do not have optimal reproducibility in Windows systems.

Option 2: Docker image

We created a docker image (see https://www.docker.com for installing docker) that can generate a container with perSVade installed. Once you have docker running on your computer, you can pull the image from https://hub.docker.com/r/mikischikora/persvade with:

docker pull mikischikora/persvade:<tag>

This will download an image called mikischikora/persvade:<tag>, which you can list with docker images. Running this image (e.g. docker run -i mikischikora/persvade:<tag> <commands>) creates a container (like a virtual machine inside your computer) that holds the content of this github repository (under /perSVade/, which is the working directory) and an environment ready to run perSVade. For example, if you type docker run -i mikischikora/persvade:<tag> ls you'll see that the container's working directory has the same structure as this github repository. You can replace <tag> with any other tag available on the dockerhub website.

As an example, the command below would output all the options of perSVade:

docker run -i mikischikora/persvade:<tag> scripts/perSVade --help

NOTE: The perSVade image takes around 19 GB of disk space. This may be a problem if docker stores its files on a disk with low storage capacity (which can happen in some Linux systems). This solution may help: https://stackoverflow.com/a/56126715.
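Before pulling the image, you can check whether the disk that docker writes to has enough free space. This is a minimal bash sketch: has_free_gb is a hypothetical helper, and the commented usage assumes docker info reports the storage location in its DockerRootDir field (commonly /var/lib/docker on Linux).

```bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least NEED_GB free
# (1 GB taken as 1024^3 bytes)
has_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # df -P gives one line per filesystem; column 4 of line 2 is the free space in 1K blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# Usage (requires a running docker daemon):
# docker_root=$(docker info --format '{{.DockerRootDir}}')
# has_free_gb "$docker_root" 19 || echo "less than 19 GB free in $docker_root" >&2
```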

PROS:

  • Docker images can be run in any OS in a reproducible way.

CAVEATS:

  • Running docker requires root permissions, so some users may have problems running perSVade with the docker image on HPC clusters.
  • Sharing data between the container and the host machine is not automatic: you'll need to specify mount points (with the -v option) to provide the inputs of the pipeline and to store its outputs.
  • You need to know a bit of docker to view the logs of the pipeline.
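As an illustration of the -v option, the bash sketch below mounts a host directory inside the container. The function name persvade_docker_run, the PERSVADE_TAG variable and the /data mount point are hypothetical choices; perSVade itself does not require them.

```bash
# Image tag to run. The default keeps the <tag> placeholder, so set PERSVADE_TAG.
PERSVADE_TAG="${PERSVADE_TAG:-<tag>}"

# Hypothetical wrapper: run a command in the perSVade container with HOST_DIR mounted at /data
persvade_docker_run() {
  local host_dir abs_dir
  host_dir="${1:?usage: persvade_docker_run <host_dir> <command> [args...]}"
  shift
  # docker -v needs an absolute host path
  abs_dir=$(cd "$host_dir" && pwd) || return 1
  # -v shares abs_dir with the container at /data: inputs placed there are visible
  # to perSVade, and outputs written under /data remain on the host afterwards.
  # --rm deletes the container (not the image) when the command finishes.
  docker run -i --rm -v "${abs_dir}:/data" "mikischikora/persvade:${PERSVADE_TAG}" "$@"
}
```

For example, persvade_docker_run ./my_project scripts/perSVade --help runs the help command with ./my_project visible as /data inside the container; input and output paths given to perSVade should then point under /data.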

Option 3: Traditional installation

You can also install all the dependencies of perSVade on your own, which is a more tedious and error-prone process (it may not be reproducible across machines). To do so, download the perSVade source code from one of the releases and decompress it:

wget https://github.com/Gabaldonlab/perSVade/archive/<version>.tar.gz

tar -xvf <version>.tar.gz; rm <version>.tar.gz

This already contains all the scripts to run the pipeline. The created directory (for example perSVade-v0.9) will be referred to as <perSVade_dir>. You should use the latest version.
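The download and decompression steps can be wrapped as below. This is a minimal bash sketch with hypothetical helper names. Instead of assuming the name of the extracted directory, extract_release reads it from the archive listing and prints it, which gives you the value of <perSVade_dir>.

```bash
# Hypothetical helper: download a release archive (replace <version> with a real release)
download_persvade_release() {
  local version="$1"
  wget "https://github.com/Gabaldonlab/perSVade/archive/${version}.tar.gz" -O "${version}.tar.gz"
}

# Hypothetical helper: extract an archive, delete it, and print the name of the
# top-level directory it created (the <perSVade_dir> of this page)
extract_release() {
  local archive="$1" top_dir
  # The first entry of the listing is the archive's top-level directory
  top_dir=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
  tar -xzf "$archive" && rm "$archive" && echo "$top_dir"
}

# Example:
# download_persvade_release <version>
# PERSVADE_DIR=$(extract_release <version>.tar.gz)
```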

perSVade is written in python, R and bash for Linux. We have isolated its dependencies in several conda environments, which you can install with the following commands:

  • cd <perSVade_dir> # move inside the directory of perSVade

  • conda install -y -c conda-forge mamba=0.15.3 # install mamba to have faster conda installations

  • export PERSVADE_ENV_NAME=<env_name> # set an environmental variable called 'PERSVADE_ENV_NAME' with the name of perSVade's main environment

  • mamba env create --file installation/perSVade_env.yml --name $PERSVADE_ENV_NAME # create the main perSVade env

  • conda activate $PERSVADE_ENV_NAME # activate the environment

  • ./installation/install_external_software.sh # downloads lowess, gztool, gridss, clove and CONY

  • mamba env create --file installation/RepeatMasker_env.yml --name $PERSVADE_ENV_NAME'_RepeatMasker_env' # create the RepeatMasker env

  • ./installation/adapt_RepeatMasker_env.sh # modifies the $PERSVADE_ENV_NAME'_RepeatMasker_env' to have a correct format

  • ./installation/install_Ninja.sh # installs Ninja and puts the 'Ninja' binary into envs/$PERSVADE_ENV_NAME'_RepeatMasker_env'/bin/Ninja

  • mamba env create --file installation/bcftools_1.10.2_env.yml --name $PERSVADE_ENV_NAME'_bcftools_1.10.2_env' # create the env to run bcftools

  • mamba create --name $PERSVADE_ENV_NAME'_aligners_env' -c bioconda segemehl=0.3.4 bowtie2=2.5.1 hisat2=2.2.1 # create env with various aligners

  • mamba env create --file installation/ete3_env.yml --name $PERSVADE_ENV_NAME'_ete3_3.0.0_env' # create the env to run ete3

  • mamba env create --file installation/R_env.yml --name $PERSVADE_ENV_NAME'_R_env' # create the env to run R

  • mamba env create --file installation/CONY_env.yml --name $PERSVADE_ENV_NAME'_CONY_env' # environment to run CONY

  • mamba env create --file installation/AneuFinder_env.yml --name $PERSVADE_ENV_NAME'_AneuFinder_env' # env for AneuFinder

  • mamba env create --file installation/HMMcopy_env.yml --name $PERSVADE_ENV_NAME'_HMMcopy_env' # env for HMMcopy

  • mamba env create --file installation/gridss_env.yml --name $PERSVADE_ENV_NAME'_gridss_env' # env for gridss

  • mamba env create --file installation/picard_env.yml --name $PERSVADE_ENV_NAME'_picard_env' # env for picard

  • chmod -R 755 ./ # give permissions to all files

Note that all these steps are essential and must be run in this order. For example, mamba should be installed from the conda base environment and not from the $PERSVADE_ENV_NAME environment. If any step fails, you should solve the errors before moving on to the next step.
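Because the order matters and each step must succeed before the next one, you may prefer to run the list above as a script that stops at the first error. The bash sketch below does this. install_persvade_envs and the DRY_RUN switch (which only prints the commands) are hypothetical additions, and unlike the list above it passes -y to mamba create so that it does not wait for confirmation. Run it from the conda base environment.

```bash
# Print the command when DRY_RUN=1, otherwise execute it
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical installer following the steps above. The ( ... ) body runs in a
# subshell, so set -e, cd and conda activate do not leak into the calling shell.
# Call it as a plain command (not inside if or ||), otherwise bash ignores set -e.
install_persvade_envs() (
  set -eo pipefail
  persvade_dir="${1:?usage: install_persvade_envs <perSVade_dir> <env_name>}"
  export PERSVADE_ENV_NAME="${2:?usage: install_persvade_envs <perSVade_dir> <env_name>}"
  env="$PERSVADE_ENV_NAME"
  cd "$persvade_dir"

  # Make "conda activate" usable in a non-interactive shell
  if [ "${DRY_RUN:-0}" != 1 ]; then
    source "$(conda info --base)/etc/profile.d/conda.sh"
  fi

  # Create an env from installation/<yml> named <env_name><suffix>
  yml_env() { run mamba env create --file "installation/$1" --name "${env}$2"; }

  run conda install -y -c conda-forge mamba=0.15.3
  yml_env perSVade_env.yml ""
  run conda activate "$env"
  run ./installation/install_external_software.sh
  yml_env RepeatMasker_env.yml _RepeatMasker_env
  run ./installation/adapt_RepeatMasker_env.sh
  run ./installation/install_Ninja.sh
  yml_env bcftools_1.10.2_env.yml _bcftools_1.10.2_env
  run mamba create -y --name "${env}_aligners_env" -c bioconda segemehl=0.3.4 bowtie2=2.5.1 hisat2=2.2.1
  yml_env ete3_env.yml _ete3_3.0.0_env
  yml_env R_env.yml _R_env
  yml_env CONY_env.yml _CONY_env
  yml_env AneuFinder_env.yml _AneuFinder_env
  yml_env HMMcopy_env.yml _HMMcopy_env
  yml_env gridss_env.yml _gridss_env
  yml_env picard_env.yml _picard_env
  run chmod -R 755 ./
)
```

For example, DRY_RUN=1 install_persvade_envs <perSVade_dir> <env_name> prints the 17 commands in order without running them.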

We have tested this installation on several machines, but we can't guarantee a smooth installation on all systems because there are many dependencies. If you find errors, you may modify the yml files and/or reproduce the behavior of the install_external_software.sh, adapt_RepeatMasker_env.sh or install_Ninja.sh scripts to fit your machine. You may also ask for our help by raising an issue.

This installation was tested in July 2022 with conda 4.8.0 on a 64-bit Linux x86 architecture.

As an example, the command below would output all the options of perSVade (once it is installed):

conda activate $PERSVADE_ENV_NAME && ./scripts/perSVade --help
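To confirm that all the environments from the installation steps were created, you can compare them against conda env list. This is a minimal bash sketch: check_persvade_envs is a hypothetical helper, and it assumes the environments were created with --name (so they appear by name in the listing).

```bash
# Hypothetical helper: report any expected perSVade env that conda does not list.
# The suffixes are copied from the installation steps above.
check_persvade_envs() {
  local env="${1:?usage: check_persvade_envs <env_name>}" suffix missing=0
  local existing
  # First column of "conda env list" is the env name; "#" lines are comments
  existing=$(conda env list | awk '!/^#/ && NF {print $1}')
  for suffix in "" _RepeatMasker_env _bcftools_1.10.2_env _aligners_env \
      _ete3_3.0.0_env _R_env _CONY_env _AneuFinder_env _HMMcopy_env \
      _gridss_env _picard_env; do
    if ! printf '%s\n' "$existing" | grep -qxF -- "${env}${suffix}"; then
      echo "missing environment: ${env}${suffix}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_persvade_envs $PERSVADE_ENV_NAME prints nothing and returns 0 when every environment exists.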

PROS:

  • You don't need to know singularity or docker.

CAVEATS:

  • The installation may not be easy on your machine, since some conda versions may fail to create the perSVade_env. In addition, installing the extra dependencies may be difficult.
  • It won't work on Mac or Windows.
  • Reproducibility cannot be fully guaranteed, since conda relies on system libraries that may differ across machines.