- Description
- General Installation
- Pipeline CLUMPAK(StrAuto) + distruct
- Pipeline Structure_threader
- Pipeline fastSTRUCTURE with Docker
This repository contains three pipelines with scripts to run several types of STRUCTURE analysis locally:
- Pipeline CLUMPAK: CLUMPAK analysis of STRUCTURE results produced by StrAuto, plus distruct figures.
- Pipeline Structure_threader: fastStructure runs managed by Structure_threader.
- Pipeline fastStructure with Docker: fastStructure run inside a Docker container.
git clone https://github.com/hernanmd/STRUCTUREPipelines.git
cd runstructure
- STRUCTURE input file (.str).
- The default name used in the configuration files is project_data.str.
- StrAuto results must already be zipped into a single .zip file.
- The default name is stresults.zip, with the following structure:
k1.zip
  k1/
    project_data_k1_run10_f
    project_data_k2_run1_f
    ...
k2.zip
  k2/
    project_data_k1_run10_f
    project_data_k2_run1_f
    ...
- Put your zipped StrAuto results into a subdirectory.
- Edit the environment variables in rsEnvVars.sh to match your file names and directories.
./rsGetClumpak.sh
- The runStrClumpak script performs the following actions:
- Read environment variables in rsEnvVars.sh as parameters.
- Create the output directory.
- Build the populations file.
- Run the CLUMPAK Perl script
./runStrClumpak
- The runStrDistructForManyKs script performs the following actions:
- Read environment variables as parameters.
- Create the output directory.
- Build the populations file.
- Run the DistructForManyKs Perl script.
./runStrDistructForManyKs
Work in progress.
To use this pipeline you need your input files in both PED/MAP format (to generate the populations file) and BED/BIM/FAM format (required by fastStructure). You also need the STRUCTURE input file, which can be generated with PLINK using the "--recode structure" option. Putting the input files in a separate subdirectory is highly recommended; the output directory is created if it does not already exist.
# Create the required popfile from the PED file.
# The first parameter is the PED file name, specified WITHOUT the .ped extension
# The second parameter should be the species name (as understood by PLINK): cow, horse, etc.
# The output is a new file named "popfile" suitable for Structure_threader plots
./mkPopFile ../STRUCTURE_PIPrun/project_input/file species
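For readers curious about what the popfile contains, here is a minimal sketch that builds one directly from a PED file with awk. It is NOT the repository's mkPopFile (which also takes a species name, presumably for PLINK). It assumes that column 1 of the PED file (the family ID) holds the population label, that individuals of each population are contiguous, and that the Structure_threader popfile has three columns: population name, number of individuals, and plotting order. The function name make_popfile_from_ped is hypothetical, and unlike mkPopFile it takes the PED path WITH its extension.

```shell
# Hypothetical helper (not part of this repository).
# ASSUMPTIONS: PED column 1 = population label; populations are contiguous;
# popfile columns = population name, number of individuals, plotting order.
make_popfile_from_ped() {
  local ped="$1"              # path to the .ped file, extension included
  local out="${2:-popfile}"   # output file name, "popfile" by default
  awk '
    {
      if ($1 != current) {                   # a new population run starts
        if (NR > 1) print current, count, ++order
        current = $1
        count = 0
      }
      count++                                # one more individual in this run
    }
    END { if (NR > 0) print current, count, ++order }
  ' "$ped" > "$out"
}
```

Usage: `make_popfile_from_ped project_input/file.ped project_input/popfile`. A population that reappears later in the file (non-contiguous) is emitted as a second line, so sort the PED file by population first if needed.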
- If your input directory does not contain mainparams and extraparams files, run ./runFsStrThreader.sh to generate template versions of both files.
- Edit them with your favorite editor:
nano project_input/mainparams
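Instead of editing the data dimensions by hand, they can be derived from the PLINK files: a .fam file has one line per individual and a .bim file one line per SNP. The sketch below assumes mainparams uses STRUCTURE's standard "#define KEY value" lines for NUMINDS and NUMLOCI; the helper names set_mainparam and fill_dimensions are hypothetical, not part of this repository.

```shell
# Hypothetical helpers (not part of this repository).
# Replace the value on a "#define KEY value" line, keeping any trailing comment.
set_mainparam() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp)
  sed -E "s/^(#define[[:space:]]+${key}[[:space:]]+)[^[:space:]]+/\1${value}/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place, preserving file permissions
  rm -f "$tmp"
}

# Set NUMINDS from the .fam line count and NUMLOCI from the .bim line count.
fill_dimensions() {
  local mainparams="$1" prefix="$2"   # prefix = path without .bed/.bim/.fam
  local ninds nloci
  ninds=$(wc -l < "${prefix}.fam" | tr -d ' ')
  nloci=$(wc -l < "${prefix}.bim" | tr -d ' ')
  set_mainparam "$mainparams" NUMINDS "$ninds"
  set_mainparam "$mainparams" NUMLOCI "$nloci"
}
```

Usage: `fill_dimensions project_input/mainparams project_input/file`. Other settings (for example MAXPOPS, BURNIN, NUMREPS) still need to be reviewed manually.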
To run Structure_threader, pass the following parameters to runFsStrThreader.sh:
- 1st parameter: the DIRECTORY where the input files are located
- 2nd parameter: the BED file (PED input is not supported yet)
- 3rd parameter: the DIRECTORY where the output will be written
- 4th parameter: the name of the popfile generated with the mkPopFile script
- 5th parameter: the maximum K value
Example:
./runFsStrThreader.sh project_input/ file.bed project_output/ popfile 24
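Because a run over many K values can take a long time, it may help to validate the arguments first. The sketch below is a hypothetical pre-flight check (check_strthreader_args is not part of this repository). It relies on the fact that fastStructure reads a PLINK .bed/.bim/.fam triplet sharing one prefix, and it assumes the BED file name is relative to the input directory, as in the example above.

```shell
# Hypothetical pre-flight check (not part of this repository) for the five
# runFsStrThreader.sh arguments: INPUT_DIR FILE.bed OUTPUT_DIR POPFILE MAX_K
check_strthreader_args() {
  if [ "$#" -ne 5 ]; then
    echo "usage: check_strthreader_args INPUT_DIR FILE.bed OUTPUT_DIR POPFILE MAX_K" >&2
    return 2
  fi
  local indir="${1%/}" bed="$2" maxk="$5" prefix ext
  prefix="${indir}/${bed%.bed}"
  # fastStructure needs the whole PLINK triplet with the same prefix.
  for ext in bed bim fam; do
    [ -f "${prefix}.${ext}" ] || { echo "missing ${prefix}.${ext}" >&2; return 1; }
  done
  # MAX_K must be a positive integer.
  case "$maxk" in
    ''|*[!0-9]*) echo "MAX_K must be a positive integer: $maxk" >&2; return 1 ;;
  esac
  [ "$maxk" -ge 1 ] || { echo "MAX_K must be >= 1" >&2; return 1; }
  return 0
}
```

Usage: `check_strthreader_args project_input/ file.bed project_output/ popfile 24 && ./runFsStrThreader.sh project_input/ file.bed project_output/ popfile 24`.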
Structure_threader already writes a plots subdirectory with paired HTML/SVG files into the output directory; in addition, this script generates comparative plots in a comparativePlotAllKs directory.
- Install Docker.
- Under Windows: launch MSYS2.
- Under Linux/OSX: launch a terminal.
- Fetch the fastStructure Docker image from https://hub.docker.com/r/dockerbiotools/faststructure
docker pull dockerbiotools/faststructure
# Get the image id from the following command
docker images
# Make a directory for your dataset
mkdir data # Or whatever your population name is
# Under Windows (MSYS2), install the winpty package if necessary
# pacman -S winpty
# Run the image through winpty (replace 6ca with your image ID from "docker images")
winpty docker run -it -v /${PWD}/data/:/fastStructure/data 6ca
# Under Linux/OSX: get the image ID from the following command
docker images
# Run the image (replace 6ca with your image ID); winpty is not needed here
docker run -it -v /${PWD}/data/:/fastStructure/data 6ca
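Once inside the container, fastStructure is run once per K and chooseK.py then summarizes the runs. The sketch below only prints those commands so they can be reviewed and saved into the mounted data directory. It assumes that structure.py and chooseK.py sit in the container's working directory (presumably /fastStructure, given the mount point above) and that the input is given as a BED prefix without extension; faststructure_commands is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository): print one fastStructure
# command per K from 1 to MAXK, then the chooseK.py summary command.
# ASSUMPTION: structure.py and chooseK.py are in the container's working dir.
faststructure_commands() {
  local input="$1" output="$2" maxk="$3" k
  k=1
  while [ "$k" -le "$maxk" ]; do
    printf 'python structure.py -K %d --input=%s --output=%s\n' "$k" "$input" "$output"
    k=$((k + 1))
  done
  # chooseK.py reads all runs that share the --output prefix.
  printf 'python chooseK.py --input=%s\n' "$output"
}
```

Usage: on the host, `faststructure_commands data/file data/out 10 > data/run_all.sh`; then, inside the container, `sh data/run_all.sh`.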