From 7658a803ed8dcd43c24d2cd63c29bca88ca2e17d Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 16 Apr 2018 09:51:51 -0700
Subject: [PATCH 01/10] Updates to documentation for initial setup

---
 treeshop.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/treeshop.md b/treeshop.md
index b8e230f..366277c 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -1,6 +1,6 @@
 # Treeshop Cluster Processing

-To process multiple samples through the [Treehouse pipelines Makefile](https://github.com/UCSC-Treehouse/pipelines/blob/master/Makefile) we use [docker-machine](https://docs.docker.com/machine/overview/) to spin up a cluster of machines on Openstack and a simple [Fabric](http://www.fabfile.org/) file to control the compute.
+To process multiple samples through the [Treehouse pipelines Makefile](https://github.com/UCSC-Treehouse/pipelines/blob/master/Makefile) we use [docker-machine](https://docs.docker.com/machine/overview/) to spin up a cluster of machines on Openstack and a simple [Fabric](http://www.fabfile.org/) file to control the compute.
 ## Requirements
@@ -26,15 +26,20 @@ Clone this repository:

 git clone https://github.com/UCSC-Treehouse/pipelines.git

-Create a folders that match the [Treehouse storage layout](https://github.com/UCSC-Treehouse/pipelines/blob/master/fabfile.py#L12):
+Create needed directory and navigate into the newly cloned repository:
+
+ mkdir .aws
+ cd pipelines
+
+Create folders that match the [Treehouse storage layout](https://github.com/UCSC-Treehouse/pipelines/blob/master/fabfile.py#L12):

 mkdir -p treeshop/primary/original/TEST treeshop/downstream

-Copy the TEST fastq samples into the storage hierarchy
-
+Copy the TEST fastq samples into the storage hierarchy:
+
 cp samples/*.fastq.gz treeshop/primary/original/TEST/

-Spin up a single cluster machine:
+Spin up a single cluster machine (make sure you have created your SSH key):

 fab up

From 5846d5bf45ff765f052553eb5e84fc8266e7fa93 Mon Sep 17 00:00:00 2001
From: e-t-k
Date: Mon, 30 Apr 2018 11:27:48 -0700
Subject: [PATCH 02/10] treeshop.md - specify .aws is in homedir

---
 treeshop.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/treeshop.md b/treeshop.md
index 366277c..3aab76e 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -28,7 +28,7 @@ Clone this repository:

 Create needed directory and navigate into the newly cloned repository:

- mkdir .aws
+ mkdir ~/.aws
 cd pipelines

 Create folders that match the [Treehouse storage layout](https://github.com/UCSC-Treehouse/pipelines/blob/master/fabfile.py#L12):

From ae02cee765ed0f1a4f7747a8cf445de70eaf24b3 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 11:14:04 -0700
Subject: [PATCH 03/10] Add ssh key generation instructions

---
 treeshop.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/treeshop.md b/treeshop.md
index 3aab76e..47ed03f 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -22,6 +22,34 @@ sense of if things are going smoothly.
 ## Getting Started

+### Prerequisites
+
+#### SSH Key
+
+You will need a SSH key in order to use your machine.
+If you know for certain that you have a SSH key, skip ahead to the Set Up section.
+
+To check if you already have a SSH key, type:
+
+ cd ~/.ssh
+
+If the output is "No such file or directory", then you do not have any SSH keys.
+If you successfully navigated to the directory, check that your key exists already:
+
+ ls id_*
+
+If your keys show up you can move on to the set up.
+If not, you can create your SSH key:
+
+ ssh-keygen -t rsa
+
+Press enter to save the key to the default directory.
+Press enter again to skip giving your SSH key a passphrase.
+
+Congratulations, you are now ready to set up your docker-machine.
+
+### Set Up
+
 Clone this repository:

 git clone https://github.com/UCSC-Treehouse/pipelines.git

From 8150c150d2f9760088e546086ee21fc7985b2db6 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 11:31:51 -0700
Subject: [PATCH 04/10] Change output format

---
 treeshop.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/treeshop.md b/treeshop.md
index 47ed03f..e1397d7 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -117,13 +117,15 @@ Process the samples in manifest.tsv with source and destination under the treesh

 Output:

- [10.50.102.245] Executing task 'process'
- Warning: run() received nonzero return code 1 while executing 'docker stop $(docker ps -a -q)'!
- Warning: run() received nonzero return code 1 while executing 'docker rm $(docker ps -a -q)'!
- [10.50.102.245] put: /scratch/username/pipelines/Makefile -> /mnt/Makefile
- 10.50.102.245 processing TEST
+ [10.50.102.245] Executing task 'process'
+ Warning: run() received nonzero return code 1 while executing 'docker stop $(docker ps -a -q)'!
+ Warning: run() received nonzero return code 1 while executing 'docker rm $(docker ps -a -q)'!
+ [10.50.102.245] put: /scratch/username/pipelines/Makefile -> /mnt/Makefile
+ 10.50.102.245 processing TEST
+
 ...lot and lots of output...
- Done.
+
+ Done.

 After this you should have the following under downstream:

From 17a8362bcf744fecc292ccb72d0e71ee747c2bd2 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 12:43:30 -0700
Subject: [PATCH 05/10] Add PATH setup

---
 treeshop.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/treeshop.md b/treeshop.md
index e1397d7..23cc7fd 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -24,6 +24,26 @@ sense of if things are going smoothly.

 ### Prerequisites

+#### PATH
+
+If you are a new user, you may need to set up your PATH.
+You can do this by editing your .bashrc file (example uses VIM, feel free to use your favorite text editor).
+From your home directory type:
+
+ vim .bashrc
+
+In the text editor copy and paste:
+
+ #!/user/bin/env bash
+
+ #echo mypath=$PATH
+ export PATH=$HOME/bin:$PATH
+ export PATH=$PATH:/pod/pstore/groups/treehouse/sratoolkit/sratoolkit.2.8.2-1-centos_linux64/bin/:/scratch/
+ export PATH=$HOME/.local/bin:$PATH
+
+Make sure to change the in the second export statement to your user name.
+Press `ESC`, then type `:wq` to save and quit.
+
 #### SSH Key

 You will need a SSH key in order to use your machine.

From 03db183509555cb87b2b5f8866b60de65ce312f3 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 13:50:02 -0700
Subject: [PATCH 06/10] Add Shut Down section

---
 treeshop.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/treeshop.md b/treeshop.md
index 23cc7fd..632bfe4 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -66,6 +66,10 @@ If not, you can create your SSH key:
 Press enter to save the key to the default directory.
 Press enter again to skip giving your SSH key a passphrase.

+#### Installing docker-machine
+
+*Information on installing docker-machine*
+
 Congratulations, you are now ready to set up your docker-machine.
 ### Set Up
@@ -192,6 +196,17 @@ After this you should have the following under downstream:
 ├── methods.json
 └── mini.ann.vcf

+### Shut Down
+
+After confirming that you successfully processed your data, you may want to shut down your docker machine.
+This will free up resources and space for other Treehouse and Genomics Institute users.
+To do this you will need the name of the docker machine you want to shut down (type `docker-machine ls` for a list of machines).
+Then type:
+
+ docker-machine rm [machine name]
+
+Press `y` to confirm deletion. If you used floating IPs you may need to log into openstack in order to release them.
+
 ## Notes

 Error output with respect to finding and copying files will be written to error.log. All of the output for all machines running in parallel will end up in log.txt. As a result if there are internal errors to the pipelines you'll need to sort through log.txt.

From 2936bee8a334e154e6ca3562f218185f81a39fbf Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 14:01:36 -0700
Subject: [PATCH 07/10] Add openstack username and password commands

---
 treeshop.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/treeshop.md b/treeshop.md
index 632bfe4..3d1a60f 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -41,7 +41,15 @@ In the text editor copy and paste:
 export PATH=$PATH:/pod/pstore/groups/treehouse/sratoolkit/sratoolkit.2.8.2-1-centos_linux64/bin/:/scratch/
 export PATH=$HOME/.local/bin:$PATH

-Make sure to change the in the second export statement to your user name.
+You will also need to save your openstack cluster credentials. Copy and paste:
+
+ # treeshop
+ export OS_USERNAME=
+ export OS_PASSWORD=
+
+Make sure to change the in the second export PATH statement as well as entering your user name and password in the treeshop section.
+(You can press `i` in VIM to enter insert mode for writing in your user name and password).
+
 Press `ESC`, then type `:wq` to save and quit.
 #### SSH Key

From 6c5d3711071996ea01c508b605530af2af522f6d Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Mon, 7 May 2018 15:48:00 -0700
Subject: [PATCH 08/10] Add docker-machine and Floating IPs documentation

---
 treeshop.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/treeshop.md b/treeshop.md
index 3d1a60f..3ef888b 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -76,7 +76,10 @@ Press enter again to skip giving your SSH key a passphrase.

 #### Installing docker-machine

-*Information on installing docker-machine*
+From the home directory (type `cd` to get to home directory) type:
+
+ curl -L https://github.com/docker/machine/releases/download/v0.14.0/docker-machine-`uname -s`-`uname -m` > ~/docker-machine
+ install ~/docker-machine ~/bin/docker-machine

 Congratulations, you are now ready to set up your docker-machine.

@@ -213,7 +216,14 @@ Then type:

 docker-machine rm [machine name]

-Press `y` to confirm deletion. If you used floating IPs you may need to log into openstack in order to release them.
+Press `y` to confirm deletion.
+
+You may need to log into openstack in order to release the Floating IPs you used.
+While connected to the VPN, log into Openstack via http://podcloud.pod/.
+On the left side of the screen, under the 'Compute' menu, click on 'Access & Security'.
+In the middle of the screen, under **Access & Security**, click on the 'Floating IPs' tab.
+Under the 'Mapped Fixed IP Address' column (if you click the column name, the rows will be sorted alphabetically), if you see a row containing 'None XX.XXX.X.XX' (X stands for a number 0-9), click the box on the left hand column of the row.
+Then click the red 'Release Floating IPs' box in the top right area of the table.
 ## Notes

From 427d5811fc248bd238de428ad301d5334daf6701 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Thu, 31 May 2018 12:04:13 -0700
Subject: [PATCH 09/10] "Add e-t-k and klearned comments"

---
 treeshop.md | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/treeshop.md b/treeshop.md
index 3ef888b..442f059 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -94,6 +94,8 @@ Create needed directory and navigate into the newly cloned repository:
 mkdir ~/.aws
 cd pipelines

+### Processing the test sample
+
 Create folders that match the [Treehouse storage layout](https://github.com/UCSC-Treehouse/pipelines/blob/master/fabfile.py#L12):

 mkdir -p treeshop/primary/original/TEST treeshop/downstream
@@ -146,7 +148,7 @@ Output:

 [10.50.102.245] out: Done.

-Process the samples in manifest.tsv with source and destination under the treeshop folder sending log output to the console and log.txt:
+Process the samples in manifest.txt with source and destination under the treeshop folder sending log output to the console and log.txt:

 fab process:manifest=manifest.txt,base=treeshop 2>&1 | tee log.txt

@@ -211,13 +213,24 @@ After this you should have the following under downstream:

 After confirming that you successfully processed your data, you may want to shut down your docker machine.
 This will free up resources and space for other Treehouse and Genomics Institute users.
-To do this you will need the name of the docker machine you want to shut down (type `docker-machine ls` for a list of machines).
+
+#### All machines
+
+To shut down all docker-machines type:
+
+ fab down
+
+#### Select machines
+
+To select which machines you want to shut down you will need the name of the docker machine you want to shut down (type `docker-machine ls` for a list of machines).
 Then type:

 docker-machine rm [machine name]

 Press `y` to confirm deletion.
+#### Free Floating IPs
+
 You may need to log into openstack in order to release the Floating IPs you used.
 While connected to the VPN, log into Openstack via http://podcloud.pod/.
 On the left side of the screen, under the 'Compute' menu, click on 'Access & Security'.
@@ -231,6 +244,25 @@ Error output with respect to finding and copying files will be written to error.

 Treeshop is a cheap and cheerful option to process 10's to up to 100 samples at a time. Larger scale projects will require a more sophisticated distributed computing approach. If you are not comfortable ssh'ng into various machines, running docker, and scp'ng results around then you may want to find someone that is before trying Treeshop.

+To set up multiple machines to process large amounts of samples you can give the `fab up` command a numeric variable input.
+For example, to spin up 5 machines type:
+
+ fab up:5
+
+When processing multiple samples you will need to format your manifest.txt appropriately.
+Each sample name will need to be placed on a separate line.
+For example:
+
+ 1. TEST1
+ 2. TEST2
+ 3. TEST3
+ etc.
+
+The fabfile will automatically assign the docker-machines samples to run.
+
+WARNING: When running `fab process`, it will automatically stop currently running docker-machines in order to work on the newly assigned samples.
+Either make sure your docker-machines have finished processing their samples or restrict which machines are available by using the hosts parameter. [Fabfile hosts](http://docs.fabfile.org/en/1.14/usage/execution.html#globally-via-the-command-line).
+
 While running 'fab top' will show you what dockers are running on each machine. After an initial
 delay copying the fastqs over you should see the alpine running (calculating md5) and then rnaseq.

@@ -243,3 +275,7 @@ quite a bit of extra provenance by writing methods.json files as well as organiz
 per the Treehouse storage layout. That said if you have some custom additional pipelines you want to
 run its fairly easy to just add another target to the Makefile and then copy/paste inside of the
 fabfile.py process method.
+
+If using multiple versions of the fabfile, you can select which version to use via the -f flag:
+
+ fab -f process:manifest=manifest.txt,base=treeshop 2>&1 | tee log.txt

From 7d711e04776aaaf18df642d94d8841aec1ce10a7 Mon Sep 17 00:00:00 2001
From: Geoff Lyle
Date: Tue, 5 Jun 2018 10:35:29 -0700
Subject: [PATCH 10/10] Move options to advanced section

---
 treeshop.md | 93 +++++++----------------------------------------------
 1 file changed, 12 insertions(+), 81 deletions(-)

diff --git a/treeshop.md b/treeshop.md
index 442f059..7221dde 100644
--- a/treeshop.md
+++ b/treeshop.md
@@ -22,58 +22,6 @@ sense of if things are going smoothly.

 ## Getting Started

-### Prerequisites
-
-#### PATH
-
-If you are a new user, you may need to set up your PATH.
-You can do this by editing your .bashrc file (example uses VIM, feel free to use your favorite text editor).
-From your home directory type:
-
- vim .bashrc
-
-In the text editor copy and paste:
-
- #!/user/bin/env bash
-
- #echo mypath=$PATH
- export PATH=$HOME/bin:$PATH
- export PATH=$PATH:/pod/pstore/groups/treehouse/sratoolkit/sratoolkit.2.8.2-1-centos_linux64/bin/:/scratch/
- export PATH=$HOME/.local/bin:$PATH
-
-You will also need to save your openstack cluster credentials. Copy and paste:
-
- # treeshop
- export OS_USERNAME=
- export OS_PASSWORD=
-
-Make sure to change the in the second export PATH statement as well as entering your user name and password in the treeshop section.
-(You can press `i` in VIM to enter insert mode for writing in your user name and password).
-
-Press `ESC`, then type `:wq` to save and quit.
-
-#### SSH Key
-
-You will need a SSH key in order to use your machine.
-If you know for certain that you have a SSH key, skip ahead to the Set Up section.
-
-To check if you already have a SSH key, type:
-
- cd ~/.ssh
-
-If the output is "No such file or directory", then you do not have any SSH keys.
-If you successfully navigated to the directory, check that your key exists already:
-
- ls id_*
-
-If your keys show up you can move on to the set up.
-If not, you can create your SSH key:
-
- ssh-keygen -t rsa
-
-Press enter to save the key to the default directory.
-Press enter again to skip giving your SSH key a passphrase.
-
 #### Installing docker-machine

 From the home directory (type `cd` to get to home directory) type:
@@ -148,9 +96,9 @@ Output:

 [10.50.102.245] out: Done.

-Process the samples in manifest.txt with source and destination under the treeshop folder sending log output to the console and log.txt:
+Process the samples in manifest.tsv with source and destination under the treeshop folder sending log output to the console and log.txt:

- fab process:manifest=manifest.txt,base=treeshop 2>&1 | tee log.txt
+ fab process:manifest=manifest.tsv,base=treeshop 2>&1 | tee log.txt

 Output:

@@ -212,32 +160,12 @@ After this you should have the following under downstream:
 ### Shut Down

 After confirming that you successfully processed your data, you may want to shut down your docker machine.
-This will free up resources and space for other Treehouse and Genomics Institute users.
-
-#### All machines
+This will free up resources and space for other users.

 To shut down all docker-machines type:

 fab down

-#### Select machines
-
-To select which machines you want to shut down you will need the name of the docker machine you want to shut down (type `docker-machine ls` for a list of machines).
-Then type:
-
- docker-machine rm [machine name]
-
-Press `y` to confirm deletion.
-
-#### Free Floating IPs
-
-You may need to log into openstack in order to release the Floating IPs you used.
-While connected to the VPN, log into Openstack via http://podcloud.pod/.
-On the left side of the screen, under the 'Compute' menu, click on 'Access & Security'.
-In the middle of the screen, under **Access & Security**, click on the 'Floating IPs' tab.
-Under the 'Mapped Fixed IP Address' column (if you click the column name, the rows will be sorted alphabetically), if you see a row containing 'None XX.XXX.X.XX' (X stands for a number 0-9), click the box on the left hand column of the row.
-Then click the red 'Release Floating IPs' box in the top right area of the table.
-
 ## Notes

 Error output with respect to finding and copying files will be written to error.log. All of the output for all machines running in parallel will end up in log.txt. As a result if there are internal errors to the pipelines you'll need to sort through log.txt.
@@ -249,7 +177,7 @@ For example, to spin up 5 machines type:

 fab up:5

-When processing multiple samples you will need to format your manifest.txt appropriately.
+When processing multiple samples you will need to format your manifest.tsv appropriately.
 Each sample name will need to be placed on a separate line.
 For example:

@@ -260,10 +188,11 @@ For example:

 The fabfile will automatically assign the docker-machines samples to run.

-WARNING: When running `fab process`, it will automatically stop currently running docker-machines in order to work on the newly assigned samples.
-Either make sure your docker-machines have finished processing their samples or restrict which machines are available by using the hosts parameter. [Fabfile hosts](http://docs.fabfile.org/en/1.14/usage/execution.html#globally-via-the-command-line).
+WARNING: Running `fab process` will automatically stop all currently running docker-machines in order to work on the newly assigned samples.
+Make sure your docker-machines have finished processing their samples.
+Users comfortable with changing commands may wish to learn how to restrict which machines are used to process samples by using the hosts parameter. [Fabfile hosts](http://docs.fabfile.org/en/1.14/usage/execution.html#globally-via-the-command-line).

-While running 'fab top' will show you what dockers are running on each machine. After an initial
+While running `fab top` will show you what dockers are running on each machine. After an initial
 delay copying the fastqs over you should see the alpine running (calculating md5) and then rnaseq.

 The first sample on a fresh machine will cause all the docker's to be pulled, later samples will be
@@ -276,6 +205,8 @@ per the Treehouse storage layout. That said if you have some custom additional p
 run its fairly easy to just add another target to the Makefile and then copy/paste inside of the
 fabfile.py process method.

-If using multiple versions of the fabfile, you can select which version to use via the -f flag:
+#### Advanced options
+
+Users seeking more information on using multiple fabfiles or using different options should visit the Fabric website. [Fabric options](http://docs.fabfile.org/en/1.14/usage/fab.html).

- fab -f process:manifest=manifest.txt,base=treeshop 2>&1 | tee log.txt
+For more information on selectively shutting down docker-machines review the docker-machine documentation. [docker-machine](https://docs.docker.com/machine/reference/rm/).