Merge branch 'release/2.0.0'
sb43 committed Oct 10, 2018
2 parents 22adccb + b3d3734 commit 6f382ac
Showing 28 changed files with 4,522 additions and 153 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -12,7 +12,6 @@ htmlcov
/build
/dist
*/segmentation/__pycache__/
cgpCRISPRcleanR.egg-info*
*.log
/results.tsv
/.cache*
@@ -21,4 +20,5 @@ cgpCRISPRcleanR.egg-info*
/tmp*
.nfs*
.idea
.pytest_cache*
pyCRISPRcleanR.egg-info/
8 changes: 2 additions & 6 deletions .travis.yml
@@ -28,14 +28,10 @@ before_install: # installs R3.3
language: python

python:
- "3.6"
- "3.6.1"

install:
# - pip install --upgrade setuptools # to avoid egg_info error code 1 exit
- pip install pytest
- pip install pytest-cover
- pip install tzlocal
- pip install .
- pip install -r requirements.txt

before_script:
- curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -1,5 +1,12 @@
# CHANGES

### 2.0.0
* Added parallel version of BAGEL
* Added MAGeCK
* Html results summary and links to download data
* GPL3 LICENSE added
* Additional plotly figures

## 1.1.2
* updated appropriate types [int, str etc.,] for commandline inputs

674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

97 changes: 80 additions & 17 deletions README.md
@@ -1,4 +1,4 @@
# pyCRISPRCleanR
# pyCRISPRcleanR
| Master | Develop |
| --------------------------------------------------- | ----------------------------------------------------- |
| [![Master Badge][travis-master-badge]][travis-repo] | [![Develop Badge][travis-develop-badge]][travis-repo] |
@@ -10,7 +10,7 @@ correction of gene independent cell responses to CRISPR-cas9 targeting

- [Design](#design)
- [Tools](#tools)
- [pyCRISPRCleanR](#pycrisprcleanr)
- [pyCRISPRcleanR](#pycrisprcleanr)
- [inputFormat](#inputformat)
- [outputFormat](#outputformat)
- [INSTALL](#install)
@@ -21,6 +21,7 @@ correction of gene independent cell responses to CRISPR-cas9 targeting
- [Setup VirtualEnv](#setup-virtualenv)
- [Cutting a release](#cutting-a-release)
- [Install via `.whl` (wheel)](#install-via-whl-wheel)
- [Reference](#reference)

<!-- /TOC -->

@@ -29,9 +30,9 @@ Uses DNAcopy R pcakage to perform CBS[ Circular Binary Segmentation of count da

## Tools

`pyCRISPRCleanR` has multiple commands, listed with `pyCRISPRCleanR --help`.
`pyCRISPRcleanR` has multiple commands, listed with `pyCRISPRcleanR --help`.

### pyCRISPRCleanR
### pyCRISPRcleanR

Takes the input count data, library file and other associated files/parameters.
The output is tab separated files for normalised fold changes and
@@ -48,33 +49,37 @@ Various exceptions can occur for malformed input files.

### outputFormat

following tab separated output files were produced
A ```results.html``` file is generated in the user-supplied output folder.
It contains a short description of, and a link to, every result file/folder generated during the analysis workflow.

1. normalised_counts.tsv
#### Tab separated output files
[Please note: the numeric prefix on each file name reflects the order in which the script generates the files and helps group similar files]:

1. 01_normalised_counts.tsv
* sgRNA: guideRNA
* gene: gene name as defined in the library file
* <control sample count:normalised 1..N> : Normalised count
* <treatment sample count: normalised 1..N> : Normalised count

2. normalised_fold_changes.tsv
2. 02_normalised_fold_changes.tsv
* sgRNA: guideRNA
* gene: gene name as defined in the library file
* <treatment sample fold changes: fold changes 1..N>
* avgFC: average fold change values

3. crispr_cleanr_corrected_counts.tsv [ generated only when ```--crispr_cleanr``` flag is set ]
3. 03_crispr_cleanr_corrected_counts.tsv [ generated only when ```--crispr_cleanr``` flag is set ]
* sgRNA: guideRNA
* gene: gene name as defined in the library file
* <control sample count:corrected 1..N> : corrected count
* <treatment sample count:corrected 1..N >: corrected count

4. crispr_cleanr_fold_changes.tsv [ generated only when ```--crispr_cleanr``` flag is set ]
4. 04_crispr_cleanr_fold_changes.tsv [ generated only when ```--crispr_cleanr``` flag is set ]
* sgRNA: guideRNA
* gene: gene name as defined in the library file
* <treatment sample fold changes: fold changes 1..N>
* avgFC: average fold change values

5. alldata.tsv [ generated only when ```--crispr_cleanr``` option is selected ]
5. 05_alldata.tsv [ generated only when ```--crispr_cleanr``` flag is set ]
* sgRNA: guideRNA
* <control sample count: raw 1..N> : raw count
* <treatment sample count: raw 1..N> : raw count
@@ -93,22 +98,70 @@ Various exceptions can occur for malformed input files.
* <treatment sample fold changes: fold changes 1..N> (postfixed _cf)
* avgFC_cf: average fold change values based on corrected counts

6. mageckOut [ generated only when ```--run_mageck``` flag is set; produces a folder containing MAGeCK output for normalised and/or CRISPRcleanR-corrected counts ]

7. bagelOut [ generated only when ```--run_bagel``` flag is set; produces a folder containing BAGEL output for normalised and/or CRISPRcleanR-corrected counts ]
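Per the descriptions in `results.json`, `01_normalised_counts.tsv` holds median-ratio-normalised counts and `02_normalised_fold_changes.tsv` holds fold changes relative to the plasmid (control) counts, with an `avgFC` column averaging the treatment replicates. The sketch below shows a generic DESeq-style median-ratio normalisation and a per-guide log2 fold change in pandas. It is not pyCRISPRcleanR's own code: the function names, the log2 base and the 0.5 pseudocount are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def median_ratio_normalise(counts: pd.DataFrame) -> pd.DataFrame:
    """DESeq-style median-ratio normalisation of a guides x samples count table."""
    # Only guides counted in every sample give a finite log, so use those.
    positive = counts[(counts > 0).all(axis=1)]
    log_counts = np.log(positive)
    # The per-guide geometric mean (mean of logs) acts as a pseudo-reference.
    log_reference = log_counts.mean(axis=1)
    # Size factor per sample: median ratio of its counts to the reference.
    size_factors = np.exp(log_counts.sub(log_reference, axis=0).median(axis=0))
    return counts / size_factors


def log2_fold_changes(norm: pd.DataFrame, controls, treatments,
                      pseudocount: float = 0.5) -> pd.DataFrame:
    """Per-guide log2 fold change of each treatment column vs the mean control.

    The log2 base and the pseudocount are assumptions, not pyCRISPRcleanR settings.
    """
    control_mean = norm[controls].mean(axis=1)
    fcs = np.log2(norm[treatments] + pseudocount).sub(
        np.log2(control_mean + pseudocount), axis=0)
    # Average over treatment replicates, playing the role of the avgFC column.
    fcs["avgFC"] = fcs.mean(axis=1)
    return fcs


if __name__ == "__main__":
    counts = pd.DataFrame(
        {"plasmid": [10, 20, 40], "T1": [20, 40, 80], "T2": [5, 60, 100]},
        index=["sgA", "sgB", "sgC"],
    )
    norm = median_ratio_normalise(counts)
    print(log2_fold_changes(norm, ["plasmid"], ["T1", "T2"]))
```

Median-ratio size factors are robust to a minority of strongly depleted or enriched guides, which is why this style of normalisation is common for screen count data.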

### [Plotly] and pdf plots

1. plots based on raw sgRNA counts
* 01_raw_counts_boxplot.html
* 01_raw_counts_histogram.html
* 01_raw_counts_correlation_matrix.html

2. plots based on normalised sgRNA counts
* 02_normalised_counts_boxplot.html
* 02_normalised_counts_histogram.html
* 02_normalised_counts_correlation_matrix.html

3. plots based on fold changes
* 03_fold_changes_boxplot.html
* 03_fold_changes_histogram.html
* 03_fold_changes_correlation_matrix.html

4. stats plots: precision-recall and ROC curves based on known true-positive sgRNA/gene sets
[generated only when ```--gene_signatures``` flag is set]
* 04_pr_rc_curve_sgRNA.html
* 04_roc_curve_sgRNA.html
* 05_pr_rc_curve_gene.html
* 05_roc_curve_gene.html
* 06_depletion_profile_genes.html

5. plots based on CRISPRcleanR corrected counts
* 07_CRISPRcleanR_corrected_count_boxplot.html
* 07_CRISPRcleanR_corrected_count_histogram.html
* 07_CRISPRcleanR_corrected_count_correlation_matrix.html

6. plots based on CRISPRcleanR corrected fold changes
* 08_CRISPRcleanR_corrected_fold_changes_boxplot.html
* 08_CRISPRcleanR_corrected_fold_changes_histogram.html
* 08_CRISPRcleanR_corrected_fold_changes_correlation_matrix.html

7. 09_Raw_vs_postCRISPRcleanR_segmentation_fold_changes.pdf [generated only when ```--crispr_cleanr``` flag is set]

8. Other informative plots
* 10_density_plots_pre_and_post_CRISPRcleanR.html [generated only when ```--crispr_cleanr``` flag is set]
* 11_impact_on_phenotype_barchart.html [generated only when ```--run_mageck``` flag is set]
* 11_impact_on_phenotype_piechart.html [generated only when ```--run_mageck``` flag is set]
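The `04`-`06` stats plots measure how well depletion logFCs separate a reference set of known positives (e.g. core-fitness essential genes) from known negatives, and the `results.json` description of `06_depletion_profile_genes.html` mentions classification recall at a fixed FDR (e.g. 5%). A minimal NumPy sketch of that recall-at-FDR calculation follows. It assumes one boolean label per sgRNA or gene, with anything outside both reference sets already dropped; `recall_at_fdr` is a hypothetical helper, not the pyCRISPRcleanR implementation.

```python
import numpy as np


def recall_at_fdr(log_fcs, is_positive, fdr=0.05):
    """Recall of known positives at the deepest rank with precision >= 1 - fdr.

    Items are ranked by increasing logFC, so the most depleted come first.
    Returns (recall, logFC threshold); the threshold is nan if no rank qualifies.
    """
    log_fcs = np.asarray(log_fcs, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    order = np.argsort(log_fcs, kind="stable")    # most negative logFC first
    hits = is_positive[order]
    true_pos = np.cumsum(hits)                     # positives recovered at each cutoff
    precision = true_pos / np.arange(1, hits.size + 1)
    recall = true_pos / hits.sum()                 # (precision, recall) traces the PR curve
    passing = np.flatnonzero(precision >= 1 - fdr)
    if passing.size == 0:
        return 0.0, float("nan")
    cut = passing[-1]
    return float(recall[cut]), float(log_fcs[order][cut])


if __name__ == "__main__":
    lfc = [-3.0, -2.5, -2.0, -1.0, 0.0, 0.5, 1.0, 1.5]
    essential = [True, True, False, True, False, False, False, False]
    print(recall_at_fdr(lfc, essential, fdr=0.05))
```

The same function works at sgRNA level (per-guide logFCs) or gene level (per-gene averaged logFCs), matching the `04_*` and `05_*` plot pairs.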

## INSTALL
Installing via `pip install`. Simply execute with the path to the compiled 'whl' found on the [release page][pyCRISPRCleanR-releases]:
Install via `pip install`, passing the path to the compiled `.whl` found on the [release page][pyCRISPRcleanR-releases]:

```bash
pip install pyCRISPRCleanR.X.X.X-py3-none-any.whl
pip install pyCRISPRcleanR-X.X.X-py3-none-any.whl
```

Release `.whl` files are generated as part of the release process and can be found on the [release page][pyCRISPRCleanR-releases]
Release `.whl` files are generated as part of the release process and can be found on the [release page][pyCRISPRcleanR-releases]

### Package Dependencies

`pip` will install the relevant dependancies, listed here for convenience:
`pip` will install the relevant dependencies, listed here for convenience (please refer to requirements.txt for versions):
* [NumPy]
* [Pandas]
* [rpy2]
* [plotly]
* [MAGeCK]
* [SciPy]

### R packages

@@ -185,18 +238,28 @@ Install .whl

```bash
# this creates a wheel archive which can be copied to a deployment location, e.g.
scp dist/pyCRISPRCleanR.X.X.X-py3-none-any.whl user@host:~/wheels
scp dist/pyCRISPRcleanR-X.X.X-py3-none-any.whl user@host:~/wheels
# on host
pip install --find-links=~/wheels pyCRISPRCleanR
pip install --find-links=~/wheels pyCRISPRcleanR
```

### Reference
Iorio F, Behan FM, Gonçalves E, Bhosle SG, Chen E, Shepherd R, Beaver C,
Ansari R, Pooley R, Wilkinson P, Harper S, Butler AP, Stronach EA, Saez-Rodriguez
J, Yusa K, Garnett MJ. Unsupervised correction of gene-independent cell responses
to CRISPR-Cas9 targeting. BMC Genomics. 2018 Aug 13;19(1):604. doi:
10.1186/s12864-018-4989-y.

<!--refs-->
[NumPy]: http://www.numpy.org/
[plotly]: https://plot.ly/python/
[MAGeCK]: https://sourceforge.net/projects/mageck/
[SciPy]: https://www.scipy.org
[Pandas]: http://pandas.pydata.org/
[rpy2]: https://rpy2.bitbucket.io/
[DNAcopy]: https://www.bioconductor.org/packages/release/bioc/html/DNAcopy.html
[CRISPRcleanR]: https://github.com/francescojm/CRISPRcleanR
[travis-master-badge]: https://travis-ci.org/cancerit/pyCRISPRcleanR.svg?branch=master
[travis-develop-badge]: https://travis-ci.org/cancerit/pyCRISPRcleanR.svg?branch=develop
[travis-repo]: https://travis-ci.org/cancerit/pyCRISPRcleanR
[pyCRISPRCleanR-releases]: https://github.com/cancerit/pyCRISPRcleanR/releases
[pyCRISPRcleanR-releases]: https://github.com/cancerit/pyCRISPRcleanR/releases
2 changes: 1 addition & 1 deletion Rsupport/libInstall.R
@@ -13,5 +13,5 @@ ipak <- function(pkg){
sapply(pkg, library, character.only = TRUE)
}

biocPackages <- c("DNAcopy")
biocPackages <- c("DNAcopy","pROC","PRROC","graphics")
ipak(biocPackages)
8 changes: 5 additions & 3 deletions pyCRISPRcleanR/abstractCrispr.py
@@ -10,16 +10,18 @@ class AbstractCrispr(ABC):
    def __init__(self, **kwargs):
        self.countfile = kwargs['countfile']
        self.libfile = kwargs['libfile']
        self.expname = kwargs.get('expname', 'myexperiment')
        self.minreads = kwargs.get('minreads', 30)
        self.mingenes = kwargs.get('mingenes', 3)
        self.outdir = kwargs.get('outdir')
        self.ncontrols = kwargs.get('ncontrols', 1)
        self.sample = kwargs.get('sample', 'mySample')
        self.ignored_genes = kwargs.get('ignored_genes', [])
        self.runcrispr = kwargs.get('crispr_cleanr', None)
        self.num_processors = kwargs.get('num_processors', 1)
        self.plot_data = kwargs.get('plot_data', None)
        self.run_mageck = kwargs.get('run_mageck', None)
        self.run_bagel = kwargs.get('run_bagel', None)
        self.numiter = kwargs.get('numiter', 1000)
        self.gene_sig_dir = kwargs.get('gene_signatures', None)
        self.results_cfg = kwargs.get('results_cfg', None)
        super().__init__()

    @abstractmethod
28 changes: 19 additions & 9 deletions pyCRISPRcleanR/config/logging.conf
@@ -1,34 +1,44 @@
[loggers]
keys=root,pyCRISPRcleanR
keys=root,pyCRISPRcleanR,log02

[handlers]
keys=consoleHandler,fh01
keys=fh01,hand02

[formatters]
keys=simpleFormatter
keys=simpleFormatter,form02

[logger_root]
level=DEBUG
handlers=consoleHandler
level=NOTSET
handlers=fh01,hand02

[logger_pyCRISPRcleanR]
level=DEBUG
handlers=fh01
qualname=pyCRISPRcleanR
propagate=0
propagate=1

[handler_fh01]
class=FileHandler
level=DEBUG
formatter=simpleFormatter
args=('log_pyCRISPRcleanR.log','w')

[handler_consoleHandler]
[logger_log02]
level=INFO
handlers=hand02
propagate=1
qualname=compiler.parser

[handler_hand02]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
level=INFO
formatter=form02
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s:%(module)s:%(funcName)s:LINE#:%(lineno)d
datefmt=%Y-%m-%d %H:%M:%S

[formatter_form02]
format=%(asctime)s - %(levelname)s - %(message)s
datefmt=%Y-%m-%d %H:%M:%S
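Loading this configuration with `logging.config.fileConfig` is a quick way to see how records are routed. `logger_pyCRISPRcleanR` now sets `propagate=1` and lists `fh01`, while the root logger also lists `fh01`, so a record from the `pyCRISPRcleanR` logger should reach the log file twice (once via its own handler, once via the root). The sketch below embeds the new-side lines of the hunk above and uses a hypothetical helper, `log_one_record`, to check this; it writes `log_pyCRISPRcleanR.log` into a temporary directory.

```python
import logging
import logging.config
import os
from typing import List

# New-side logging.conf from this commit (deleted lines left out).
LOGGING_CONF = """\
[loggers]
keys=root,pyCRISPRcleanR,log02

[handlers]
keys=fh01,hand02

[formatters]
keys=simpleFormatter,form02

[logger_root]
level=NOTSET
handlers=fh01,hand02

[logger_pyCRISPRcleanR]
level=DEBUG
handlers=fh01
qualname=pyCRISPRcleanR
propagate=1

[logger_log02]
level=INFO
handlers=hand02
propagate=1
qualname=compiler.parser

[handler_fh01]
class=FileHandler
level=DEBUG
formatter=simpleFormatter
args=('log_pyCRISPRcleanR.log','w')

[handler_hand02]
class=StreamHandler
level=INFO
formatter=form02
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s:%(module)s:%(funcName)s:LINE#:%(lineno)d
datefmt=%Y-%m-%d %H:%M:%S

[formatter_form02]
format=%(asctime)s - %(levelname)s - %(message)s
datefmt=%Y-%m-%d %H:%M:%S
"""


def log_one_record(workdir: str) -> List[str]:
    """Load LOGGING_CONF in workdir, log one INFO record, return the log file lines."""
    conf_path = os.path.join(workdir, "logging.conf")
    with open(conf_path, "w") as fh:
        fh.write(LOGGING_CONF)
    previous_cwd = os.getcwd()
    os.chdir(workdir)  # fh01 opens 'log_pyCRISPRcleanR.log' relative to the cwd
    try:
        logging.config.fileConfig(conf_path)
        logging.getLogger("pyCRISPRcleanR").info("sample record")
    finally:
        logging.shutdown()  # flush and close every handler, including fh01
        os.chdir(previous_cwd)
    with open(os.path.join(workdir, "log_pyCRISPRcleanR.log")) as fh:
        return fh.read().splitlines()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for line in log_one_record(tmp):
            print(line)
```

If one file entry per record is wanted, either drop `fh01` from `logger_pyCRISPRcleanR` or set `propagate=0` on it, as the old side of the hunk did.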
64 changes: 64 additions & 0 deletions pyCRISPRcleanR/config/results.json
@@ -0,0 +1,64 @@
{
"header" : ["<!DOCTYPE html> <html>",
"<head><style> #results {{border-collapse: collapse; width: 100%;}}",
"#results td, #results th {{ border: 1px solid #ddd; padding: 8px;}}",
"#results tr:nth-child(even){{background-color: #f2f2f2;}}",
"#results tr:hover {{background-color: #ddd;}}",
"#results th {{padding-top: 12px; padding-bottom: 12px; text-align: left; background-color: lightblue; color: black;}} </style></head>",
"<body> <h2>CRISPRcleanR Analysis Results [ <a href=\" ../{outdir}/{file_name}\">Download results.tar.bz2</a> ]</h2> <table style=\"width:100%\" id=\"results\">"],
"table_header": "<tr><th>{}</th><th>{}</th><th>Description</th></tr>",

"table_row_images" : "<tr><td>{count}</td> <td><a href=\" ../{outdir}/{file_name}\" target=\"_blank\" >{file_name}</a></td><td>{description}</td></tr>",
"table_row_files" : "<tr><td>{count}</td> <td><a href=\" ../{outdir}/{file_name}\">{file_name}</a></td><td>{description}</td></tr>",
"table_row_folders" : "<tr><td>{count}</td> <td><a href=\" ../{outdir}/{file_name}\">{file_name}</a></td><td>{description}</td></tr>",

"intermediate_row" : "<tr><th>{}</th></tr>",

"footer" : "</table> </body> </html>",

"images" : {
"01_raw_counts_boxplot.html" : ["Raw sgRNA counts"],
"01_raw_counts_correlation_matrix.html" : ["Raw sgRNA counts"],
"01_raw_counts_histogram.html" : ["Raw sgRNA counts"],
"02_normalised_counts_boxplot.html" : ["Normalised sgRNA counts (median-ratio normalisation of raw counts)"],
"02_normalised_counts_correlation_matrix.html": ["Normalised sgRNA counts (median-ratio normalisation of raw counts)"],
"02_normalised_counts_histogram.html" : ["Normalised sgRNA counts (median-ratio normalisation of raw counts)"],
"03_fold_changes_boxplot.html" : ["sgRNA fold changes with respect to plasmid sgRNA normalised count"],
"03_fold_changes_correlation_matrix.html" :[ "sgRNA fold changes with respect to plasmid sgRNA normalised count"],
"03_fold_changes_histogram.html" : ["sgRNA fold changes with respect to plasmid sgRNA normalised count"],
"04_pr_rc_curve_sgRNA.html" : ["Precision/Recall(PrRc) evaluation curve quantifying the performances in classifying the considered reference sets based on their logFCs"],
"04_roc_curve_sgRNA.html" : ["ROC curve quantifying the performances in classifying the user defined reference sets based on their logFCs"],
"05_pr_rc_curve_gene.html" : ["Precision/Recall(PrRc) evaluation curve quantifying the performances in classifying the considered reference sets based on their logFCs"],
"05_roc_curve_gene.html" :["ROC curve quantifying the performances in classifying the user defined reference sets based on their logFCs"],
"06_depletion_profile_genes.html" : ["Allows visual inspection of the enrichment of predefined sets of core-fitness essential genes",
"near the top of the genome-wide essentiality profiles (composed of sgRNA or gene depletion logFCs ranked in increasing order),",
"and computes their classification recall at a fixed FDR (e.g., 5%)"],

"07_CRISPRcleanR_corrected_count_boxplot.html" : ["Corrected counts are calculated using an unsupervised approach and correcting chromosomal segments of",
"equal sgRNA log fold-changes if they include sgRNAs targeting at least 3 different",
"genes, and without making any assumption on gene essentiality, nor knowing",
"a priori the copy number status of the included genes"],
"07_CRISPRcleanR_corrected_count_correlation_matrix.html" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html "],
"07_CRISPRcleanR_corrected_count_histogram.html" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"08_CRISPRcleanR_corrected_fold_changes_boxplot.html" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"08_CRISPRcleanR_corrected_fold_changes_correlation_matrix.html" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"08_CRISPRcleanR_corrected_fold_changes_histogram.html" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"09_Raw_vs_postCRISPRcleanR_segmentation_fold_changes.pdf" :["one plot per chromosome, with segments of sgRNAs' equal log fold-change before and after the correction"],
"10_density_plots_pre_and_post_CRISPRcleanR.html" : ["Shows the variation induced by the CRISPRcleanR correction on the logFCs'",
"distributions of sgRNAs targeting defined sets of genes prior/post CRISPRcleanR correction"],
"11_impact_on_phenotype_barchart.html" : ["Evaluates the effect of the CRISPRcleanR correction on the genes showing a significant loss/gain-of-fitness",
"effect (fitness genes) in the uncorrected data, a comparison of fitness gene sets (computed with MAGeCK) before/after CRISPRcleanR correction"],
"11_impact_on_phenotype_piechart.html" : ["see description for 11_impact_on_phenotype_barchart.html"]
},
"files" : {
"01_normalised_counts.tsv" : ["Normalised sgRNA counts (median-ratio normalisation of raw counts)"],
"02_normalised_fold_changes.tsv" : ["sgRNA fold changes with respect to plasmid sgRNA normalised count"],
"03_crispr_cleanr_corrected_counts.tsv" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"04_crispr_cleanr_fold_changes.tsv" : ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
"05_alldata.tsv" : ["Includes all data along with filtered raw counts"]
},
"folders" : {
"mageckOut" : ["results from MAGeCK algorithm "],
"bagelOut" : ["results from BAGEL algorithm"]
}
}
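`results.json` stores `str.format` templates: literal CSS braces are doubled (`{{`/`}}`), row templates take `count`, `outdir`, `file_name` and `description` fields, and `table_header` takes two positional fields. The code that renders these templates is not part of this excerpt, so the following is a hypothetical sketch: `render_results_html`, the `#`/`File` column titles and joining multi-line descriptions with spaces are all assumptions. It uses a trimmed copy of the templates and only emits rows for files actually present in a run.

```python
from typing import Dict, Iterable

# Trimmed copy of results.json: one CSS rule, the body line, one row template.
TEMPLATES = {
    "header": [
        "<!DOCTYPE html> <html>",
        "<head><style> #results td, #results th {{ border: 1px solid #ddd; padding: 8px;}} </style></head>",
        '<body> <h2>CRISPRcleanR Analysis Results [ <a href=" ../{outdir}/{file_name}">'
        'Download results.tar.bz2</a> ]</h2> <table style="width:100%" id="results">',
    ],
    "table_header": "<tr><th>{}</th><th>{}</th><th>Description</th></tr>",
    "table_row_files": '<tr><td>{count}</td> <td><a href=" ../{outdir}/{file_name}">'
                       '{file_name}</a></td><td>{description}</td></tr>',
    "footer": "</table> </body> </html>",
    "files": {
        "01_normalised_counts.tsv": ["Normalised sgRNA counts (median-ratio normalisation of raw counts)"],
        "03_crispr_cleanr_corrected_counts.tsv": ["see description for 07_CRISPRcleanR_corrected_count_boxplot.html"],
        "05_alldata.tsv": ["Includes all data along with filtered raw counts"],
    },
}


def render_results_html(cfg: Dict, outdir: str, archive_name: str,
                        present_files: Iterable[str]) -> str:
    """Hypothetical renderer: fill the templates for files found in outdir."""
    present = set(present_files)
    # format() turns the doubled CSS braces back into single ones.
    parts = [line.format(outdir=outdir, file_name=archive_name)
             for line in cfg["header"]]
    parts.append(cfg["table_header"].format("#", "File"))  # assumed column titles
    count = 0
    for file_name, description in cfg["files"].items():
        if file_name not in present:
            continue  # optional outputs (e.g. --crispr_cleanr only) may be absent
        count += 1
        parts.append(cfg["table_row_files"].format(
            count=count, outdir=outdir, file_name=file_name,
            description=" ".join(description)))  # assumed: fragments joined by spaces
    parts.append(cfg["footer"])
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_results_html(TEMPLATES, "results", "results.tar.bz2",
                              ["01_normalised_counts.tsv", "05_alldata.tsv"]))
```

Skipping absent files keeps the row numbering contiguous when optional steps such as `--crispr_cleanr` were not run.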