Sampling Noise based Inference of Transcription ActivitY: Filtering of Poisson noise on a single-cell RNA-seq UMI count matrix
Sanity infers the log expression level x_gc of gene g in cell c by filtering out the Poisson noise on the UMI count n_gc of gene g in cell c.
See our publication for more details.
The raw and normalized datasets mentioned in the preprint are available on . Files are named `[dataset name]_UMI_counts.txt.gz` and `[dataset name]_[tool name]_normalization.txt.gz`.
The scripts used for running the benchmarked normalization methods and for making the figures of the preprint are in the reproducibility folder.
- Count Matrix: (Ng x Nc) matrix with Ng the number of genes and Nc the number of cells. Format: tab-separated, comma-separated, or space-separated values. (`'path/to/text_file'`)

GeneID | Cell 1 | Cell 2 | Cell 3 | ... |
---|---|---|---|---|
Gene 1 | 1.0 | 3.0 | 0.0 | |
Gene 2 | 2.0 | 6.0 | 1.0 | |
... |

- (Alternatively) Matrix Market File Format: Sparse matrix of UMI counts. Automatically recognized by the `.mtx` extension of the input file. Named `matrix.mtx` by cellranger 2.1.0 and 3.1.0 (10x Genomics). (`'path/to/text_file.mtx'`)
  - (optional) Gene ID file: Named `genes.tsv` by cellranger 2.1.0 and `features.tsv` by cellranger 3.1.0 (10x Genomics). (`'path/to/text_file'`)
  - (optional) Cell ID file: Named `barcodes.tsv` by cellranger 2.1.0 and 3.1.0 (10x Genomics). (`'path/to/text_file'`)
- (optional) Destination folder (`'path/to/output/folder'`, default: `pwd`)
- (optional) Number of threads (integer, default: `4`)
- (optional) Print extended output (Boolean, `'true'`, `'false'`, `'1'` or `'0'`, default: `false`)
- (optional) Minimal and maximal considered values of the variance in log transcription quotients (double, default: vmin=`0.001`, vmax=`50`)
- (optional) Number of bins for the variance in log transcription quotients (integer, default: `160`)
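
As an illustration of the count matrix layout described above, here is a minimal Python sketch that writes and re-reads a tab-separated file with a `GeneID` header column followed by one column per cell. The helper names `write_count_matrix` and `read_count_matrix`, the identifiers `Gene_1`, `Cell_1`, etc., and the toy counts are hypothetical and not part of Sanity; the sketch only assumes the header-plus-rows layout shown in the table.

```python
import csv
import io


def write_count_matrix(handle, gene_ids, cell_ids, counts):
    """Write counts (one row per gene, one column per cell) as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", *cell_ids])
    for gene, row in zip(gene_ids, counts):
        writer.writerow([gene, *row])


def read_count_matrix(handle):
    """Read the same layout back into gene IDs, cell IDs and a list of rows."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    cell_ids = header[1:]  # first header field is the GeneID label
    gene_ids, counts = [], []
    for row in reader:
        gene_ids.append(row[0])
        counts.append([float(value) for value in row[1:]])
    return gene_ids, cell_ids, counts


# Toy data (made up for illustration): 2 genes x 3 cells.
buffer = io.StringIO()
write_count_matrix(
    buffer,
    ["Gene_1", "Gene_2"],
    ["Cell_1", "Cell_2", "Cell_3"],
    [[1.0, 3.0, 0.0], [2.0, 6.0, 1.0]],
)
text = buffer.getvalue()
genes, cells, counts = read_count_matrix(io.StringIO(text))
```

To produce a file for the `-f` option, the same function can be called with a handle from `open('path/to/text_file', 'w', newline='')` instead of the in-memory buffer.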
- `log_transcription_quotients.txt`: (Ng x Nc) table of inferred log expression levels. The gene expression levels are normalized to 1, meaning that the summed expression of all genes in a cell is approximately 1. Note that we use the natural logarithm, so to change the normalization one should multiply the exponential of the expression by the wanted normalization (e.g. mean or median number of captured genes per cell).

  GeneID | Cell 1 | Cell 2 | Cell 3 | ... |
  ---|---|---|---|---|
  Gene 1 | 0.25 | -0.29 | -0.54 | |
  Gene 2 | -0.045 | -0.065 | 0.11 | |
  ... |

- `ltq_error_bars.txt`: (Ng x Nc) table of error bars on inferred log expression levels

  GeneID | Cell 1 | Cell 2 | Cell 3 | ... |
  ---|---|---|---|---|
  Gene 1 | 0.015 | 0.029 | 0.042 | |
  Gene 2 | 0.0004 | 0.0051 | 0.0031 | |
  ... |

- `mu.txt`: (Ng x 1) vector of inferred mean log expression levels
- `d_mu.txt`: (Ng x 1) vector of inferred error bars on mean log expression levels
- `variance.txt`: (Ng x 1) vector of inferred variance per gene in log expression levels
- `delta.txt`: (Ng x Nc) matrix of inferred log expression levels centered at 0
- `d_delta.txt`: (Ng x Nc) matrix of inferred error bars on log expression levels centered at 0
- `likelihood.txt`: (Ng+1 x Nb) matrix of normalized variance likelihood per gene, with Nb the number of bins on the variance.

  Variance | 0.01 | 0.0107 | 0.0114 | ... |
  ---|---|---|---|---|
  Gene 1 | 0.018 | 0.019 | 0.020 | |
  Gene 2 | 0.0006 | 0.0051 | 0.0031 | |
  ... |

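
The change of normalization described for `log_transcription_quotients.txt` (multiply the exponential of the log expression by the wanted normalization) can be sketched as follows. This is only an illustration: the function name `rescale_ltq`, the toy fractions, and the target total of 10000 are assumptions made for the example and do not come from Sanity.

```python
import math


def rescale_ltq(log_quotients, target_total):
    """Turn natural-log transcription quotients of one cell into expression
    values whose sum is approximately target_total."""
    return [target_total * math.exp(x) for x in log_quotients]


# Toy column for a single cell (made up): three genes whose expression
# fractions sum to 1, stored as natural logarithms as in the output file.
fractions = [0.5, 0.3, 0.2]
log_quotients = [math.log(f) for f in fractions]

# Hypothetical target normalization, e.g. a typical total count per cell.
rescaled = rescale_ltq(log_quotients, 10000.0)
```

Equivalently, one can stay in log space by adding `math.log(target_total)` to every value, which avoids exponentiating when downstream analysis works on log expression.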
./Sanity <option(s)> SOURCES
Options:
-h,--help Show this help message
-v,--version Show the current version
-f,--file Specify the input transcript count text file (.mtx for Matrix Market File Format)
-mtx_genes,--mtx_gene_name_file Specify the gene name text file (only needed if .mtx input file)
-mtx_cells,--mtx_cell_name_file Specify the cell name text file (only needed if .mtx input file)
-d,--destination Specify the destination path (default: pwd)
-n,--n_threads Specify the number of threads to be used (default: 4)
-e,--extended_output Option to print extended output (default: false)
-vmin,--variance_min Minimal value of variance in log transcription quotient (default: 0.001)
-vmax,--variance_max Maximal value of variance in log transcription quotient (default: 50)
-nbin,--number_of_bins Number of bins for the variance in log transcription quotient (default: 160)
- Clone the GitHub repository

      git clone https://github.com/jmbreda/Sanity.git

- Install OpenMP library
  - On Linux

    If not already installed (check with `ldconfig -p | grep libgomp`, no output if not installed), do

        sudo apt-get update
        sudo apt-get install libgomp1

  - On macOS using MacPorts

    Install the `gcc9` package

        port install gcc9

    Change the first line of `src/Makefile` from `CC=g++` to `CC=g++-mp-9`

  - On macOS using brew

    Install the `gcc9` package

        brew install gcc9

    Change the first line of `src/Makefile` from `CC=g++` to `CC=g++-9`

- Move to the source code directory and compile.

      cd Sanity/src
      make

- The binary file is located in `Sanity/bin/Sanity`
- Alternatively, the already compiled binary for macOS is located in `Sanity/bin/Sanity_macOS`

Sanity_distance computes cell-cell distances from the Sanity output files. It requires the extended output of Sanity (`-e 1` option).
- The output folder of the Sanity run, specified with the `-d` option in Sanity (`'path/to/folder'`)
- (optional) The gene signal to noise ratio used as gene cut-off (double, default: `1.0`)
- (optional) Compute distances with or without error bars (boolean, default: `1` or `true`)
- (optional) Number of threads (integer, default: `4`)
- Cell-cell distance: (Nc(Nc-1)/2) vector of cell to cell distances dist(cell_i, cell_j), i=1,...,Nc-1, j=i+1,...,Nc, with Nc the number of cells.

      dist(cell_1, cell_2)
      dist(cell_1, cell_3)
      dist(cell_1, cell_4)
      ...
      dist(cell_Nc-2, cell_Nc-1)
      dist(cell_Nc-2, cell_Nc)
      dist(cell_Nc-1, cell_Nc)

  The file is located in the Sanity output folder (specified with the `-f` option) and is named `cell_cell_distance_[...].txt`, depending on the `-err` and `-s2n` options.
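
Because the distances are stored as a flat vector over the pairs (i, j) with i < j, it can be convenient to expand them into a symmetric (Nc x Nc) matrix. The sketch below does this in plain Python, assuming only the pair ordering stated above; the function name `condensed_to_square` and the toy values are hypothetical and not part of Sanity_distance.

```python
def condensed_to_square(distances, n_cells):
    """Expand a vector of pairwise distances, ordered as
    (1,2), (1,3), ..., (1,Nc), (2,3), ..., (Nc-1,Nc),
    into a symmetric n_cells x n_cells matrix with a zero diagonal."""
    expected = n_cells * (n_cells - 1) // 2
    if len(distances) != expected:
        raise ValueError(
            f"expected {expected} distances for {n_cells} cells, got {len(distances)}"
        )
    square = [[0.0] * n_cells for _ in range(n_cells)]
    k = 0
    for i in range(n_cells - 1):
        for j in range(i + 1, n_cells):
            square[i][j] = distances[k]
            square[j][i] = distances[k]
            k += 1
    return square


# Toy vector for 4 cells (values made up for illustration).
toy_distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = condensed_to_square(toy_distances, 4)
```

This pair ordering appears to be the same as the condensed distance format used by SciPy, so `scipy.spatial.distance.squareform` should be usable as an alternative when SciPy is available.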
./Sanity_distance <option(s)> SOURCES
Options:
-h,--help Show this help message
-v,--version Show the current version
-f,--folder Specify the input folder with extended output from Sanity
-s2n,--signal_to_noise_cutoff Minimal signal/noise of genes to include in the distance calculation (default: 1.0)
-err,--with_error_bars Compute cell-cell distance taking the error bar epsilon into account (default: true)
-n,--n_threads Specify the number of threads to be used (default: 4)
Same dependencies as Sanity (see above).
- Move to the source code directory and compile.

      cd Sanity/src
      make Sanity_distance

- The binary file is located in `Sanity/bin/Sanity_distance`

For any questions or assistance regarding Sanity, please post your question in the issues section.