Vesalius analysis

This repository contains the code used to generate all the plots in the Vesalius 2.0.0 manuscript.

Disclaimer

All analyses were carried out on an HPC unit running the Slurm workload manager.

We carried out the analysis by submitting jobs in batches. The shell scripts show the batch submission process and the analysis pipeline used in each case.

Note that in some cases, we create intermediate files which are used later. For instance, during benchmarking, the output of each tool is standardized and then aggregated for scoring and plotting.

Input directories contain the data in the same form as downloaded from online repositories, with exceptions explicitly mentioned.

Analysis - Overview

ARTISTA

This directory contains the analysis related to Axolotl Brain Regeneration.

The data can be downloaded from the STOmics data collection.

ARTISTA_regen => Code used for processing ARTISTA data and mapping between samples.
ARTISTA_plot => Code used to generate mapping plots.
ARTISTA_regen_15_to_20_DPI => Analysis specific to the mapping of 15DPI to 20DPI data sets.

The bash files are the scripts used for job submission.

Benchmarking

For this manuscript, we benchmarked our methods against 6 other tools (GPSA, PASTE, SLAT, CytoSpace, Tangram, Scanorama) on synthetic and real data sets.

Below, we present how the synthetic data was generated and how real data was formatted.

Generating Synthetic Data

For all synthetic data sets, we used the oneiric package. The package contains a dedicated function to generate all synthetic data used in our analysis. The Oneiric directory in this repository shows how the data sets were created and plots the output.

NOTE: To generate the same data sets, please DO NOT change the seed that is provided.

Spatial Data formatting

For the real data, we reformatted real spatial transcriptomics data to ensure a consistent input to our benchmarking code. Specifically, we reformatted:

Each sub-directory shows the reformatting procedure for all data sets. The reformatted data is saved in the same directory as the original data and is the expected input of the benchmarking. The formatting consists of splitting data when needed, adjusting coordinates (fix to origin), adding cell type labels (includes deconvolution when required), and adding cell context (added using oneiric).
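As an illustration of the coordinate adjustment step, the sketch below translates a set of 2D coordinates so that the smallest x and y values sit at zero. This is only one reading of "fix to origin" and is an assumption; the function name and the example values are hypothetical, and the reformatting scripts in the sub-directories remain the reference for what was actually done.

```python
from typing import List, Tuple

Coord = Tuple[float, float]


def shift_to_origin(coords: List[Coord]) -> List[Coord]:
    """Translate coordinates so that the minimum x and minimum y become 0.

    Assumption: "fix to origin" means a plain translation with no
    scaling or rotation.
    """
    if not coords:
        return []
    min_x = min(x for x, _ in coords)
    min_y = min(y for _, y in coords)
    return [(x - min_x, y - min_y) for x, y in coords]


# Hypothetical cell coordinates, for illustration only.
cells = [(105.0, 42.0), (110.5, 40.0), (103.0, 47.5)]
shifted = shift_to_origin(cells)
```

A translation like this preserves the relative distances between cells, so it changes where the tissue sits in the coordinate system without changing its shape.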

The original publication links are the following:

Benchmarking

This directory contains all the code related to the benchmarking across all tools in both synthetic and real data. We also include a sub-directory containing the code related to result aggregation, scoring, and plotting.

In brief, each tool uses two main analysis scripts (except CytoSpace, which requires further reformatting) with the appropriate extensions (.r or .py):

{tool}_{bench}
{tool}_{bio_spa}

The first script is used during the mapping of synthetic data sets, while the second is used during the mapping of real biological data.

We used bash scripts to call these analysis files with the appropriate arguments. If you wish to re-run the analysis, please update the bash scripts with the appropriate paths to directories, make sure the synthetic data is available, and make sure the real data has been properly formatted (see above). As noted in the disclaimer, the bash scripts use Slurm headers. These can be removed or replaced with whichever headers suit your needs.
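If you are not running on Slurm, one way to adapt a submission script is to drop its `#SBATCH` directive lines and keep everything else. The sketch below shows this on an in-memory example; the function name, the script contents, and the `analysis.r` file with its `--input` argument are hypothetical and do not correspond to a specific file in this repository.

```python
def strip_slurm_directives(script_text: str) -> str:
    """Return the script with every '#SBATCH' directive line removed.

    The shebang, ordinary comments, and commands are left untouched.
    """
    kept = [
        line
        for line in script_text.splitlines()
        if not line.lstrip().startswith("#SBATCH")
    ]
    return "\n".join(kept) + "\n"


# Hypothetical submission script, for illustration only.
example = """#!/bin/bash
#SBATCH --job-name=bench
#SBATCH --time=01:00:00
Rscript analysis.r --input data/
"""

cleaned = strip_slurm_directives(example)
print(cleaned)
```

Because `#SBATCH` lines are ordinary comments to bash, leaving them in place is also harmless when a script is run directly. Removing them mainly helps when you want to replace them with the header format of another scheduler.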

The bash scripts allow:

Synthetic data benchmarking
Computational performance benchmarking
seqFISH benchmarking
Slide-seq (ssv2) benchmarking
Stereo-seq (stereo) benchmarking

Aggregating, Scoring, and Plotting

We aggregate, score, and plot all benchmarking results using a set of bash submission files that call the appropriate R scripts to perform each task. The general pipeline is as follows:

Unify mapping scores
Plot mapping scores
Plot mapping event

For synthetic data, there is an additional step:

Plot computational performance

For real data, there is an additional step:

Plot contribution scores (Vesalius only)
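To give a rough idea of what unifying mapping scores can look like, the sketch below merges per-tool score tables into a single list of records tagged with the tool name, then computes a mean score per tool. It is a minimal illustration in Python and not the R code used in this repository. The column names (`data_set`, `score`), the tool labels, and the values are all hypothetical.

```python
import csv
import io
from typing import Dict, List


def unify_scores(per_tool_csv: Dict[str, str]) -> List[dict]:
    """Merge per-tool CSV text into one list of rows tagged with the tool."""
    unified = []
    for tool, text in per_tool_csv.items():
        for row in csv.DictReader(io.StringIO(text)):
            unified.append(
                {
                    "tool": tool,
                    "data_set": row["data_set"],
                    "score": float(row["score"]),
                }
            )
    return unified


def mean_score_by_tool(rows: List[dict]) -> Dict[str, float]:
    """Average the score column separately for each tool."""
    grouped: Dict[str, List[float]] = {}
    for row in rows:
        grouped.setdefault(row["tool"], []).append(row["score"])
    return {tool: sum(vals) / len(vals) for tool, vals in grouped.items()}


# Made-up inputs: placeholder tool names and scores, not real results.
outputs = {
    "tool_a": "data_set,score\nset_1,0.5\nset_2,1.0\n",
    "tool_b": "data_set,score\nset_1,0.25\nset_2,0.75\n",
}

rows = unify_scores(outputs)
means = mean_score_by_tool(rows)
```

Putting every tool's output into one shared layout first means the scoring and plotting steps only need to understand a single format.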

Big Batch

We also provide a big batch script which will sequentially submit benchmarking tasks across all tools and tasks.

NOTE: this does require all other scripts to have been updated accordingly.
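The idea of a sequential batch over all tools and tasks can be sketched as follows. The snippet only builds the list of `sbatch` commands and does not submit anything. The `{tool}_{task}.sh` file naming and the task labels are assumptions made for illustration, so check the actual script names in the benchmarking directory before adapting it.

```python
from itertools import product
from typing import List

# Tools named in this README (Vesalius plus the six tools it is compared to).
TOOLS = ["Vesalius", "GPSA", "PASTE", "SLAT", "CytoSpace", "Tangram", "Scanorama"]

# Hypothetical task labels, loosely based on the list of bash scripts above.
TASKS = ["synthetic", "performance", "seqFISH", "ssv2", "stereo"]


def build_submission_commands(tools: List[str], tasks: List[str]) -> List[List[str]]:
    """Return one 'sbatch <script>' command per tool/task pair.

    The '<tool>_<task>.sh' naming scheme is an assumption.
    """
    return [["sbatch", f"{tool}_{task}.sh"] for tool, task in product(tools, tasks)]


commands = build_submission_commands(TOOLS, TASKS)
for cmd in commands[:3]:
    print(" ".join(cmd))
```

To actually submit the jobs one after another, each command could be passed to `subprocess.run(cmd, check=True)` on a machine where `sbatch` is available.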

Cancer

This directory contains the analysis related to mapping cells across tumor samples in prostate cancer (Slide-seq v2). For simplicity, the R script called by the bash script performs an end-to-end analysis of samples.

The data can be downloaded from the GitHub repository provided by the authors. The original publication is available here.

IMC

This directory contains the analysis related to the in situ Mass Cytometry data.

build_data => data selection, filtering, and pre-processing.
RAZA_balence => mapping of samples to each other and export of results.
RAZA_heat => plotting and clustering of mapping results.

MOSTA

This directory contains the analysis related to Mouse embryonic development.

The data can be downloaded from the STOmics data collection.

MOSTA => pre-processing and mapping of embryo data forward in time.
MOSTA_plot => plotting mapping results.
MOSTA_cluster => clustering of mapped cells and DEG analysis.

Tech to Tech

This directory contains the analysis related to cross-technology and cross-resolution mapping. For this analysis, we used data from the following sources:

The "stos" (seqFISH to Stereo-seq) sub-directory contains:

stos => pre-processing and mapping
stos_plot => plotting of mapping results

The "vtov" (VisiumHD to Visium) sub-directory contains:

vtov_annot => RCTD annotation of data sets and pre-processing
vtov => cross resolution mapping
vtov_plot => plotting results
