Readme #9

Closed
wants to merge 36 commits into from
550f4ec
Rough outline
aybray Aug 12, 2024
9981527
Update README.md
aybray Aug 12, 2024
34b3ce3
Update README.md
aybray Aug 12, 2024
003996f
Update README.md
aybray Aug 12, 2024
ca8d0d6
Update README.md
aybray Aug 12, 2024
8fdf189
Update README.md
aybray Aug 12, 2024
d53651b
Update README.md
aybray Aug 12, 2024
a4f6094
Update README.md
aybray Aug 12, 2024
4294354
Update README.md
aybray Aug 12, 2024
e0e9f56
Update README.md
mahisimham Aug 12, 2024
53fb4a6
Update README.md
mahisimham Aug 12, 2024
5ee1f3c
Update README.md
mahisimham Aug 12, 2024
89dd08f
Update README.md
mahisimham Aug 12, 2024
d0b8888
Update README.md
mahisimham Aug 12, 2024
97ce231
Update README.md
mahisimham Aug 12, 2024
c6d94c4
Create foo
mihirsamdarshi Aug 13, 2024
5a7027f
Update README.md
mahisimham Aug 13, 2024
f96510d
Update README.md
rainacpatel Aug 13, 2024
0a87006
Update README.md
mahisimham Aug 13, 2024
15cd976
Update README.md
mahisimham Aug 13, 2024
57894ef
Update README.md
aybray Aug 13, 2024
cf86e6c
Create PIPELINE.md
mahisimham Aug 13, 2024
f64efcd
Update README.md
mahisimham Aug 13, 2024
2c9b065
Update README.md
mahisimham Aug 13, 2024
ac96111
Update README.md
aybray Aug 13, 2024
2a569f9
Update README.md
aybray Aug 13, 2024
f5a6e1d
Update README.md
aybray Aug 13, 2024
06378b7
Update README.md
aybray Aug 13, 2024
b37415b
Update README.md
aybray Aug 13, 2024
b8aca40
Update README.md
aybray Aug 13, 2024
150017a
Update README.md
aybray Aug 13, 2024
60b1ff5
Update README.md
aybray Aug 13, 2024
b4180c2
Merge branch 'develop' into potential-readme
mihirsamdarshi Aug 13, 2024
7ea1cb6
Merge pull request #5 from SPARC-FAIR-Codeathon/potential-readme
mihirsamdarshi Aug 13, 2024
e20854b
Update README.md
aybray Aug 13, 2024
7425efd
Merge branch 'develop' into main-readme-only
mihirsamdarshi Aug 13, 2024
45 changes: 45 additions & 0 deletions PIPELINE.md
@@ -0,0 +1,45 @@
# Pipeline Information
This pipeline is the backbone of sPARcRNA_Viz and provides the coordinates required to create the scRNA-seq visualizations.

## Input
The pipeline takes the barcodes, features, and matrix files as input. The files must be either in `.csv`/`.tsv` and `.mtx` format or in an R data format.
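
For readers unfamiliar with the `.mtx` (Matrix Market) layout, here is a minimal sketch of how such a file is structured and read. The pipeline itself does this in R; the Python below is only an illustration. The function name `parse_mtx` and the sample data are hypothetical, and the assumption that rows are features and columns are barcodes follows the common 10x Genomics convention rather than anything stated in this document.

```python
def parse_mtx(text):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("%%MatrixMarket"):
        raise ValueError("missing %%MatrixMarket header")
    # Drop the banner line and any '%' comment lines.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    # The first remaining line gives the matrix size and non-zero count.
    n_rows, n_cols, n_entries = (int(x) for x in body[0].split())
    entries = {}
    for ln in body[1:]:
        row, col, value = ln.split()
        # The file is 1-indexed; store 0-indexed (row, col) keys.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


# Made-up example: 3 features (rows) x 2 barcodes (columns), 3 non-zero counts.
EXAMPLE_MTX = """%%MatrixMarket matrix coordinate integer general
% assumed layout: rows are features, columns are barcodes
3 2 3
1 1 5
3 1 2
2 2 7
"""

n_features, n_barcodes, counts = parse_mtx(EXAMPLE_MTX)
```

Only non-zero counts are stored, which is why the format suits sparse single-cell count matrices.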

## Output
A JSON file containing the coordinates of every point in the t-SNE embedding, which the frontend uses to render an interactive visualization.
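
As a rough illustration of what such a file could look like, the sketch below builds and round-trips a small JSON document. Every key name (`cells`, `barcode`, `tsne_1`, `tsne_2`, `cluster`) and every value is an assumption made for this example; the actual schema produced by the pipeline is not specified in this document.

```python
import json

# Hypothetical layout: all key names and values are illustrative assumptions.
example_output = {
    "cells": [
        {"barcode": "AAACCTG-1", "tsne_1": -12.4, "tsne_2": 3.1, "cluster": "0"},
        {"barcode": "AAACGGG-1", "tsne_1": 8.7, "tsne_2": -5.9, "cluster": "1"},
    ]
}

# Serialize as the pipeline's export step might, then parse it back
# the way a frontend would before plotting.
serialized = json.dumps(example_output, indent=2)
restored = json.loads(serialized)
```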

## Workflow
### 1. Setup
Load libraries, set options, validate and prepare the directories, find and read the raw data files, and configure the run based on the inputs.
### 2. Create Seurat object
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
- Seurat was chosen because the gene expression data analyzed through this pipeline is single-cell RNA-seq, and it provides ways to normalize, scale, and visualize this data.
### 3. Normalize and preprocess the data
Normalize the data (so that differences reflect biology rather than technical variation), find variable features, scale (to standardize the data), perform PCA (principal component analysis, to reduce dimensionality), and cluster cells with similar expression profiles.
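
The normalization and scaling arithmetic in this step can be sketched in a few lines. The pipeline runs in R through Seurat; the Python below only illustrates the math and is not the pipeline's code. It assumes Seurat's `LogNormalize` method (divide each count by the cell total, multiply by a scale factor of 10,000, apply log1p) and a per-gene z-score using the sample standard deviation. The function names `log_normalize` and `z_score` are hypothetical.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Log-normalize the raw counts of one cell (one value per gene)."""
    total = sum(cell_counts)
    if total == 0:
        raise ValueError("cell has no counts")
    # Scale to a common library size, then take log(1 + x).
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


def z_score(values):
    """Center and scale one gene's values across cells."""
    if len(values) < 2:
        raise ValueError("need at least two cells")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    sd = math.sqrt(variance)
    # A gene with no variation carries no signal; map it to zeros.
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


# Made-up counts for one cell across three genes.
normalized = log_normalize([1, 0, 3])
# Made-up normalized values for one gene across three cells.
scaled = z_score([1.0, 2.0, 3.0])
```

Scaling matters for PCA because, without it, highly expressed genes would dominate the principal components.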
### 4. t-SNE
t-SNE (t-distributed stochastic neighbor embedding) projects the cells into a low-dimensional space so that the clusters can be visualized. From these clusters and their significant genes, researchers can determine potential gene ontologies arising from their sample(s).
### 5. Differential Gene Expression Analysis
Differential gene expression analysis takes the normalized gene read counts and lets researchers quantify changes in gene expression between clusters.
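
One common way to express a "quantitative change" is a log2 fold change between the cells inside a cluster and all other cells. The sketch below shows only that effect-size calculation, under the assumption of a pseudocount of 1; it does not reproduce the statistical test that `FindAllMarkers()` also runs (a Wilcoxon rank-sum test by default). The function name `log2_fold_change` and the sample values are hypothetical.

```python
import math


def log2_fold_change(in_cluster, out_cluster, pseudocount=1.0):
    """Log2 ratio of mean expression inside vs. outside a cluster."""
    mean_in = sum(in_cluster) / len(in_cluster)
    mean_out = sum(out_cluster) / len(out_cluster)
    # The pseudocount avoids division by zero for unexpressed genes.
    return math.log2((mean_in + pseudocount) / (mean_out + pseudocount))


# Made-up expression values for one gene.
lfc_up = log2_fold_change([7.0, 7.0], [1.0, 1.0])    # higher inside the cluster
lfc_flat = log2_fold_change([3.0, 3.0], [3.0, 3.0])  # no difference
```

A positive value means the gene is more highly expressed inside the cluster; a value near zero means no meaningful difference.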
### 6. GSEA
GSEA, or gene set enrichment analysis, helps determine which gene sets (such as pathways) are over-represented in the data.
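
The core idea behind an enrichment score can be sketched as a running sum over a ranked gene list. The example below is a simplified, unweighted variant: real GSEA implementations typically weight each hit by its ranking metric and assess significance by permutation, and this document does not say which GSEA package the pipeline uses. The function name `enrichment_score` and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a gene in the set, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        # Keep the largest deviation from zero, preserving its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Made-up ranking, most up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set clusters at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set clusters at the bottom
```

A score near +1 means the set's genes sit at the top of the ranking, and a score near -1 means they sit at the bottom.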
### 7. Combine t-SNE and GSEA results
The GSEA results for every cluster are saved, along with the top pathways.
### 8. Export and Display Results
The values from the previous steps, including the top clusters and pathways, are saved in a JSON file that the frontend later visualizes.

## Overview of Functions
- `make_options()`: accepts user input from the command line, including paths to data files on the local machine
- `Read_MTX()`: reads the data from the barcodes, features, and matrix files once the file-name patterns have been built and matched against the input files given by the user
- `CreateSeuratObject()`: creates the Seurat object from the loaded data and the user's inputs for the name, cells, and features
- Cleaning and standardizing the data so that it can be used for t-SNE and GSEA:
  - `NormalizeData()`
  - `ScaleData()`
- Reducing the dimensionality, clustering, and running and saving the t-SNE:
  - `RunPCA()`
  - `FindNeighbors()`
  - `FindClusters()`
  - `RunTSNE()`
  - `DimPlot()`
  - `ggsave()`
- `FindAllMarkers()`: performs the differential expression analysis
- `GetAssayData()`: retrieves the normalized gene expression data, so that the exported values are not driven by technical biases
- t-SNE coordinates, top 10 markers, top pathways, cluster results, cluster centroids, cluster average expression data, and more are saved and exported as a JSON file
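
To make one of the exported quantities concrete, the sketch below computes cluster centroids as the mean embedding coordinates of each cluster's cells. This is an illustration in Python, not the pipeline's R code; the function name `cluster_centroids`, the key names `tsne_1`, `tsne_2`, and `cluster`, and the sample coordinates are all assumptions.

```python
from collections import defaultdict


def cluster_centroids(cells):
    """Average the embedding coordinates of the cells in each cluster."""
    # cluster label -> [sum of x, sum of y, number of cells]
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cell in cells:
        acc = sums[cell["cluster"]]
        acc[0] += cell["tsne_1"]
        acc[1] += cell["tsne_2"]
        acc[2] += 1
    return {
        cluster: {"tsne_1": sx / n, "tsne_2": sy / n}
        for cluster, (sx, sy, n) in sums.items()
    }


# Made-up cells with hypothetical key names.
cells = [
    {"tsne_1": 1.0, "tsne_2": 2.0, "cluster": "0"},
    {"tsne_1": 3.0, "tsne_2": 4.0, "cluster": "0"},
    {"tsne_1": -1.0, "tsne_2": 0.5, "cluster": "1"},
]
centroids = cluster_centroids(cells)
```

A frontend could use such centroids to place cluster labels on the plot.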