Code for quantifying spots and stripes using topological data analysis and machine learning
Authors: Melissa R. McGuirl, Alexandria Volkening, Bjorn Sandstede
For questions/comments please contact Melissa R. McGuirl at melissa_mcguirl@brown.edu.
This software computes pattern statistics using topological data analysis and machine learning techniques. The input to this software is a collection of pigment-cell coordinate data.
This software is based upon the work presented in M.R. McGuirl, A. Volkening, and B. Sandstede, "Topological data analysis of zebrafish patterns," PNAS 2020, 117 (10) 5113-5124.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Ripser.py v0.3.2 (https://ripser.scikit-tda.org/index.html)
- Python
- MATLAB
- ripser
- matplotlib
- numpy
pip install Cython
pip install Ripser
cd
git clone https://github.com/sandstede-lab/Quantifying_Zebrafish_Patterns
cd Quantifying_Zebrafish_Patterns
pip install -r requirements.txt
A few MAT files from simulations of the agent-based model of A.V. and B.S. under the default parameter regime are provided as sample data in the data/sample_inputs/ folder. Complete datasets from full model simulations are freely available on figshare at www.figshare.com/projects/Zebrafish_simulation_data/72689.
cd data/sample_inputs
- Out_WT_default_1.mat
- Out_pfef_default_1.mat
- Out_shady_default_1.mat
- Out_nacre_default_1.mat
Example scripts are provided to demonstrate how to generate input data and run the program for wild-type stripes and mutant patterns.
cd src/matlab/examples
- test_WT.m (quantify wild-type stripes)
- test_pfeffer.m (quantify pfeffer spots)
- test_shady.m (quantify shady spots)
- test_nacre.m (quantify nacre spots)
The two main files are quantify_spots.m and quantify_stripes.m for quantifying spots and stripes, respectively.
The following steps are illustrated in the test examples in src/matlab/examples. After running test_WT.m, test_pfeffer.m, test_shady.m, or test_nacre.m, all of these steps will be complete. The distance matrices are saved in data/sample_dist_mats and the persistence diagrams should be saved in data/sample_barcodes/ after running Ripser in Python.
- Load cell-coordinate data into MATLAB (e.g. load data/sample_inputs/Out_WT_default_1.mat)
- Extract cell-coordinate data at time point of interest
- Save cell-coordinates as cells_mel, cells_iriL, cells_xanD, cells_xanL
- Generate distance matrices of cell-cell pairwise distances and save as text file
- Run Ripser using get_barcodes.py in src/python to get persistent homology data into PD_dir
- Compute boundaryX and boundaryY, the right and top boundaries of the input domain (assume domain starts at origin). For the examples provided, these are all saved in the input .mat files as boundaryX(time_pt) and boundaryY(time_pt).
- Specify persistence cut-off for betti number computation
- For spots, specify cell-type used for quantifying spots
- For stripes, get time series of (1) X^d cell locations (cellsXd_all), (2) the number of X^d cells (numXand_all), and (3) y-boundaries (boundaryY_all) for identifying when new interstripes form.
- To quantify stripes: quantify_stripes(cells_mel, cells_iriL, cells_xanD, cells_xanL, mel1_dir, xanC1_dir, xanS1_dir, boundaryX, boundaryY, cellsXd_all, numXand_all, boundaryY_all)
- To quantify spots: quantify_spots(cells_mel, cells_iriL, cells_xanD, cells_xanL, PD_dir, boundaryX, boundaryY, pers_cutoff, cell_type)
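The distance-matrix and persistence cut-off steps above can be illustrated with a small, self-contained Python sketch. This is not the code in this repository and it does not call Ripser: it is a pure-Python stand-in that computes dimension-0 persistence only (connected components, via union-find), which is enough to show how a cut-off turns a barcode into a count. The function names, the whitespace-separated text layout, and the toy coordinates are all assumptions made for illustration; the format expected by get_barcodes.py and the exact Betti-number rule used in quantify_spots.m may differ.

```python
import math


def pairwise_distances(points):
    """Return the full symmetric matrix of Euclidean cell-cell distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def matrix_to_text(dist):
    """Format the matrix as text, one row per line (assumed layout)."""
    return "".join(" ".join(f"{v:.6f}" for v in row) + "\n" for row in dist)


def h0_bars(dist):
    """Dimension-0 bars of the Vietoris-Rips filtration.

    Every point is born at 0. A component dies at the length of the
    edge that merges it into another component; one bar never dies.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(
        (dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)
    )
    bars = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, d))
    bars.append((0.0, math.inf))  # the component that survives
    return bars


def count_persistent(bars, pers_cutoff):
    """Count bars whose persistence (death - birth) exceeds the cut-off."""
    return sum(1 for birth, death in bars if death - birth > pers_cutoff)


# Toy data (hypothetical): two tight clusters of three "cells" each.
cells = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
dist = pairwise_distances(cells)
bars = h0_bars(dist)
print(count_persistent(bars, pers_cutoff=5.0))  # 2: one per cluster
```

With a cut-off between the within-cluster spacing (1) and the gap between clusters (9), only the long-lived bars are counted, so the result is the number of well-separated groups. A cut-off that is too small counts individual cells instead, which is why the cut-off has to be chosen relative to typical cell-cell spacing.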
For complete documentation on this code, see Documentation_for_Quantifying_Zebrafish_Patterns_Code.pdf.
If the straightness measure is negative, divide by (max(top_bd_x)-min(top_bd_x)) instead of max(x_querys) in the calculation of top_cv and bottom_cv in straightness_measure.m. Negative straightness measures indicate erroneous boundary detection, which will be improved in a future release of the software.