Estimate the complexity of haplotype networks
A common way of illustrating phylogeographic results is through the use of haplotype networks. While these networks help to visualize relationships between individuals, populations, and species, evolutionary studies often only quantitatively analyze genetic diversity among haplotypes and ignore other network properties.
HapNetComplexity.R estimates the complexity of haplotype networks (HBd) by combining haplotype diversity (Hd) and topological diversity (Bd).
Citation
Use this script to build haplotype networks and calculate:
- Nucleotide diversity (Pi) = a measurement of the genetic distance between sequences
- Haplotype diversity (Hd) = a measurement of the genetic diversity in a population
- Branch diversity (Bd) = a new measurement of the topological diversity in a haplotype network or the diversity of interrelationships among the observed haplotypes in a population
- Haplotype Network Branch diversity (HBd) = a new measurement of the complexity in a haplotype network
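As a point of reference for the two standard metrics in this list, the sketch below computes Hd and Pi in base R with the widely used estimators Hd = n/(n-1) * (1 - sum(p_i^2)) and Pi = mean pairwise differences per site. It is not taken from HapNetComplexity.R: the toy sequences and the function names `hap_div` and `nuc_div` are hypothetical, and the code assumes aligned sequences of equal length with no gaps or ambiguity codes. Bd and HBd are defined in the paper and are not reproduced here.

```r
# Toy alignment: five aligned sequences of equal length (hypothetical data)
seqs <- c(ind1 = "ACGTACGT",
          ind2 = "ACGTACGT",
          ind3 = "ACGTACGA",
          ind4 = "ACGAACGA",
          ind5 = "ACGTACGT")

# Haplotype diversity: Hd = n / (n - 1) * (1 - sum(p_i^2)),
# where p_i is the frequency of the i-th distinct haplotype
hap_div <- function(x) {
  n <- length(x)
  p <- as.numeric(table(x)) / n
  n / (n - 1) * (1 - sum(p^2))
}

# Nucleotide diversity: mean number of pairwise differences per site
nuc_div <- function(x) {
  m <- do.call(rbind, strsplit(unname(x), ""))  # individuals x sites matrix
  pairs <- combn(nrow(m), 2)                    # all pairs of individuals
  diffs <- apply(pairs, 2, function(ij) sum(m[ij[1], ] != m[ij[2], ]))
  mean(diffs) / ncol(m)
}

n  <- length(seqs)          # number of individuals
nH <- length(unique(seqs))  # number of haplotypes
Hd <- hap_div(seqs)
Pi <- nuc_div(seqs)
```

With real data, these values would normally come from the alignment loaded in the script rather than from hand-typed strings; the sketch only illustrates what the two numbers measure.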
Additional calculations:
- Number of individuals (n)
- Number of haplotypes (nH)
- Number of haplotype classes (nHc)
- Frequency of haplotype classes (niHc)
See the paper for definitions.
Usage
- Place this script in the same directory as:
- A FASTA file with your sequences
- A file with site info (optional)
- The site file is a .csv file with the sample names in the first column and the site names in the second
- Name these columns "sample" and "site", respectively, and save the file as "sites.csv"
- All metrics can still be calculated without any site information (see script)
- Open the script to set your working directory, load alignments, and specify or create a sites file.
- Follow further directions within the HapNetComplexity.R script.
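A sites file in the layout described above can be written from R in a few lines. This is a minimal sketch: the sample and site names are hypothetical placeholders, and it assumes the sample names should match the sequence names in your FASTA file.

```r
# Hypothetical sample and site names; replace with your own
sites <- data.frame(
  sample = c("ind1", "ind2", "ind3", "ind4"),
  site   = c("north", "north", "south", "south"),
  stringsAsFactors = FALSE
)

# Write to the working directory, which should also hold the script and FASTA file
write.csv(sites, "sites.csv", row.names = FALSE)

# Read the file back to confirm the two expected column names
check <- read.csv("sites.csv", stringsAsFactors = FALSE)
names(check)
```

Setting `row.names = FALSE` keeps the file to exactly two columns, so "sample" stays in the first column and "site" in the second.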
Example model datasets of increasing complexity:
Sites file for all datasets: S13_File_sites_for_all_Fig1.csv
All example datasets contain 21 individuals and range from 6 to 21 haplotypes. See our paper for more model and empirical examples.