Peak to Genes Score #539
-
Good evening! Thanks for a great conference today. Quick question. I've run peak-to-gene linkages for a couple of my genes of interest on the vignette reference data, and I'm trying to interpret the score associated with the linkages. I read the methods in your paper, which describe statistical testing for retaining linkages. How does the score that appears on the plot relate to the testing that decides whether a linkage is "real"? Is it safe to assume that LinkPeaks only gives "real" linkages and that the score is relative? For one gene there are multiple linkages, with values ranging from 0.07 to 0.10. I'm not surprised, given that gene expression was pretty low since it is a disease gene in a healthy sample. For the second target there was one linkage, with a score of 0.087. That linkage goes to a peak pretty far from the gene and actually closer to the "next" gene on the strand. I'd appreciate some wisdom. Thanks!
Replies: 1 comment
-
The score that's shown in the link plot is the Pearson correlation coefficient between the peak accessibility and the gene expression. This gives a sense of the strength of the relationship between accessibility and expression, but does not reflect the significance of the association. The z-score (and associated p-value) for each link is determined by sampling a background set of peaks (200 by default) on a different chromosome from the gene, matched for overall accessibility, GC content, and length, and computing the Pearson correlation between each background peak's accessibility and the gene expression. This empirical distribution of expected correlation coefficients is then used to compute the z-score for the peak-gene link.
By default we only return links with a p-value < 0.05, but this can't guarantee that a link is "real": it's still possible that the association between accessibility and expression occurred by chance, or that it is an indirect association. The score is not relative; each peak is tested independently. The method of linking peaks to genes we implemented in Signac is based on the methods developed in the SHARE-seq paper, so I'd suggest reading that paper for more information.
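To make the difference between the plotted score and the significance test concrete, here is a minimal Python sketch on simulated data. It is not Signac's code: the helper names `pearson` and `link_stats` are hypothetical, the toy background peaks are random noise rather than peaks matched for accessibility, GC content, and length, and the upper-tail normal p-value is an assumption of this sketch.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def link_stats(peak, expression, background_peaks):
    """Return (score, z-score, p-value) for one peak-gene pair.

    score: Pearson correlation of the tested peak with expression.
    z:     score standardised against the correlations obtained from the
           background peaks (the empirical null distribution).
    p:     upper-tail standard-normal p-value (assumption of this sketch).
    """
    score = pearson(peak, expression)
    null = [pearson(bg, expression) for bg in background_peaks]
    z = (score - statistics.fmean(null)) / statistics.stdev(null)
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return score, z, p


rng = random.Random(0)
n_cells = 500

# Simulated per-cell gene expression.
expression = [rng.gauss(0, 1) for _ in range(n_cells)]
# Toy tested peak: weakly tracks expression, plus noise.
peak = [0.15 * e + rng.gauss(0, 1) for e in expression]
# Toy background set: 200 peaks unrelated to the gene.
background = [[rng.gauss(0, 1) for _ in range(n_cells)] for _ in range(200)]

score, z, p = link_stats(peak, expression, background)
print(f"score={score:.3f} z={z:.2f} p={p:.4f}")
```

The point of the sketch is that the same small correlation can be significant or not, depending on how wide the background distribution is. That is why the score on the plot and the z-score or p-value answer different questions.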