-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathFine_Mapping.Enrichment.Rmd
268 lines (210 loc) · 11.8 KB
/
Fine_Mapping.Enrichment.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
---
title: "<center><h1>Fine Mapping:</h1>Enrichment</h1></center>"
author:
"<div class='container'>
<h3>Brian M. Schilder, Bioinformatician II<br>
Raj Lab<br>
Department of Neuroscience<br>
Icahn School of Medicine at Mount Sinai<br>
NYC, New York<br>
</h3>
<a href='https://github.com/RajLabMSSM/Fine_Mapping' target='_blank'><img src='./echolocatoR/images/echo_logo_sm.png'></a>
<a href='https://github.com/RajLabMSSM' target='_blank'><img src='./web/images/github.png'></a>
<a class='item' href='https://rajlabmssm.github.io/RajLab_website/' target='_blank'>
<img src='./web/images/brain-icon.png'>
<span class='caption'>RAJ LAB</span>
<a href='https://icahn.mssm.edu/' target='_blank'><img src='./web/images/sinai.png'></a>
</div>"
date: "<br>Most Recent Update:<br> `r Sys.Date()`"
output:
rmarkdown::html_document:
theme: cerulean
highlight: zenburn
code_folding: show
toc: true
toc_float: true
smooth_scroll: true
number_sections: false
self_contained: true
css: ./web/css/style.css
editor_options:
chunk_output_type: inline
---
```{r setup, message=F, warning=F, dpi = 600, class.output = "pre"}
source("echolocatoR/R/MAIN.R")
library(dplyr)
library(plotly)
```
# Fine-mapping Summaries
- First, we gather annotations from HaploReg and/or Biomart and assess the distribtuion of differenrt SNP Groups (e.g. GWAS lead SNPs, Credible Sets, Consensus SNPs).
```{r Merge Results}
FM <- merge_finemapping_results(minimum_support = 0, dataset = "./Data/GWAS/Nalls23andMe_2019")
(FM %>% dplyr::group_by(Gene) %>% tally())$n %>% mean()
(subset(FM, Support>0) %>% dplyr::group_by(Gene) %>% tally())$n %>% mean()
((FM %>% dplyr::group_by(Gene) %>% tally())$n >0 ) %>% sum()
((FM %>% dplyr::group_by(Gene) %>% dplyr::summarise(n.consensus=sum(Consensus_SNP)))$n.consensus > 0) %>% sum() / length(unique(FM$Gene))
merged_results <- merge_finemapping_results(minimum_support=0,
include_leadSNPs=T,
xlsx_path="./Data/annotated_finemapping_results.xlsx",
from_storage=T,
haploreg_annotation=T,
biomart_annotation=T,
verbose = F,
dataset = "./Data/GWAS/Nalls23andMe_2019")
createDT(subset(merged_results, Consensus_SNP==T))
```
## Mutation Types
```{r Mutation Types}
potential_missense <- SNPs_by_mutation_type(merged_results, mutation_type="missense_variant")
createDT(potential_missense)
# candidate_counts <- counts_summary(top_SNPs, merged_results)
# createDT(candidate_counts)
# Distribution of mutation types in Credible Set SNPs
mod <- merged_results %>%
mutate(CredSet.Lead =
ifelse(Support>0,
ifelse(leadSNP==T,"CS & leadSNP", "CS & non-leadSNP"),
ifelse(leadSNP==T, "Non-CS & leadSNP","Non-CS & non-leadSNP")),NA)
ggplot(data=mod, aes(x=consequence_type_tv, fill=CredSet.Lead)) +
geom_bar(aes(y = ..count..), position="dodge") + # (..count..)/sum(..count..)
coord_flip() +
labs(title="Proportions of Mutation Types (Biomart)",
subtitle = "Support = Within Credible Set",
x = "Mutation Type",
y="SNP Count") +
theme(legend.text = element_text(size=8))
```
# psychENCODE
- Results from the psychENCODE project have been [published](https://science.sciencemag.org/content/362/6420/eaat8464) and the [data made available](http://resource.psychencode.org/?_ga=2.202944787.381602893.1563834178-1901423550.1563307711).
- We perform an inital assesment of what percentage of each SNP Group contains expression, isoform, or chromatin QTLs (e/iso/cQTL).
```{r psychENCODE, class.output = "pre"}
psychENCODE.download_summary_stats()
percent_df <- psychENCODE.assay_summary()
psychENCODE.overlap_plots(percent_df)
```
# fGWAS
- **NOTE**: There are several tools called fGWAS/FGWAS. I am using the command line tool by Joe Pickrell found [here](https://github.com/joepickrell/fgwas). Reference:
+ [Pickrell, Joseph K. “Joint Analysis of Functional Genomic Data and Genome-Wide Association Studies of 18 Human Traits.” American Journal of Human Genetics 94, no. 4 (2014): 559–73. https://doi.org/10.1016/j.ajhg.2014.03.004.](https://www.sciencedirect.com/science/article/pii/S0002929714001062)
- fGWAS is designed to take all signficant SNPs from a GWAS and test for enrichment of various functional (e.g. DNAase hit, chromatin peak), genic (e.g. synonymous, noncoding) and/or distance (e.g.) annotations.
- Here we use fGWAS to test for enrichment amongst three different groups of SNPs:
1. **lead SNPs**: The top SNP (sorted by GWAS FDR) from each locus in the original GWAS.
2. **Credible Set SNPs**: SNPs that were identified by at least one fine-mapping tool.
3. **Consensus SNPs**: SNPs that were identified by ALL fine-mapping tools tested.
- We use enrichment as a way to tell us whether our fine-mapping is working, i.e. it's identifying SNPs that are more functionally releveant and thus more likely to play a role in the phenotype (e.g. Parksinon's).
+ Specifically, we are using the 451 pre-formatted annotations found [here](https://github.com/joepickrell/fgwas)
## Run fGWAS
- Here we use two different control SNP.Groups:
1. **Selected**: In this case, our selected SNPs are the lead SNPs from the Kunkle et al. (2019).
2. **Random**: A random set of SNPs equal to the number of Consensus SNPs.
**Notes on Flags**:
- **-onlyp**: fGWAS often returns ``WARNING: failed to converge``. The author explains why [here](https://github.com/joepickrell/fgwas/issues/4):
+ *If there are only a small number of GWAS hits in an annotation, fgwas sometimes fails to converge, as it "tries" to estimate the MLE of the enrichment parameter as -Inf. A simple fix is to only use the penalized likelihood (-onlyp -print).*
- **-cc**: For any case-control study, the ``-cc`` flag is supposed to be used. This seems to slow down the analysis a lot but yields almost identicaly results.
- **-fine**: Conducts fine-mapping by assigned each locus a ``SEGNUMBER``.
- **-print**: Produces several extra output files, including SNP-level posterior probabilities.
```{r Run fGWAS, class.output = "pre"}
# Selected SNPS
AD_loci <- readxl::read_excel("./Data/GWAS/Kunkle_2019/Kunkle2019_supplementary_tables.xlsx",
sheet = "Supplementary Table 8")
Kunkle_SNPs <- unique(AD_loci$`Top Associated SNV`)
# Run fGWAS
results.DF <- fGWAS(results_path = "./Data/GWAS/Nalls23andMe_2019/_genome_wide",
dataset = "./Data/GWAS/Nalls23andMe_2019",
SNP.Groups = c("Consensus", "CredibleSet", "GWAS.lead", "Selected", "Random"),
selected_SNPs = Kunkle_SNPs,
remove_tmps = T,
force_new_fgwas = F,
force_new_annot = F)
# results.DF.random <- fGWAS(results_path = "./Data/GWAS/Nalls23andMe_2019/_genome_wide",
# dataset = "./Data/GWAS/Nalls23andMe_2019",
# SNP.Groups = c("Random"),
# remove_tmps = T,
# force_new_fgwas = T,
# force_new_annot = F,
# random.iterations = 100)
createDT(results.DF)
```
## Plot fGWAS
### Boxplots
- Each of the enrichment test results (451 annotations x 3 SNP Groups) were grouped according to their *estimate* (i.e. their enrichment score). No tests yielded and estimate of 0:
+ **Enriched**: ``estimate > 0``
+ **Depleted**: ``estimate < 0``
- When ``interact=T``, hover to explore more information about each data point, including the annotation that the enirchment test was conducted with.
```{r fGWAS - Boxplot}
bp <- fGWAS.boxplot(results.DF, interact=T, show_plot = F)
htmltools::tagList(list(bp))
```
### CI Plot
```{r fGWAS - Line Plots, fig.height=14, fig.width=10}
cip <- fGWAS.line_plot(results.DF)
```
### Heatmaps
- Estimates are averaged by tissue here.
- You can also view each annotation as a separate row by selecting ``annotation_or_tissue = "annotation``.
+ In this view, additional metadata variables can be explored on the right side columns.
```{r fGWAS - Heatmaps, fig.height=11, fig.width=5}
# By Tissue
hm <- fgwas.heatmap(results.DF, annotation_or_tissue = "tissue")
print(hm)
# By Annotation
hm <- fgwas.heatmap(results.DF, annotation_or_tissue = "annotation")
print(hm)
```
# GoShifter
__GoShifter__: Performs SNP-level enrichment tests using BED files. However, unlike simpler methods, it also incorporates Linkage Disequilibrium (LD) info into the model. In this pipeline, LD is estimated from the refrence panel (e.g. 1000 Genomes Phase 1).
+ Because GoShifter does not consider a specific background, these tests were repeated for both the Credible Sets, and then in all the SNPs in the same locus.
- GoShifter was used to test for enrichment in three different SNP Groups:
1. **Credible Set**: SNPs that were prioritized by at least one fine-mapping tool.
2. **Lead SNPs**: The top N significant SNPs in the original GWAS, where N is equal to the number of Credible Set SNPs.
3. **Random**: A randomly chosen numer of SNPs
- Plots:
+ **Histograms**: For each enrichment test, GoShifter produces a p-value. However in the documentation they state that p-value is calculated accordingly: *"P-value is the number of times the “enrichment” is greater or equal to the observed overlap divided by total number of permutations."*
+ This can be written as:``sum(enrichment >= nSnpOverlap) / n.permutations``
+ However, the given p-value (GS.pval) and the calculated p.value (pval) differ quite a lot, and I'm not sure why this is.
## Prepare Data
```{r GoShifter - Chromatin States}
# List all chromatin state options
chrom.states.df <- GoShifter.list_chromatin_states()
createDT(chrom.states.df)
```
```{r GoShifter - SNP Groups}
# Prepare Data
results_path <- "./Data/GWAS/Nalls23andMe_2019/LRRK2"
finemap_DT <- data.table::fread(file.path(results_path,"Multi-finemap/Multi-finemap_results.txt"))
# Create different subsets of SNPs to test for enrichment
snp_df.Consensus <- subset(finemap_DT, Consensus_SNP==T)
snp_df.CS <- subset(finemap_DT, Support>0)
snp_df.GWAS <- (finemap_DT %>% dplyr::arrange(P, desc(Effect)))[1:nrow(snp_df.CS),]
snp_df.Random <- finemap_DT[sample(nrow(finemap_DT), nrow(snp_df.CS)), ]
# Add dfs to list
snp.df.list <- list("Consensus"=snp_df.Consensus,
"CredibleSet"=snp_df.CS,
"GWAS"=snp_df.GWAS)#, "Random"=snp_df.Random)
```
```{r GoShifter - Run}
# Iterate GoShifter over each SNP Group
GS.groups <- lapply(names(snp.df.list), function(group){
printer("GoShifter:: testing enrichment for SNP Group:",group)
GS.RESULTS <- GoShifter(results_path = results_path,
snp_df = snp.df.list[[group]],
SNP.Group = group,
ROADMAP_search = "monocyte",#c("monocyte","brain"),
ROADMAP_type = c("PrimaryCell","PrimaryTissue"),
chromatin_states = c("TssA","Quies"),#NA,
R2_filter = 0.8, # Include LD SNPs at rsquared >= NUM [default: 0.8]
overlap_threshold = 1,
remove_tmps = F,
force_new_goshifter = F) # Pull from exisiting files
GS.RESULTS$SNP.Group <- group
return(GS.RESULTS)
}) %>% data.table::rbindlist(fill=T)
createDT(GS.groups)
```
```{r GoShifter - Histograms}
# GoShifter.histograms(GS_results_CS)
hst <- GoShifter.histograms_SNPgroups(GS.groups)
```
```{r GoShifter - Heatmap, results="asis"}
hm <- GoShifter.heatmap(GS.groups)
htmltools::tagList(list(hm))
```