-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathTODO
476 lines (276 loc) · 12.2 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
Nov 11, 2024
# Better memory usage for parallel runs with SharedObject
https://www.bioconductor.org/packages/release/bioc/html/SharedObject.html
add pbWeights to all dreamlet runs
# Nov 27,
getNormExpr(): https://github.com/GabrielHoffman/dreamlet/issues/12
Nov 16
combine assays as columns
cbind(assay(pb, 1), assay(pb, 2),...)
and rbind colData()
change sample names by concating assay name
Nov 10:
https://github.com/GabrielHoffman/dreamlet/issues/14#issuecomment-1805930963
Nov 7
dreamletCompareClusters() for aggregateNonCountSignal()
Sept 12,
Kiran: get_prediction() for dreamlet
August 22,
in processAssays(), voomWithDreamWeights(), fitVarPartModel(), also capture errors
See code:
library(dreamlet)
library(muscat)
library(ExperimentHub)
library(zenith)
library(scater)
library(zellkonverter)
library(dplyr)
library(edgeR)
setwd("/sc/arion/projects/CommonMind/mpjanic/CHPLEX/")
library(crumblr)
library(ggtree)
library(aplot)
matrx = readH5AD("CHPLEX_10_10000_dreamlet_braak_prs_PC_ctrl.h5ad", raw=TRUE, use_hdf5=TRUE)
raw <- assay(altExp(matrx, "raw", withDimnames=TRUE, withColData=TRUE), "X")
sce <- SingleCellExperiment(list(counts=raw),
colData=colData(matrx),
rowData=rowData(matrx),
reducedDims=reducedDims(matrx))
pb <- aggregateToPseudoBulk(sce, assay ="counts", cluster_id = "clusters", sample_id="SubID_vS", verbose=FALSE)
# saveRDS(pb, file="pb.RDS")
pb = readRDS("pb.RDS")
CT = c('Epithelial','Mesenchymal','Macrophages','Fibroblasts','Endothelial')
# Including (1|set) here causes the model to fail
form = ~ (1|set) + scale(age) + AD_Bellenguez + (1|gender) + scale(PC1) + scale(PC2) + 1
res.proc = processAssays(pb[1:1000,], form, assays=CT[1])
res.dl = dreamlet( res.proc, form)
res.proc = processAssays(pb[1:1000,], ~1, assays=CT[1])
res = fitVarPart( res.proc, form)
# form = ~ 1
form = ~ scale(age) + AD_Bellenguez + (1|gender) + scale(PC1) + scale(PC2) + 1
res.proc = processAssays(pb, form, assays=CT[1])
details(res.proc)
tab = topTable(res.dl, coef="AD_Bellenguez", number=Inf)
res = tab %>%
as_tibble %>%
group_by(assay) %>%
mutate(FDR.within = p.adjust(P.Value, "fdr"))
# counts results
table(res$adj.P.Val < 0.05)
table(res$FDR.within < 0.05)
July 14, 2023
Allow metadata covariates for dreamletCompareClusters()
Based on comment: https://github.com/GabrielHoffman/dreamlet/issues/11
June 14
Colin:
I'm trying to understand how well the regulons/non-count variables are specific for microglia subtypes and their differential expression across subclasses/subtypes
Yes but for downstream functions like celltypespecificity or dreamletcompareclusters it requires an SCE object
April 25,
F-tests with contestMD comapred to topTable()
lme4 version dependency
voomWithDreamWeights() with weights
scale weights?
fix weighting based on Ryan's comments?
# April 19
variancePartition::dream doesnt scale weights as it should
voomWithDreamWeights() with weights
scale weights?
fix weighting based on Ryan's comments?
Fix Important
https://github.com/Bioconductor/Contributions/issues/2955
BiocStyle vignettes
Confirm this works on real data
Implement error checking using new vp 2.0.0
Check parallel processing and memory usage
refit is not used on dream model
https://github.com/lme4/lme4/issues/678#issuecomment-1513839764
F-tests with contestMD
# Handle errors in dreamlet/dream
# makeContrastsDream
# Handle errors when terms are missing
# handle zeros different from non-zeros?
Run makeContrastsDream early in dreamlet() to check terms in
contrasts match formula ever before samples are filtered
new lines in zenith_gsa
March 6,
Extend tree clutsering test to gene expression
misc_vp/diffVar.R
Feb 28
metadata(pb) duplicates continuous columns that are constant by sample
Feb 16:
in pseudobulk, average per-cell statistics
into per-cell type summary
Feb 2,
topTable(fit) does study-wide FDR correction
topTable(fit[[a]]) does cluster-wide FDR correction
plotVolano scale="free" in facet wrap
for aggregateNonCountSignal()
Add filtering parameters as in processAssays():
This filtering will address issue with weighting edge cases
Jan 3, 2022
[DONE] - in aggregateNonCountSignal()
include precision weights based on averaging scores.
Need aggregateToPseudoBulk() to compute variance
[DONE] - plotVoom() should give better error message here
# Nov 16
- in dreamletCompareClusters() change coef from "compare"
to something informative
- allow user to combine multiple dreamletCompareClusters() fits
for joint plotting and zenith
Donghoon: uick question about working with fit object(s). When I run dreamletCompareClusters multiple times for each cluster, I end up multiple fit object with the same coef name compare . Is there a way to merge multiple fit object and run them on single zenith pipeline? I just want to summarize these by cluster names
- Test of differential variance:
use limma's EB on residual variance
compare residual variance posterior from two data subsets
misc_vp/compare_posteriors.R
How is posterior parameterized in limma
# Nov 11,
Add flag to switch between study-wide vs cell type multiple testing burden
# Nov 2, 2022
Push variancePartition and zenith to Bioconductor
Push remaCor to CRAN
Retain mean of QC metrics in aggregateToPseudoBulk
Use pmetadata in processAssays()
Better describe warnings:
Warning in .check_arg_assay(x, assay) :
Assay 'counts' stores continuous values instead of counts.
Make sure this is the intended assay.
This dataset contains assays: counts
June 13
GSEABase::details() overtakes dreamlet::details() if GSEABase is loaded last
June 10, 2022
select assays from plotVarPart
April 27: DONE
aggregateToPseudoBulk fails with 1 subject id:
see h5ad from
/sc/arion/projects/psychAD/NPS-AD/freeze2_rc/h5ad-by-donor/
pb = aggregateToPseudoBulk(sce,
assay = "X",
cluster_id = "constant",
sample_id = "id",
March 10,
Clean up dependencies, since RcppEigen is not used anymore
Feb 10
Restore crumblr use in dreamlet.Rmd
Jan 06
When dreamletCompareClusters() fails, return empty table.
instead of stop()
Aggregation on Seurat's sparseMatrix is now faster, but still slow.
Can I adapt sparseMatrixStats::rowSums2
https://github.com/const-ae/sparseMatrixStats/blob/ed73ffe97b39a7ed47963a3a771313e18e84f554/R/methods_row.R#L7
https://github.com/zdebruine/RcppSparse
Nov 21
DONE - Make sure that gene set enrichments make sense for mashr
- vignette about importing Seurat data and casting to SCE with file-backing
DONE - mashr example
DONE - Dreamlet for one cell type vs list
DONE - topTreat
Nov 9
df_vp = fitExtractVarPartModel(cobj, form, as.data.frame(colData(pbObj[,i])))
processAssays: check that values are integer counts
Add to vignette
residuals
cellTypeSpecificity
Differential expression between cell types
show empirically, that standard method gives false positives
Need to account for Donor
flexible filtering in processAssays()
DONE: improved plotting for cellTypeSpecificity
mashr
Nov 4
describe makeContrastsDream:
https://gabrielhoffman.github.io/variancePartition/reference/makeContrastsDream.html
Oct 15
voomWithQualityWeights using weights
I think to be fully correct you would need a version of arrayWeights that does mixed models
I think you can pretty much copy the code for limma:::.arrayWeightsGeneByGene except that you’d replace the call to lm.wfit with the equivalent mixed model fit, and then you’d probably replace the loop with equivalent BiocParallel code to make it fast.
August 25:
write function to extra cell type counts
compositionCounts(sce)
cellTypeCompositionTest()
extend to test multiple factors at the same time
- use cell count weights for logit frac
variancePartition and dream uses too many threads per fork
RhpcBLASctl?
When there is an NA in a covariate in dreamlet,
dream warns each time.
Instead have dreamlet warn once
Show progress for processAssays like for dreamlet
Add test of effect size heterogeneity across cell types
cell type composition:
feed into dream/voom/eBayes to borrow information
glmer can't
examples about adding metadata to processAssays()
Describe treatment of NA's
Describe behavior of voom plots
DONE
Donghoon's error:
A categorical variable can only be included if there is variation within that category. So with the NPS-AD I included Donor because there is variation within. When there are no replicates, you can’t include Donor since there is just one sample per donor, and no variation within. In your case, the error is due to dropped samples. processAssays() only keeps samples with at least min.cells in a given cluster. In the case your replicates are dropped, Donor becomes problematic and you get this error.
I already check of const removeConstantTerms. But check no variation
docs:
There are two separate questions that variancePartition/dreamlet address: 1) estimate fraction of variance explained, 2) hypothesis test on fixed effects while accounting for random effects. So the dream() and dreamlet() functions only perform hypothesis tests on fixed effect variables. But fitVarPart() addresses question (1) and if any categorical variables are modeled as random, they all must be random. Yes, a little counter-intuitive, but fixed vs random depends on the question and analysis
DONE: Feed contrasts to dreamlet()
# dreamlet() should work with SingleCellExperiment also
DONE: type definition for zenith_gsa for GeneSetCollection
documentation for `prepSCE`: it maps each cell to 1) a cell cluster, 2) a biological sample and 3) a donor.
** cell counts are used as weighs in voom.
if not counts, they are used as weights directly
is this equivalent is the formula is just and intercept?
Make I(x^2) formula work
include checkFormula in variancePartition
DONE: remove isCounts
DONE: store extract pmetadata with metadata
DONE: pmetadata is given to processAssays(). Need to store it in result
and use it in fitVarPart() and dreamlet()
August 19,
do R CMD check to resolve issues with Generics
plotVarPart() should take arguments ncolumns
zenith_gsa should work on list of cell types of 1 cell type
August 18: DONE
- Allow class dreamletProcessedData to be subsetted to extract subset of assays
- plotVoom should work on one or many assays using class names
- Extend the following plots to single or multiple
plotVoom()
plotVarPart()
plotVolcano()
May 14,
Include number of expressed genes, or other QC as covariates
Authors@R: c(person("Gabriel", "Hoffman", role = c("aut", "cre"), email = "gabriel.hoffman@mssm.edu"))
April 2,
handle contrasts L if missing
should dreamletProcessedData be a SingleCellExperiment?
graceful printing of each object
show voom plots
how should residual covariance be weighted?
March 30, 2021
dreamlet: dream for single cell analysis
weight by both reads and cell number
like muscat, except per-sample weights are used by voomWithDreamWeights
instead of lmFit: https://github.com/HelenaLC/muscat/blob/c93966305b1317b0caf5c6eb374826d6432b4a0a/R/utils-pbDS.R#L104
Should I weight voom or the lmFit step?
# muscat is good, but random effects are especially important for multiplexed
# 10X datasets because batches are only 6 samples.
# Weighting by samples: the variance of binomial fraction is proportional to 1/n
# depends on limma, edgeR, variancePartition
dreamlet = function(snObj, form){
# for each cell type
fitList = lapply( assays(snObj), function(cellType){
# get data for this cell type
geneExpr = assay(snObj, cellType)
# get samples with enough cells
# filter genes
geneExpr = filterByExpr(geneExpr)
# per sample weights
w = weights by cell types
# need to adapt voomByDreamWeights to allow per sample weights
vobj = voomByDreamWeights( geneExpr, form, info, weights=w)
# need to adapt dream to allow per sample weights
fit = dream(vobj, form, info, weights=w)
# finish eBayes
# only use trend if precion weights are not available
fit = eBayes(fit, robust, trend)
# return full results like muscat does
fit
})
# Add ashr on the results of topTable
fitList
}