TODO


Nov 11, 2024
# Better memory usage for parallel runs with SharedObject
	https://www.bioconductor.org/packages/release/bioc/html/SharedObject.html


add pbWeights to all dreamlet runs
# Nov 27,
	getNormExpr(): https://github.com/GabrielHoffman/dreamlet/issues/12


Nov 16
	combine assays as columns
	cbind(assay(pb, 1), assay(pb, 2),...)
	and rbind colData()
	change sample names by concating assay name

Nov 10:
	https://github.com/GabrielHoffman/dreamlet/issues/14#issuecomment-1805930963
	
Nov 7
	dreamletCompareClusters() for aggregateNonCountSignal()

Sept 12,
	Kiran: get_prediction() for dreamlet


August 22, 
	in processAssays(), voomWithDreamWeights(), fitVarPartModel(), also capture errors

See code: 

library(dreamlet)
library(muscat)
library(ExperimentHub)
library(zenith)
library(scater)
library(zellkonverter)
library(dplyr)
library(edgeR)
setwd("/sc/arion/projects/CommonMind/mpjanic/CHPLEX/")
library(crumblr)
library(ggtree)
library(aplot)

matrx = readH5AD("CHPLEX_10_10000_dreamlet_braak_prs_PC_ctrl.h5ad", raw=TRUE, use_hdf5=TRUE)

raw <- assay(altExp(matrx, "raw", withDimnames=TRUE, withColData=TRUE), "X")

sce <- SingleCellExperiment(list(counts=raw),
                                colData=colData(matrx),
                                rowData=rowData(matrx),
                                reducedDims=reducedDims(matrx))

pb <- aggregateToPseudoBulk(sce, assay ="counts", cluster_id = "clusters", sample_id="SubID_vS", verbose=FALSE)

# saveRDS(pb, file="pb.RDS")
pb = readRDS("pb.RDS")

CT = c('Epithelial','Mesenchymal','Macrophages','Fibroblasts','Endothelial')

# Including (1|set) here causes the model to fail
form = ~ (1|set) + scale(age) + AD_Bellenguez + (1|gender) + scale(PC1) + scale(PC2) + 1

res.proc = processAssays(pb[1:1000,], form, assays=CT[1])

res.dl = dreamlet( res.proc, form)


res.proc = processAssays(pb[1:1000,], ~1, assays=CT[1])

res = fitVarPart( res.proc, form)


# form = ~ 1
form = ~ scale(age) + AD_Bellenguez + (1|gender) + scale(PC1) + scale(PC2) + 1

res.proc = processAssays(pb, form, assays=CT[1])


details(res.proc)


tab = topTable(res.dl, coef="AD_Bellenguez", number=Inf)


res = tab %>%
    as_tibble %>%
    group_by(assay) %>%
    mutate(FDR.within = p.adjust(P.Value, "fdr"))

# counts results
table(res$adj.P.Val < 0.05)
table(res$FDR.within < 0.05)


July 14, 2023
	Allow metadata covariates for dreamletCompareClusters()
	Based on comment: https://github.com/GabrielHoffman/dreamlet/issues/11


June 14
Colin: 
I'm trying to understand how well the regulons/non-count variables are specific for microglia subtypes and their differential expression across subclasses/subtypes

Yes but for downstream functions like celltypespecificity or dreamletcompareclusters it requires an SCE object


April 25,
	F-tests with contestMD comapred to topTable()
	
	lme4 version dependency
	
	voomWithDreamWeights() with weights
	 	scale weights?
	 	fix weighting based on Ryan's comments?


# April 19
	variancePartition::dream doesnt scale weights as it should

	voomWithDreamWeights() with weights
	 	scale weights?
	 	fix weighting based on Ryan's comments?

	Fix Important
	https://github.com/Bioconductor/Contributions/issues/2955
	BiocStyle vignettes

	Confirm this works on real data
	Implement error checking using new vp 2.0.0
	Check parallel processing and memory usage

	refit is not used on dream model
		https://github.com/lme4/lme4/issues/678#issuecomment-1513839764

	F-tests with contestMD


	# Handle errors in dreamlet/dream

	# makeContrastsDream
	# Handle errors when terms are missing
	#  handle zeros different from non-zeros?
	Run makeContrastsDream early in dreamlet() to check terms in 
		contrasts match formula ever before samples are filtered


new lines in zenith_gsa

March 6, 
	Extend tree clutsering test to gene expression
	misc_vp/diffVar.R
	

Feb 28
	metadata(pb) duplicates continuous columns that are constant by sample

Feb 16:
	in pseudobulk, average per-cell statistics
	into per-cell type summary


Feb 2, 
	topTable(fit) does study-wide FDR correction
	topTable(fit[[a]]) does cluster-wide FDR correction


plotVolano scale="free" in facet wrap


for aggregateNonCountSignal() 
	Add filtering parameters as in processAssays():


	This filtering will address issue with weighting edge cases  


Jan 3, 2022
	[DONE] - in aggregateNonCountSignal() 
		include precision weights based on averaging scores.
		Need aggregateToPseudoBulk() to compute variance
	[DONE] - plotVoom() should give better error message here


# Nov 16
	- in dreamletCompareClusters() change coef from "compare"
		to something informative
	- allow user to combine multiple dreamletCompareClusters() fits
		for joint plotting and zenith

		Donghoon: uick question about working with fit object(s). When I run dreamletCompareClusters multiple times for each cluster, I end up multiple fit object with the same coef name compare . Is there a way to merge multiple fit object and run them on single zenith pipeline? I just want to summarize these by cluster names

	- Test of differential variance:
		use limma's EB on residual variance
		compare residual variance posterior from two data subsets
			misc_vp/compare_posteriors.R
			How is posterior parameterized in limma


# Nov 11,
	Add flag to switch between study-wide vs cell type multiple testing burden


# Nov 2, 2022
	Push variancePartition and zenith to Bioconductor
	Push remaCor to CRAN

	Retain mean of QC metrics in aggregateToPseudoBulk
		Use pmetadata in processAssays()


Better describe warnings:
	
	Warning in .check_arg_assay(x, assay) :
  Assay 'counts' stores continuous values instead of counts.
	Make sure this is the intended assay.
	This dataset contains assays: counts


June 13
	GSEABase::details() overtakes dreamlet::details() if GSEABase is loaded last


June 10, 2022
	select assays from plotVarPart


April 27: DONE
	aggregateToPseudoBulk fails with 1 subject id:

	see h5ad from 
	/sc/arion/projects/psychAD/NPS-AD/freeze2_rc/h5ad-by-donor/
	pb = aggregateToPseudoBulk(sce,
	  assay = "X",     
	  cluster_id = "constant",  
	  sample_id = "id",


March 10,
	Clean up dependencies, since RcppEigen is not used anymore

Feb 10
	Restore crumblr use in dreamlet.Rmd

Jan 06
	When dreamletCompareClusters() fails, return empty table.
		instead of stop()

	Aggregation on Seurat's sparseMatrix is now faster, but still slow.
		Can I adapt sparseMatrixStats::rowSums2

		https://github.com/const-ae/sparseMatrixStats/blob/ed73ffe97b39a7ed47963a3a771313e18e84f554/R/methods_row.R#L7

		https://github.com/zdebruine/RcppSparse	

Nov 21	
	DONE - Make sure that gene set enrichments make sense for mashr
	- vignette about importing Seurat data and casting to SCE with file-backing
	DONE - mashr example
	DONE - Dreamlet for one cell type vs list
	DONE - topTreat


Nov 9
	
df_vp = fitExtractVarPartModel(cobj, form, as.data.frame(colData(pbObj[,i])))


	processAssays: check that values are integer counts
	
	Add to vignette
		residuals
		cellTypeSpecificity
	Differential expression between cell types
		show empirically, that standard method gives false positives
			Need to account for Donor
	flexible filtering in processAssays()
	DONE: improved plotting for cellTypeSpecificity
	mashr
	

Nov 4
	describe makeContrastsDream:
	https://gabrielhoffman.github.io/variancePartition/reference/makeContrastsDream.html

Oct 15
	voomWithQualityWeights using weights

	I think to be fully correct you would need a version of arrayWeights that does mixed models
	I think you can pretty much copy the code for limma:::.arrayWeightsGeneByGene except that you’d replace the call to lm.wfit with the equivalent mixed model fit, and then you’d probably replace the loop with equivalent BiocParallel code to make it fast.


August 25:

	write function to extra cell type counts

	compositionCounts(sce)

	cellTypeCompositionTest() 
		extend to test multiple factors at the same time
		- use cell count weights for logit frac


	variancePartition and dream uses too many threads per fork
		RhpcBLASctl?

	When there is an NA in a covariate in dreamlet,
		dream warns each time.
	Instead have dreamlet warn once

	Show progress for processAssays like for dreamlet

	Add test of effect size heterogeneity across cell types

	cell type composition:
		feed into dream/voom/eBayes to borrow information 
		glmer can't

	examples about adding metadata to processAssays()	


	Describe treatment of NA's
	Describe behavior of voom plots

	DONE
	Donghoon's error:
		A categorical variable can only be included if there is variation within that 	category.  So with the NPS-AD I included Donor because there is variation within.  When there are no replicates, you can’t include Donor since there is just one sample per donor, and no variation within.  In your case, the error is due to dropped samples.  processAssays() only keeps samples with at least min.cells in a given cluster.  In the case your replicates are dropped, Donor becomes problematic and you get this error.
	I already check of const removeConstantTerms.  But check no variation


	docs:
		There are two separate questions that variancePartition/dreamlet address: 1) estimate fraction of variance explained, 2) hypothesis test on fixed effects while accounting for random effects.  So the dream() and dreamlet() functions only perform hypothesis tests on fixed effect variables.  But fitVarPart() addresses question (1) and if any categorical variables are modeled as random, they all must be random.  Yes, a little counter-intuitive, but fixed vs random depends on the question and analysis

		
	DONE: Feed contrasts to dreamlet()
	# dreamlet() should work with SingleCellExperiment also

	DONE: type definition for zenith_gsa for GeneSetCollection

	documentation for `prepSCE`: it maps each cell to 1) a cell cluster, 2) a biological sample and 3) a donor.


	** cell counts are used as weighs in voom.
	if not counts, they are used as weights directly
	is this equivalent is the formula is just and intercept?


	Make I(x^2) formula work

	include checkFormula in variancePartition


	DONE: remove isCounts

	DONE: store extract pmetadata with metadata
	
	DONE: pmetadata is given to processAssays(). Need to store it in result
	and use it in fitVarPart() and dreamlet()


August 19,
	do R CMD check to resolve issues with Generics
	plotVarPart() should take arguments ncolumns
	zenith_gsa should work on list of cell types of 1 cell type 


August 18: DONE
	- Allow class dreamletProcessedData to be subsetted to extract subset of assays
	- plotVoom should work on one or many assays using class names
	- Extend the following plots to single or multiple
		plotVoom()
		plotVarPart()
		plotVolcano()


May 14,
	Include number of expressed genes, or other QC as covariates


Authors@R: c(person("Gabriel", "Hoffman", role = c("aut", "cre"), email = "gabriel.hoffman@mssm.edu"))

April 2,
	handle contrasts L if missing
	should dreamletProcessedData be a SingleCellExperiment?
	graceful printing of each object
	show voom plots
	how should residual covariance be weighted?


March 30, 2021
	dreamlet: dream for single cell analysis
		weight by both reads and cell number
		like muscat, except per-sample weights are used by voomWithDreamWeights
			instead of lmFit: https://github.com/HelenaLC/muscat/blob/c93966305b1317b0caf5c6eb374826d6432b4a0a/R/utils-pbDS.R#L104
		Should I weight voom or the lmFit step?

# muscat is good, but random effects are especially important for multiplexed
# 10X datasets because batches are only 6 samples.
# Weighting by samples: the variance of binomial fraction is proportional to 1/n

# depends on limma, edgeR, variancePartition
dreamlet = function(snObj, form){

	# for each cell type
	fitList = lapply( assays(snObj), function(cellType){
		
		# get data for this cell type
		geneExpr = assay(snObj, cellType)

		# get samples with enough cells
		# filter genes
		geneExpr = filterByExpr(geneExpr)

		# per sample weights
		w = weights by cell types

		# need to adapt voomByDreamWeights to allow per sample weights
		vobj = voomByDreamWeights( geneExpr, form, info, weights=w)

		# need to adapt dream to allow per sample weights
		fit = dream(vobj, form, info, weights=w)

		# finish eBayes
		# only use trend if precion weights are not available
		fit = eBayes(fit, robust, trend)

		# return full results like muscat does
		fit
	})

	# Add ashr on the results of topTable

	fitList
}