---
title: "Linear models in R"
output:
  html_document:
    theme: cerulean
    css: style.css
---

## Workshop notes

* [Slideshow](slides/linear_thinking.html)
* [Workshop](topics/linear_models.html)

## Setup

This workshop is designed to work with RStudio running in [Posit Cloud](https://posit.cloud/). Go to https://posit.cloud/ and create a new project. Monash users can log in with their Monash Google account. The workshop can also be done using R locally on your laptop (if doing this, we also recommend you create a new project to contain the files).

Running the R code below will download files and install packages used in this workshop.

```{r eval=FALSE}
# Download data
download.file(
    "https://monashdatafluency.github.io/r-linear/r-linear-files.zip",
    destfile="r-linear-files.zip")
unzip("r-linear-files.zip")

# Install some CRAN packages:
install.packages(c(
    "tidyverse", "multcomp", "emmeans",
    "lme4", "lmerTest", "pbkrtest", "BiocManager"))

# Install some Bioconductor packages:
BiocManager::install(c("limma","edgeR","topconfects"))
```
Now load the file `linear_models.R` in the `r-linear-files` folder.
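
Once the setup code has run, a quick check can confirm that each package is actually available. The following is a minimal sketch rather than part of the workshop files; the variable names `workshop_pkgs`, `is_installed` and `missing_pkgs` are made up for illustration, and the package list is copied from the setup code above.

```r
# Packages named in the setup code above.
workshop_pkgs <- c(
    "tidyverse", "multcomp", "emmeans",
    "lme4", "lmerTest", "pbkrtest", "BiocManager",
    "limma", "edgeR", "topconfects")

# requireNamespace() returns FALSE, rather than stopping with an error,
# when a package cannot be loaded.
is_installed <- vapply(
    workshop_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- workshop_pkgs[!is_installed]
if (length(missing_pkgs) == 0) {
    message("All workshop packages are installed.")
} else {
    message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If anything is reported as missing, re-running the relevant `install.packages` or `BiocManager::install` call and reading its messages is usually the next step.
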

## Files

* [r-linear-files.zip](r-linear-files.zip) - Files used in this workshop.

## Key functions to remember

Built-in to R:

    lm, model.matrix, coef, sigma, df.residual,
    predict, confint, summary, anova, drop1,
    I, poly

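
As a rough illustration of how several of these built-in functions fit together, here is a minimal sketch on simulated data. The data frame `dat`, its columns, and the coefficient values used to simulate it are assumptions made up for this example, not taken from the workshop files.

```r
# Simulated data; the names and true coefficient values are made up.
set.seed(1)
n <- 30
dat <- data.frame(
    x = runif(n, 0, 10),
    group = factor(rep(c("a", "b", "c"), times = n / 3)))
dat$y <- 2 + 0.5 * dat$x + ifelse(dat$group == "b", 1, 0) + rnorm(n, sd = 0.5)

# Two nested models: fit1 adds the group term to fit0.
fit0 <- lm(y ~ x, data = dat)
fit1 <- lm(y ~ x + group, data = dat)

head(model.matrix(fit1))  # columns: (Intercept), x, groupb, groupc
coef(fit1)                # estimated coefficients
sigma(fit1)               # residual standard deviation
df.residual(fit1)         # 30 observations - 4 coefficients = 26
confint(fit1)             # 95% confidence intervals by default
summary(fit1)             # coefficient table with t tests

anova(fit0, fit1)         # F test: does adding group improve the fit?
drop1(fit1, test = "F")   # F test for dropping each term in turn

# Predictions need a data frame containing every predictor in the model.
new_points <- data.frame(
    x = c(2, 5),
    group = factor(c("a", "a"), levels = levels(dat$group)))
predict(fit1, new_points, interval = "confidence")
```
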
`splines` -- curve fitting:

    ns, bs

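
A minimal sketch of curve fitting with `ns` and `bs` on simulated data (the sine-shaped trend and the noise level are arbitrary choices for illustration). Both functions build a set of basis columns that `lm` then treats like any other predictors.

```r
library(splines)  # ships with R, so no installation is needed

# Simulated curved relationship; the shape and noise level are made up.
set.seed(1)
dat <- data.frame(x = seq(0, 10, length.out = 50))
dat$y <- sin(dat$x) + rnorm(50, sd = 0.2)

# Natural cubic spline with 4 degrees of freedom.
fit_ns <- lm(y ~ ns(x, df = 4), data = dat)

# B-spline basis (cubic by default), also with 4 degrees of freedom.
fit_bs <- lm(y ~ bs(x, df = 4), data = dat)

ncol(model.matrix(fit_ns))  # 5: the intercept plus 4 spline columns

# A straight line is a special case of the natural spline model,
# so the two fits can be compared with an F test.
fit_line <- lm(y ~ x, data = dat)
anova(fit_line, fit_ns)
```
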
`multcomp` and `emmeans` -- linear hypothesis tests and multiple comparisons:

    glht, mcp, confint, summary, emmeans

`limma` and `edgeR` -- fitting many models to gene expression data:

    DGEList, calcNormFactors, cpm,
    lmFit, contrasts.fit, eBayes, plotSA, topTable


## Links

* Postgraduate students at Monash can access [statistical consulting](https://www.monash.edu/researchinfrastructure/datascienceandai/capabilities/statistical-consulting), courtesy of the Data Science and AI Platform. This is a good service for beginner to intermediate statistical questions.
* The [Biostatistics Consulting Platform](https://www.monash.edu/medicine/sphpm/units/biostats-consulting) in the Monash Faculty of Medicine may be more suitable for advanced questions about experimental design and analysis.

<br>

* [Monash Data Fluency](https://www.monash.edu/data-fluency)
* [Monash Bioinformatics Platform](https://www.monash.edu/researchinfrastructure/bioinformatics)
* [More workshop material from Monash Bioinformatics Platform](https://www.monash.edu/researchinfrastructure/bioinformatics/training)

<br>

* [Course notes for PH525x.](http://genomicsclass.github.io/book/) Initial chapters of this edX course cover similar material to this workshop.
* [StatQuest videos on linear models.](https://www.youtube.com/watch?v=PaFPbb66DxQ&list=PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU) A friendly but thorough introduction to key ideas.
* [Faraway (2014) "Linear models with R"](https://julianfaraway.github.io/faraway/LMR/)
* [Harrell (2015) "Regression Modeling Strategies"](https://hbiostat.org/rms/) has detailed practical advice for creating predictive models, such as models using biomarkers. [Frank Harrell's home page.](https://hbiostat.org/)
* [James, Witten, Hastie and Tibshirani (2013) "An Introduction to Statistical Learning"](https://www.statlearning.com/) describes fundamental ideas and methods in machine learning.
* Richard McElreath has a course called "Statistical Rethinking" with a Bayesian approach and a focus on causal concepts, which are important if you have observational rather than experimental data. The [2023 video lectures](https://www.youtube.com/watch?v=FdnMWdICdRs&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus&index=1) are a good place to start, and there is also a book.
* [Dance of the CIs app](http://logarithmic.net/2017/dance/) for intuition about Confidence Intervals.
* [The Art of Linear Algebra](https://github.com/kenjihiranabe/The-Art-of-Linear-Algebra/blob/main/The-Art-of-Linear-Algebra.pdf) for intuition about matrices and vectors -- sections 1-3 are relevant to this workshop.
* Testing for differential gene expression often uses linear models. The developers of `limma` and `edgeR` at [WEHI](https://www.wehi.edu.au/research/research-fields/bioinformatics) have written some good introductions to this topic:
    * ["RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR"](https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html)
    * ["A guide to creating design matrices for gene expression experiments"](https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html) (design matrix = model matrix)
    * For samples of variable sequencing depth or quality, limma's `voom` or `voomWithQualityWeights` can be used to account for heteroscedasticity. If using a model other than `~ 0 + group`, read the note in the documentation for `contrasts.fit` and consider using `contrastAsCoef`.
* Mixed effects models are a popular next step beyond the fixed effects models covered in this workshop.
    * [Mixed Models in R](https://m-clark.github.io/mixed-models-with-R/)
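
To make the design matrix notes above concrete, the sketch below builds the model matrix for a hypothetical three-group experiment in two ways, using only base R. The group names and simulated values are invented for illustration. It shows that `~ group` and `~ 0 + group` describe the same model with different coefficients, so a comparison that is a single coefficient in one coding is a contrast between coefficients in the other.

```r
# Hypothetical experiment: three groups, two samples each.
group <- factor(c("ctrl", "ctrl", "treatA", "treatA", "treatB", "treatB"))

# Intercept coding: the ctrl mean, plus differences of the other groups from ctrl.
design_ref <- model.matrix(~ group)
colnames(design_ref)    # "(Intercept)" "grouptreatA" "grouptreatB"

# Group-means coding: one column per group and no intercept.
design_means <- model.matrix(~ 0 + group)
colnames(design_means)  # "groupctrl" "grouptreatA" "grouptreatB"

# Simulated response values (made up) to fit both versions with lm.
set.seed(1)
y <- rnorm(6, mean = c(5, 5, 7, 7, 6, 6))
fit_ref <- lm(y ~ group)
fit_means <- lm(y ~ 0 + group)

# treatA minus ctrl is a single coefficient under intercept coding...
coef(fit_ref)["grouptreatA"]

# ...and a contrast between two coefficients under group-means coding.
contrast <- c(groupctrl = -1, grouptreatA = 1, grouptreatB = 0)
sum(contrast * coef(fit_means))
```

The two final values agree, because both codings produce the same fitted group means.
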

## Author

This course has been developed for the [Monash Bioinformatics Platform](https://www.monash.edu/researchinfrastructure/bioinformatics) and [Monash Data Fluency](https://www.monash.edu/data-fluency) by Paul Harrison.

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="figures/CC-BY.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.

## Source code

* [GitHub repository](https://github.com/MonashDataFluency/r-linear)

<br>
<br>
<div style="font-size: 75%">
[Solutions to challenges](topics/solutions.html)
</div>
<p style="margin-top: 2em; text-align: right">
<a href="https://www.monash.edu/researchinfrastructure/bioinformatics"><img src="figures/MBP-banner.png" width="675"></a>
</p>