- Visit www.justinbelair.ca if you have any questions or need help with statistics.
-
- This paper is quite technical, but truly amazing! Anybody with a background in pure statistics should read it to widen their theoretical understanding of designing hypothesis tests to be applied in research settings. I can't recommend this paper enough!
-
- A must-read! Take-home message: the t-test is great, especially its robust forms!
-
Lakens et al., Equivalence Testing for Psychological Research: A Tutorial, 2018
- A great, gentle introduction to minimal-effects testing, equivalence testing, and inferiority testing. These are underutilized tools that should be taught to any applied researcher using hypothesis tests for experimental data! A must-read.
- Related to this paper is an R package called TOSTER, developed by Lakens and others. Check out the package vignette.
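To make the idea behind equivalence testing concrete, here is a minimal sketch of the two one-sided tests (TOST) logic discussed in the Lakens et al. tutorial. It is not the TOSTER implementation: it assumes a large-sample normal approximation instead of the t-based procedure, and the function name `tost_z`, its arguments, and the equivalence bounds in the usage lines are all hypothetical choices made for illustration.

```python
from statistics import NormalDist


def tost_z(diff, se, low, high):
    """Two one-sided z-tests for equivalence (normal approximation).

    diff: observed effect estimate (e.g. a mean difference)
    se:   standard error of that estimate
    low, high: equivalence bounds chosen by the researcher (assumed here)
    Returns the TOST p-value, i.e. the larger of the two one-sided p-values.
    """
    std_normal = NormalDist()
    # Test 1: H0 "true effect <= low" against H1 "true effect > low"
    p_lower = 1.0 - std_normal.cdf((diff - low) / se)
    # Test 2: H0 "true effect >= high" against H1 "true effect < high"
    p_upper = std_normal.cdf((diff - high) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject
    return max(p_lower, p_upper)


# Hypothetical numbers: an estimate near zero, well inside the bounds
p_inside = tost_z(diff=0.0, se=0.1, low=-0.5, high=0.5)
# Hypothetical numbers: an estimate close to the upper bound
p_near_bound = tost_z(diff=0.45, se=0.1, low=-0.5, high=0.5)
```

With these made-up inputs, the first call yields a very small p-value (equivalence is supported), while the second does not reject at the usual 5% level, because the estimate sits too close to the upper bound.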
- Sander Greenland et al., Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations, 2016
- THE resource for all p-value misinterpretations by a collection of eminent statisticians. Must be read and re-read!
- Sander Greenland, Nonsignificance Plus High Power Does Not Imply Support for the Null Over the Alternative, 2012
- The title says it all... it's an easy mistake to make!
- John M. Hoenig & Dennis M. Heisey, The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis, 2001
- Power is a widely misunderstood statistical concept. Study it!
- Steven Goodman, A Dirty Dozen: Twelve P-Value Misconceptions, 2008
- The title says it all.
- Harvey J. Motulsky, Common misconceptions about data analysis and statistics, 2014
- Goes over many misconceptions. It is quite beginner-friendly; take a look!
-
B.J. Winer, Donald R. Brown, Kenneth M. Michels, Statistical Principles in Experimental Design, Third edition, 1991 (Book)
- A very thick book that thoroughly covers a wide array of experimental designs. A great reference manual for anyone working on experimental research, especially with human subjects (e.g. psychology).
-
- A great introduction to the within-person (aka within-patient, within-subject) randomized controlled trial. This type of trial is used when it is possible to use a patient as their own control (e.g. in ophthalmology, where each eye can be randomized to a different treatment).
-
- Anyone running a within-person trial should consult these guidelines for maximizing the utility of what they report from the trial!
- Frederic M. Lord, A Paradox in the Interpretation of Group Comparisons, 1967
- The first very influential paper describing what is now known as Lord's Paradox.
- Frederic M. Lord, Statistical Adjustments When Comparing Preexisting Groups, 1969
- Lord's second paper, which goes into more detail on his 'paradox'.
- Judea Pearl, Lord's Paradox Revisited - (Oh Lord! Kumbaya!), 2016
- A causal inference perspective by one of its main contributors, Judea Pearl. (See Causal Inference section of this reading list)
- Jose D. Perezgonzalez, Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing, 2015
- An interesting overview of the two foundational classical approaches to testing in statistics: Fisher's approach and the Neyman-Pearson framework. The third approach, Null Hypothesis Significance Testing (NHST), is presented as a loose and controversial approach lacking rigour. A must-read!
-
R. A. Fisher, The Design of Experiments, 1935 (Book)
- Contains Fisher's famous lady tasting tea experiment, the first example I know of permutation testing, many groundbreaking examples of Analysis of Variance (ANOVA), and some disparaging (and very funny) remarks towards Pearson.
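As a small illustration of the lady tasting tea experiment mentioned above, here is a sketch of the exact calculation in its usual textbook form: 8 cups, 4 of them prepared milk-first, and the lady must pick out the 4 milk-first cups. Under the null hypothesis that she is guessing, every choice of 4 cups is equally likely, so the p-value is a simple counting exercise. The function name `prob_correct` and its defaults are hypothetical, chosen for this sketch only.

```python
from math import comb


def prob_correct(k, cups=8, milk_first=4):
    """Probability of identifying exactly k milk-first cups by pure guessing.

    The lady selects `milk_first` cups out of `cups`; under the null
    hypothesis all such selections are equally likely (hypergeometric).
    """
    ways_right = comb(milk_first, k)                      # correct picks
    ways_wrong = comb(cups - milk_first, milk_first - k)  # incorrect picks
    return ways_right * ways_wrong / comb(cups, milk_first)


# One-sided p-value if she gets all 4 right: 1 favourable selection out of 70
p_all = prob_correct(4)
# One-sided p-value if she gets at least 3 right: (16 + 1) out of 70
p_at_least_3 = prob_correct(3) + prob_correct(4)
```

Only a perfect classification gives a p-value below 5% (1/70, about 0.014); three correct out of four gives 17/70, about 0.24, which shows how little evidence such a small design can carry.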
-
Stef van Buuren, Flexible Imputation of Missing Data, 2018 (Book, with online version)
- A must-have for any applied statistician dealing with missing data problems. This book presents the state-of-the-art in multiple imputation (MI), a field where van Buuren made his name. Contains lots of concrete examples with code, discusses trade-offs in complex situations, and gives lots of references to literature with simulation studies to back any claims up.
-
Gert Molenberghs and Michael G. Kenward, Missing Data in Clinical Studies, 2007 (Book)
- A deep and thorough exposition of missing data in clinical studies. A complex book for advanced statisticians, especially those working in clinical studies.
-
Roderick J. A. Little & Donald B. Rubin, Statistical Analysis with Missing Data, 2002 (Book)
- The first textbook put together to reflect the growing literature on missing data methodology. Still useful, although van Buuren, 2018 is probably better suited for applied statisticians.
-
Judea Pearl, Causality : Models, Reasoning and Inference, 2000, updated in 2009 (Book)
- A true masterpiece. A technical and deep exposition of Pearl's life work on Directed Acyclic Graphs (DAGs) as Structural Causal Models (SCMs) that got me started on my causal inference journey. His viewpoint is an alternative to the Neyman-Rubin causal model based on potential outcomes. This book can also be seen as the academic version of The Book of Why, a famous general-audience book on causality.
-
Guido W. Imbens and Donald B. Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, 2015 (Book)
- A true masterpiece. The most accomplished and thorough exposition of the Neyman-Rubin causal model based on potential outcomes. It is an alternative to Pearl's DAG and SCM framework (see above). A beautiful book that I find myself going back to often, for its depth and breadth of insights into thinking about causal inference. Imbens is an economist who contributed much to this field, most notably through his Local Average Treatment Effect (LATE) identification in cases of non-compliance. Rubin is one of the greatest living statisticians.
-
Judea Pearl, Madelyn Glymour & Nicholas P. Jewell, Causal Inference in Statistics : A Primer, 2016
- A gentle introduction to Directed Acyclic Graphs (DAGs) and Structural Causal Models (SCMs), at roughly the level of an undergraduate in statistics.
-
Bill Shipley, Cause and Correlation in Biology : A User's Guide to Path Analysis, Structural Equations and Causal Inference, 2000 (Book)
- A well-written introduction to causal inference for biologists, with an emphasis on Structural Equation Models (SEMs) and Path Analysis. There is also a little bit of interesting history sprinkled in. I took a class with this professor (who just retired from a university close to my home town) in 2023 and his focus on biological applications without sacrificing rigour is great for any non-statistician looking to tackle complex statistical methods!
When building prediction models, we are less interested in inference on the parameters and more focused on the values and uncertainty of the predictions. In clinical settings, robust prediction models can mean the difference between life and death!
-
Ewout W. Steyerberg, Clinical Prediction Models, 2010 (Book, with online content)
- A pretty thick reference manual by a leader in the field. Aimed especially towards prediction models in clinical settings, it's a must-have for any advanced modeller looking to make a difference in healthcare with novel technologies.
-
- Part 1 of a step-by-step tutorial on rigorous and robust clinical prediction model-building, focused on the early stages of model-building.
-
- Part 2 of a step-by-step tutorial on rigorous and robust clinical prediction model-building, focused on conducting external validation of the model built following the steps outlined in part 1.
-
- Part 3 of a step-by-step tutorial on rigorous and robust clinical prediction model-building, focused on power and sample size calculations. It is often difficult to know the sample size required for adequate external validation data. This guide offers detailed instructions on conducting these estimations once we've built a model.
Epidemiology is a discipline distinct from biostatistics, but there is strong overlap in the methods. Epidemiology relies on many difficult design principles to obtain valid inferences. Below are a few textbooks that are must-haves for epidemiologists.
-
Kenneth J. Rothman & Sander Greenland, Modern Epidemiology, Second Edition, 1998 (Book)
- The bible of modern epidemiology. An authoritative textbook on study design principles. Its sections on analysis techniques are a bit dated. Also, it doesn't discuss many of the causal inference techniques and principles that have slowly come to dominate the field through the works of VanderWeele, Hernán, Robins and others. Still, anybody wishing to understand how to think like an epidemiologist must tackle this book. Its explanation of case-control studies and their peculiarities is particularly illuminating.
-
Leon Gordis, Epidemiology, Fifth Edition, 2014 (Book)
- A very popular introduction to Epidemiology in color with many images and illustrations. A good tool to learn the basics of epidemiological design principles.
In 1965, Bradford Hill proposed a series of 9 criteria to consider when trying to uncover a causal relationship among the correlational noise. Causal inference has come a long way since, but these 9 criteria are still widely discussed and serve as guiding principles in epidemiology and its subfields.
-
Sir Austin Bradford Hill, The Environment and Disease: Association or Causation?, 1965
- The classic President's Address delivered to the newly formed Section of Occupational Medicine of the Royal Society of Medicine by Sir Austin Bradford Hill, in which he presents his famous 9 criteria for an association to be deemed causal. The paper went on to become tremendously influential and is still discussed to this day.
-
Glass, Goodman, Hernán & Samet, Causal Inference in Public Health, 2013
- A modern discussion of causal inference through the lens of policymaking in public health areas.
-
- Discusses how our understanding of Bradford Hill's original 9 criteria has evolved over time through a review of examples taken from molecular epidemiology.
Generalized Linear Autoregressive Moving Average (GLARMA) Models for Count Data (Poisson, Binomial, Negative Binomial)
- Zeger, A regression model for time series of counts, 1988
- A classic paper discussing the problem of modelling time series of counts, with the famous example of U.S. Polio incidence data, now part of the `glarma` package.
- Davis, Wang & Dunsmuir, Modeling Time Series of Count Data, 1999. In S Ghosh (ed.), Asymptotics, Nonparametrics, and Time Series, volume 158 of Statistics Textbooks and Monographs, pp. 63-114
- A theoretical paper discussing differences between parameter-driven and observation-driven state-space models, with many example analyses at the end.
- Davis, Dunsmuir and Streett, Observation-driven models for Poisson counts, 2003
- A theoretical paper with an interesting example application to the Asthma dataset.
- Dunsmuir & Scott, The `glarma` Package for Observation Driven Time Series Regression of Counts, 2015
- The `glarma` package vignette with theory, code, and examples.
-
- The `ordinal` package vignette with theory, code, and examples. It's rather lengthy and extensive!
-
Peter McCullagh, Regression Models for Ordinal Data, 1980
- A foundational paper for the proportional odds model, relating it mathematically to the famous Cox proportional hazards model.
-
Christopher Winship & Robert D. Mare, Regression Models With Ordinal Variables, 1984
- A foundational paper describing techniques to handle ordinal data, especially aimed at eliminating bad practices in the sociology literature.
-
- A tutorial for ordinal logistic regression using the `MASS` package. Whether you use this package or `clm`, this tutorial is interesting as it addresses how to analyze the data before modeling, namely by checking the proportional odds assumption.
-
Foundational papers
-
Regression tools
- Cribari-Neto & Zeileis, Beta Regression in R, 2010
- The paper accompanying the R package betareg.
- Kubinec, Ordered Beta Regression: A Parsimonious, Well-Fitting Model for Continuous Data with Lower and Upper Bounds, 2021
- A paper proposing an alternative to the Zero-One Inflated Beta (ZOIB) model.
- Zhang, Qiu & Shi, simplexreg: An R Package for Regression Analysis of Proportional Data Using the Simplex Distribution, 2016
- Simplex Regression, an alternative to beta regression using the Simplex distribution. Incorporates both MLE and GEE techniques.