-
Notifications
You must be signed in to change notification settings - Fork 22
Heterogeneous variances
By default, the Linear Mixed Models fitted with breedR assume homoscedasticity. Meaning that given all the fixed and random effects, the unexplained variation follow a Normal distribution with residual variance σ2.
Mathematically, that
Sometimes this is obviously wrong, and we need models where some observations are observed with more or less residual variability than others
Here are a few common situations where heterogeneous variances are needed:
-
The observations are actually derived or calculated from real measurements, such as an average. Thus, the variance depends on the number of averaged measurements (e.g. Daughter Yield Deviation measures).
-
The observations are spread in time, and you want to model the residual variance as a function of time (e.g. longitudinal models).
If the relative variation in the residual variances is know or can be estimated, it can be specified as a vector of weights w, such that
Here is a simulation example of how to specify weights.
set.seed(123)
n <- 1e3 # n obs
sigma2 <- 4 # true residual variance (for a weight of 1)
w = runif(n, min = .5, max = 2) # vector of weights
dat <-
transform(
data.frame(
e = rnorm(n, sd = sqrt(sigma2))
),
y = 10 + e/sqrt(w) # simulated phenotype
)
res <- remlf90(
y ~ 1,
data = dat,
weights = w # specification of weights
)
## Using default initial variances given by default_initial_variance()
## See ?breedR.getOption.
Note that the estimated residual variance is close to the true value. On the other hand, the residual prediction-error are expected to have non-constant variance.
summary(res)
## Formula: y ~ 0 + Intercept
## Data: dat
## AIC BIC logLik
## 4080 4085 -2039
##
##
## Variance components:
## Estimated variances S.E.
## Residual 4.012 0.1618
##
## Fixed effects:
## value s.e.
## Intercept 10.016 0.0567
ggplot(transform(dat, est_e = residuals(res)), aes(e, est_e)) +
geom_point() +
geom_abline(intercept = 0, slope = 1, color = "darkgray")
This is currently not available in breedR.
Different group-wise residual variances (e.g. multi-site) can be easily induced by using group-specific random effects.
For the general case, here are some notes to allow for some manual hacking if needed.
In general, we need to estimate a residual variance parameter as a function of some other variable x. We then write the residual variance as a linear combination of a few base functions:
This covers the case both for group-wise residual variances (such as multi-site, using a categorical variable x) or a continuously varying residual variance.
For the first case, the variable x is categorical, taking a finite number of values K, and we define ψk as the corresponding indicator functions.
For the continuous case, the variable x is continuous (e.g. age, temperature) and the base functions can be Legendre polynomials, splines, etc. up to some arbitrary order K.
We need to manually build the matrix hetres_pos
and hetres_pol
available in AI-REML (see documentation).