diff --git a/2009-flu-uk.qmd b/2009-flu-uk.qmd
new file mode 100644
index 0000000..1959388
--- /dev/null
+++ b/2009-flu-uk.qmd
@@ -0,0 +1 @@
+# 2009 pandemic in the UK
\ No newline at end of file
diff --git a/_quarto.yml b/_quarto.yml
index 7b55dd7..130b7d6 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -21,10 +21,14 @@ book:
     - part: "Monty"
       chapters:
         - monty.qmd
+        - model.qmd
+        - monty-dsl.qmd
+        - samplers.qmd
     - part: "Odin & Monty"
       chapters:
         - inference.qmd
         - differentiability.qmd
+        - 2009-flu-uk.qmd
     - references.qmd
   appendices:
     - installation.qmd
diff --git a/model.qmd b/model.qmd
new file mode 100644
index 0000000..b6a91e5
--- /dev/null
+++ b/model.qmd
@@ -0,0 +1,102 @@
+# About (monty) Models {#sec-monty-models}
+
+```{r}
+#| include: false
+source("common.R")
+```
+
+In this book, we explore two fundamentally different yet interconnected approaches to modelling: **dynamical systems** and **statistical models**. To illustrate these differences, let's begin with a familiar (but maybe dated!) metaphor: the game [*SimCity*](https://en.wikipedia.org/wiki/SimCity).
+
+## Models vs. Models: dynamical and statistical perspectives
+
+### SimCity and dynamical systems
+
+Imagine a city model, as in *SimCity*, where thousands of virtual inhabitants follow routines, interact, and respond to changes like new buildings or natural disasters. These elements evolve according to predefined rules and parameters, similar to how **dynamical systems** simulate real-world processes over time.
+
+In a [**dynamical model**](https://en.wikipedia.org/wiki/Dynamical_system), we track changes in a system's "state" over time. The state is a summary of key information at a specific time point; for example, in an epidemiological model, the state might include the numbers of susceptible, infected, and recovered individuals. More detailed models may add variables like age, location, health risks, and symptoms, allowing us to simulate interventions, such as vaccination campaigns, and explore their potential impacts. Many formalisms exist for describing dynamical systems, incorporating e.g. specific input and output functions in control theory, but the central element of these systems is the notion of "state": a mathematical object summarising the system at a given time point.
+
+The `odin` package, which supports differential and difference equations, is an effective tool for building dynamical models, enabling users to simulate time-evolving systems. Dynamical models are well-suited for exploring how specific scenarios or interventions impact a system over time, making them particularly useful for modelling real-world phenomena in a structured way.
+
+TODO: simple odin deterministic model with discussion about results being a time indexed suite of "states"
+
+### Statistical models and uncertainty
+
+[**Statistical models**](https://en.wikipedia.org/wiki/Statistical_model), on the other hand, focus on capturing uncertainty in outcomes. Instead of simulating the system's evolution over time, they represent possible states of the world probabilistically. Statistical models summarise data patterns or help make predictions based on uncertainties, such as estimating the range of future cases of a disease or identifying risk factors in a population.
+
+The `monty` package facilitates statistical modelling, providing tools for working with probability distributions and Monte Carlo methods. Where `odin` enables us to explore the dynamics of changing systems, `monty` is ideal for making probabilistic inferences and assessing variability across potential outcomes without necessarily modelling time-evolving dynamics.
+
+TODO: simple gaussian model - discussion about parameter being actually the sampling space and explain that it connects with Bayesian inference, show that the important component of the model is the parameters and domain support + density function, maybe shows that the model can be built in different way (R function, monty_model(), DSL).
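+
+As a placeholder sketch of the kind of example flagged above (an assumed illustration using `monty_model_function()`, introduced in the previous chapter; the eventual example may differ), a one-parameter Gaussian model could look like this:
+
+```{r}
+library(monty)
+
+# A hypothetical, minimal statistical model: a single parameter mu with a
+# standard normal distribution. The function returns the log-density of mu.
+fn <- function(mu) {
+  dnorm(mu, mean = 0, sd = 1, log = TRUE)
+}
+gaussian_model <- monty_model_function(fn)
+
+# The "parameters" of the model define the space we sample over,
+# and the density can be evaluated at any point in that space.
+gaussian_model$parameters
+gaussian_model$density(0)
+```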
+
+## Bridging dynamical systems and statistical models
+
+While dynamical systems and statistical models serve distinct purposes, they are not strictly separate. In fact, they can be connected through two powerful approaches that bring probabilistic reasoning into dynamical systems: **stochastic processes** and **Bayesian inference**.
+
+### Stochastic processes: adding uncertainty to dynamical models
+
+A natural way to bridge dynamical and statistical models is by introducing [**stochastic processes**](https://en.wikipedia.org/wiki/Stochastic_process). Traditional dynamical models use fixed parameters and deterministic rules to evolve a system's state over time. However, many real-world systems have inherent randomness or uncertainty that deterministic models cannot capture.
+
+In a **stochastic process**, the system's state is no longer a single deterministic value but a collection of potential outcomes, each weighted by a probability. This probabilistic view enables us to model fluctuations and uncertainties within the dynamical framework, capturing both the system's evolution and the uncertainty in each state. Stochastic processes are thus a natural extension of dynamical models, adding an extra layer of realism by treating system evolution as inherently uncertain.
+
+The `odin` package provides an intuitive framework for writing a class of stochastic systems, making it easier to define models that incorporate randomness in their evolution over time. The `dust` package complements `odin` by enabling efficient large-scale simulations, allowing users to capture and analyse the uncertainty inherent to these systems through repeated runs. Together, `odin` and `dust` offer a powerful toolkit for developing and exploring stochastic models that reflect the complexity and variability of real-world dynamics.
+
+TODO simple dust example or just link with relevant dust section in the book
+
+### Bayesian inference: statistical modelling of model parameters
+
+**Bayesian inference** is another approach to linking dynamical and statistical models, treating model parameters as random variables rather than fixed values. This introduces a probability distribution over possible parameter values, making parameter estimation a statistical problem.
+
+Using Bayes' theorem, Bayesian inference combines:
+
+ - **The likelihood**: the probability of observing the data given specific parameter values, and
+ - **The prior distribution**: our initial assumptions about the parameter values before observing the data.
+
+The result is the **posterior distribution** of a parameter $\theta$ given data $y$:
+
+$$
+p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}
+$$
+
+where:
+
+ - $p(y|\theta)$ is the likelihood,
+ - $p(\theta)$ is the prior distribution, and
+ - $p(y)$ is a normalising constant to ensure the posterior integrates to 1 over all $\theta$ values.
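+
+To make this concrete, here is a small, self-contained illustration in base R (a hypothetical coin-toss example, not part of the `monty` API): we place a Beta(2, 2) prior on the probability $\theta$ of heads, observe 7 heads in 10 tosses, and recover the posterior both numerically and analytically.
+
+```{r}
+theta_grid <- seq(0, 1, length.out = 201)
+prior <- dbeta(theta_grid, 2, 2)           # p(theta)
+likelihood <- dbinom(7, 10, theta_grid)    # p(y | theta)
+unnormalised <- likelihood * prior         # p(y | theta) p(theta)
+# Approximate the normalising constant p(y) by numerical integration
+posterior <- unnormalised / sum(unnormalised * (theta_grid[2] - theta_grid[1]))
+plot(theta_grid, posterior, type = "l", xlab = "theta", ylab = "density")
+lines(theta_grid, dbeta(theta_grid, 2 + 7, 2 + 3), lty = 2)  # exact Beta(9, 5) posterior
+```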
+
+Bayesian inference allows us to update our understanding of parameters based on observed data, yielding a posterior distribution that incorporates both model assumptions and the influence of the data. This approach provides a probabilistic framework that can adapt dynamically as new data become available, making it ideal for statistical models that require flexibility.
+
+Many statistical modelling tools—such as [WinBUGS](https://en.wikipedia.org/wiki/WinBUGS), [JAGS](https://en.wikipedia.org/wiki/Just_another_Gibbs_sampler), `stan`, and `BayesTools`—are fundamentally Bayesian. However, the `monty` package is **Bayesian-agnostic**, allowing users to choose between Bayesian and non-Bayesian approaches to probabilistic modelling, depending on the analytical requirements.
+
+By combining **stochastic processes** with **Bayesian inference**, we add a dual dimension of uncertainty: randomness in the state of dynamical systems through stochastic processes, and probabilistic modelling of parameters through Bayesian methods. Together, these frameworks enable us to build robust models for complex systems that evolve over time, capturing both inherent randomness and uncertainty in our understanding.
+
+## Parameters and model complexity
+
+In both dynamical and statistical frameworks, the number of parameters can be adjusted as needed to capture the desired level of complexity. In the `monty` package, random variables - termed "parameters" with a slight simplification of language - are typically used to summarise processes, and so they often form a more compact set than those found in dynamical models. This distinction is especially relevant in Bayesian models constructed from complex `odin` models.
+
+In dynamical systems, parameters define the structure and evolution of a scenario in detail. For instance, an epidemiological model may include parameters for transmission rates, contact patterns, or intervention schedules. These inputs enable "what-if" scenarios, allowing decision-makers to predict and manage changes in real time. The `odin` package, designed to support such dynamical models, provides the flexibility to specify numerous parameters for exploring system behaviours over time.
+
+Statistical models, by contrast, use parameters to define probability distributions over possible outcomes, capturing uncertainties, predicting risks, or summarising data patterns. In Bayesian models based on a complex `odin` framework, the statistical parameters are usually a subset of those used in the dynamical model itself. Parameters such as those defining a vaccination campaign (e.g. the daily number of doses given to target groups) might be central to shaping the `odin` model, but are not necessarily included in the Bayesian inference (which might focus on just vaccine efficacy). This selective approach allows us to quantify uncertainties and make probabilistic inferences about key aspects of the model without needing to explore details of the underlying dynamics that are treated as "known" given what was actually observed.
+
+Thus, while dynamical models rely on a broad parameter set for flexibility, statistical parameters summarise uncertainty more compactly, making the combined approach especially effective for realistic, data-driven inferences.
+
+## Probability densities: normalised vs. unnormalised
+
+A key concept in Bayesian inference and Monte Carlo methods is the distinction between **normalised** and **unnormalised** probability densities.
+
+In Bayesian inference, rather than using the "full" Bayes' theorem as above, we often work with a posterior density defined only up to proportionality, as a product of densities:
+
+$$
+p(\theta | y) \propto p(y|\theta) p(\theta)
+$$
+
+where:
+
+ - $p(y|\theta)$ is the likelihood, and
+ - $p(\theta)$ is the prior distribution, as above.
+
+Note that this no longer involves the normalising constant $p(y)$. Because calculating $p(y)$ can be very difficult, we often work with the **unnormalised posterior density** $p(y|\theta)p(\theta)$. This unnormalised form is sufficient for many Monte Carlo methods, where only relative densities matter.
+
+A **normalised density** integrates to 1 over its entire parameter space. This is necessary for direct probability interpretations and for certain Bayesian methods, such as model comparison using [Bayes factors](https://en.wikipedia.org/wiki/Bayes_factor).
+
+Monte Carlo algorithms, such as **Metropolis-Hastings** and **Importance Sampling**, often operate using unnormalised densities, focusing on relative probabilities rather than absolute values. This makes them efficient for high-dimensional problems where calculating a normalising constant is intractable.
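+
+As a quick numerical illustration (toy numbers, plain base R rather than `monty`), the Metropolis-Hastings acceptance ratio between two points depends only on the unnormalised density, because any normalising constant cancels:
+
+```{r}
+# Unnormalised log-posterior for a toy model: one observation y = 3 with a
+# Normal(theta, 1) likelihood and a Normal(0, 10) prior on theta.
+log_unnormalised <- function(theta) {
+  dnorm(3, mean = theta, sd = 1, log = TRUE) + dnorm(theta, 0, 10, log = TRUE)
+}
+theta_current <- 0.5
+theta_proposed <- 1.0
+ratio <- exp(log_unnormalised(theta_proposed) - log_unnormalised(theta_current))
+# Shifting both log-densities by the same constant (e.g. -log p(y)) changes nothing
+ratio_shifted <- exp((log_unnormalised(theta_proposed) - 42) -
+                       (log_unnormalised(theta_current) - 42))
+c(ratio, ratio_shifted)
+```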
+
+Normalisation in probability densities has parallels in physics. In statistical mechanics, the **partition function** $Z$ normalises probabilities over all possible states, much like normalising densities in Bayesian inference. This connection highlights why algorithms like **Metropolis-Hastings** use unnormalised densities: they mirror physical systems where absolute energies are less relevant than energy differences.
\ No newline at end of file
diff --git a/monty-dsl.qmd b/monty-dsl.qmd
new file mode 100644
index 0000000..12a431f
--- /dev/null
+++ b/monty-dsl.qmd
@@ -0,0 +1,112 @@
+# The monty DSL {#sec-monty-dsl}
+
+```{r}
+#| include: false
+source("common.R")
+```
+
+The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focuses on providing support for defining priors in Bayesian models. It fully supports differentiability, allowing gradient-based samplers to be used on these models.
+
+```{r}
+library(monty)
+```
+
+## Some simple examples
+
+In chapter 4 of *Statistical Rethinking*, we build a regression model of height with parameters $\alpha$, $\beta$ and $\sigma$. We can define the prior distribution of this model in monty by running:
+
+```{r}
+prior <- monty_dsl({
+  alpha ~ Normal(178, 20)
+  beta ~ Normal(0, 10)
+  sigma ~ Uniform(0, 50)
+})
+```
+
+This will define a new `monty_model` object that represents the prior, with all the bits that we might need depending on how we want to use it.
+
+We have model parameters:
+
+```{r}
+prior$parameters
+```
+
+These are defined in the order that they appear in your definition (so `alpha` is first and `sigma` is last).
+
+We can compute the domain of the model:
+
+```{r}
+prior$domain
+```
+
+We can draw samples from the model if we provide a `monty_rng` object:
+
+```{r}
+rng <- monty_rng$new()
+theta <- prior$direct_sample(rng)
+theta
+```
+
+We can compute the (log) density at a point in parameter space:
+
+```{r}
+prior$density(theta)
+```
+
+The computed properties for the model are:
+
+```{r}
+prior$properties
+```
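+
+If we want a handful of draws rather than a single one, we can call `direct_sample()` repeatedly (a quick illustrative sketch; see @sec-samplers for sampling via MCMC):
+
+```{r}
+# Five independent draws from the prior, one column per draw
+theta_draws <- replicate(5, prior$direct_sample(rng))
+rownames(theta_draws) <- prior$parameters
+theta_draws
+```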
+
+## Calculations in the DSL
+
+Sometimes it will be useful to perform calculations in the code; you can do this with assignments. Most trivially, giving names to numbers may help make code more understandable:
+
+```{r}
+m <- monty_dsl({
+  mu <- 10
+  sd <- 2
+  a ~ Normal(mu, sd)
+})
+```
+
+You can also use this to do things like:
+
+```{r}
+m <- monty_dsl({
+  a ~ Normal(0, 1)
+  b ~ Normal(0, 1)
+  mu <- (a + b) / 2
+  c ~ Normal(mu, 1)
+})
+```
+
+Here `c` is drawn from a normal distribution with a mean that is the average of `a` and `b`.
+
+## Pass in fixed data
+
+You can also pass in a list of data with values that should be available in the DSL code. For example, our first example:
+
+```{r}
+prior <- monty_dsl({
+  alpha ~ Normal(178, 20)
+  beta ~ Normal(0, 10)
+  sigma ~ Uniform(0, 50)
+})
+```
+
+might be written as:
+
+```{r}
+fixed <- list(alpha_mean = 178, alpha_sd = 20,
+              beta_mean = 0, beta_sd = 10,
+              sigma_max = 50)
+prior <- monty_dsl({
+  alpha ~ Normal(alpha_mean, alpha_sd)
+  beta ~ Normal(beta_mean, beta_sd)
+  sigma ~ Uniform(0, sigma_max)
+}, fixed = fixed)
+```
+
+Values you pass in this way are **fixed** (hence the name!) and cannot be modified after the model object is created.
\ No newline at end of file
diff --git a/monty.qmd b/monty.qmd
index 3a15487..a98c4ee 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -1,4 +1,4 @@
-# Monty
+# Getting started with monty
 
 ```{r}
 #| include: false
@@ -7,15 +7,94 @@ source("common.R")
 
 Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this.
 
-There are a bunch of things we want to cover in this chapter so likely it will be split into a few:
+The `monty` R package is designed for modular work with statistical distributions, leveraging Monte Carlo methods for sampling. It enables the construction of increasingly complex models through four approaches:
 
-* defining a model in monty, both easily with `monty_function` and less easily with `monty_model`
-* fitting models with MCMC
-* writing priors with the `monty_dsl`
-* using observers to follow along with the model (if we can come up with a good example)
-* working with nested models (once we write the code that would allow this)
-* running MCMC chains in parallel
-* exporting chains to work with other packages
-* different samplers
+1. Simple R code,
+2. A dedicated domain-specific language (DSL),
+3. Integration of `odin` models,
+4. Composition of two `monty` models.
 
-Hmm, there really is quite a bit of ground to cover here!
+The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in @sec-monty-models, in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL, a small probabilistic language similar to BUGS, for more versatile model building. The last part of this book, starting from @sec-inference, demonstrates how to incorporate `odin` models into `monty` workflows.
+
+## A simple example
+
+```{r}
+library(monty)
+```
+
+We can define a simple Gaussian [mixture model](https://en.wikipedia.org/wiki/Mixture_model) of two sub-populations using `monty_model_function()` from the package. The model represents the distribution of a quantity $l$, sampled with a probability $p$ from a normal distribution with mean $m_{1}$ and with a probability $1-p$ from another normal distribution with mean $m_{2}$. Both subpopulations have variance equal to 1.
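+
+In terms of densities, the model for $l$ is the two-component mixture
+
+$$
+f(l) = p \, \mathcal{N}(l; m_{1}, 1) + (1 - p) \, \mathcal{N}(l; m_{2}, 1),
+$$
+
+where $\mathcal{N}(\cdot; m, 1)$ denotes the density of a normal distribution with mean $m$ and unit variance.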
+
+To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function has four arguments: $l$ (the quantity of interest) and the three parameters $p$, $m_{1}$ and $m_{2}$ defining the two normally distributed subpopulations.
+
+```{r}
+fn <- function(l, p, m1, m2) {
+  log(p * dnorm(l, mean = m1) + (1 - p) * dnorm(l, mean = m2))
+}
+```
+
+Assuming that the population is 75% from subpopulation 1 (normally distributed with mean 3) and 25% from subpopulation 2 (also normally distributed but with mean 7), we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values.
+
+```{r}
+mixture_model <- monty_model_function(fn,
+                                      fixed = list(p = 0.75, m1 = 3, m2 = 7),
+                                      allow_multiple_parameters = TRUE)
+```
+
+We have just created a `monty` model.
+
+```{r}
+mixture_model
+```
+
+We can plot the density of our model for values of $l$ between 0 and 10, and check by eye that we can see the two subpopulations in the correct proportions, with modes at 3 and 7.
+
+```{r}
+l <- seq(from = 0, to = 10, by = 0.1)
+plot(l,
+     exp(Vectorize(mixture_model$density)(l)),
+     type = "l",
+     ylab = "density")
+```
+
+## Sampling from our example distribution
+
+We now want to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers available and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm, which should work almost out of the box (though not necessarily efficiently) in most cases.
+
+The random walk sampler uses a variance-covariance (VCV) matrix to guide its exploration, determining the 'jump' from the current point to the next in a random walk by drawing from a multivariate normal distribution parameterised by this matrix. For our single-parameter model here, we use a 1x1 matrix of variance 2 (`matrix(2)`) as our VCV matrix.
+
+The choice of the VCV matrix is critical for the sampler's efficiency, especially in more complex cases where the tuning of this matrix can significantly affect performance. A well-chosen VCV matrix optimises movement across the parameter space, making the random walk sampler more effective in exploring the distribution of interest.
+
+```{r}
+sampler <- monty_sampler_random_walk(matrix(2))
+```
+
+We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains sequentially.
+
+```{r}
+samples <- monty_sample(mixture_model, sampler, 2000, initial = 3, n_chains = 4)
+```
+
+We can visualise our 4 chains:
+
+```{r}
+matplot(samples$density, type = "l", lty = 1,
+        xlab = "sample", ylab = "log posterior density", col = "#00000055")
+```
+
+We can also check that our samples correctly represent our distribution:
+
+```{r}
+hist(samples$pars["l", , ], breaks = 100)
+```
+
+## Connection of `monty` with other software
+
+`monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by languages of the BUGS family such as `stan`, and by [Statistical Rethinking](https://xcelab.net/rm/). It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models but it may get more clever and flexible in the future. In particular we do not expect the DSL to be useful in writing likelihood functions for comparison to data; we expect that if your model is simple enough for this you would be better off using `stan` or some similarly flexible system.
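+
+As a small taste of what the DSL looks like (it is covered properly in @sec-monty-dsl), a prior over two parameters can be written as:
+
+```{r}
+# A prior over two parameters, written in the BUGS-like monty DSL
+dsl_prior <- monty_dsl({
+  mu ~ Normal(0, 10)
+  sigma ~ Uniform(0, 5)
+})
+dsl_prior
+```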
+
+*mention SSM, dr jacoby, mcstate, BayesTools
+
+## Going further
+
+We've discussed a little bit about the philosophy of `monty` and built a simple model from an R function with `monty_model_function()`. The example introduces the basics of defining and sampling from a `monty` model.
+
+For more advanced applications, refer to @sec-monty-dsl for constructing models using `monty`'s domain-specific language, which enables greater flexibility for complex structures. Lastly, @sec-inference illustrates how to incorporate `odin` models into `monty` workflows, expanding the package's potential for Bayesian inference with more sophisticated, system-based models. Each section provides detailed examples to guide you in leveraging `monty`'s full capabilities.
diff --git a/samplers.qmd b/samplers.qmd
new file mode 100644
index 0000000..f980835
--- /dev/null
+++ b/samplers.qmd
@@ -0,0 +1,113 @@
+# Samplers and inference {#sec-samplers}
+
+```{r}
+#| include: false
+source("common.R")
+```
+
+Many quantities of interest in uncertain systems can be expressed as integrals that weight outcomes according to their probability. A common approach to computing these integrals is through Monte Carlo estimation, where we draw samples from our distribution of interest and take their mean as an approximation. This method, known as a 'Monte Carlo' estimate, has been central to probabilistic inference and has driven the development of sophisticated sampling algorithms since the advent of modern computing in the 1950s.
+
+The `monty` package offers a range of sampling algorithms tailored to different types of distributions and varying levels of available information. This flexibility allows users to select the most efficient sampler based on their distribution's characteristics and computational resources.
+
+## Samplers supported by `monty`
+
+```{r}
+library(monty)
+```
+
+### Random-Walk Metropolis-Hastings
+
+### Adaptive Random Walk
+
+### Nested models
+
+### Hamiltonian Monte Carlo
+
+## A simple example: The bendy banana
+
+This example shows HMC outperforming a random walk on a two-dimensional banana-shaped function. Our model takes two parameters `alpha` and `beta`, and is based on two successive simple draws, with one conditional on the other, so $\beta \sim Normal(0, 1)$ and $\alpha \sim Normal(\beta^2, \sigma)$, with $\sigma$ the standard deviation of the conditional draw.
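+
+Written out, the joint log-density that a sampler works with is
+
+$$
+\log p(\alpha, \beta) = \log \phi(\beta) + \log \phi\!\left(\frac{\alpha - \beta^2}{\sigma}\right) - \log \sigma,
+$$
+
+where $\phi$ is the standard normal density.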
+
+We include this example within the package; here we create a model with $\sigma = 0.5$:
+
+```{r}
+m <- monty_example("banana", sigma = 0.5)
+m
+```
+
+We can plot a visualisation of its density by computing the density over a grid. Normally this is not possible, of course (for higher-dimensional problems a grid quickly becomes impractical):
+
+```{r}
+a <- seq(-2, 6, length.out = 1000)
+b <- seq(-2, 2, length.out = 1000)
+z <- outer(a, b, function(alpha, beta) {
+  exp(monty_model_density(m, rbind(alpha, beta)))
+})
+image(a, b, z, xlab = "alpha", ylab = "beta")
+```
+
+In this particular case we can also easily generate samples, so we know what a good sampler will produce:
+
+```{r}
+rng <- monty_rng$new()
+s <- vapply(seq(200), function(x) m$direct_sample(rng), numeric(2))
+image(a, b, z, xlab = "alpha", ylab = "beta")
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
+```
+
+It is also possible to compute the 95% contour of the distribution, using the relationship between the standard bivariate normal distribution and the banana-shaped distribution as defined above. We can check that roughly 10 of the 200 samples fall outside this 95% contour.
+
+```{r}
+theta <- seq(0, 2 * pi, length.out = 10000)
+z95 <- local({
+  sigma <- 0.5
+  r <- sqrt(qchisq(.95, df = 2))
+  x <- r * cos(theta)
+  y <- r * sin(theta)
+  cbind(x^2 + y * sigma, x)
+})
+image(a, b, z, xlab = "alpha", ylab = "beta")
+lines(z95[, 1], z95[, 2])
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
+```
+
+## Sampling with other samplers
+
+It is not generally possible to directly sample from a density (otherwise MCMC and similar methods would not exist!). In these cases we need to use a sampler based on the density and, if available, the gradient of the density.
+
+We can start with a basic random-walk sampler:
+
+```{r RW_sampling}
+sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 0.01)
+res_rw <- monty_sample(m, sampler_rw, 1000)
+plot(t(drop(res_rw$pars)), type = "l", col = "#0000ff66",
+     xlim = range(a), ylim = range(b))
+lines(z95[, 1], z95[, 2])
+```
+
+As we can see this is not great, exhibiting strong random walk behaviour as it slowly explores the surface (this is over 1,000 steps). Another way to view this is the parameters varying over steps:
+
+```{r}
+matplot(t(drop(res_rw$pars)), lty = 1, type = "l", col = c(2, 4),
+        xlab = "Step", ylab = "Value")
+```
+
+We can probably improve the samples here by finding a better variance-covariance matrix (VCV), but a single VCV will not hold well over the whole surface because the distribution is not very similar to a multivariate normal (that is, the appropriate VCV will change depending on the position in parameter space).
+
+Let's try the Hamiltonian Monte Carlo (HMC) sampler, which uses the gradient to move efficiently in parameter space:
+
+```{r HMC_sampling}
+sampler_hmc <- monty_sampler_hmc(epsilon = 0.1, n_integration_steps = 10)
+res_hmc <- monty_sample(m, sampler_hmc, 1000)
+plot(t(drop(res_hmc$pars)), type = "l", col = "#0000ff33",
+     xlim = range(a), ylim = range(b))
+lines(z95[, 1], z95[, 2])
+```
+
+or viewed over steps:
+
+```{r}
+matplot(t(drop(res_hmc$pars)), lty = 1, type = "l", col = c(2, 4),
+        xlab = "Step", ylab = "Value")
+```
+
+Clearly better!
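+
+One crude way to quantify the difference (an illustrative check rather than a formal diagnostic) is the proportion of iterations in which each chain actually moved to a new point:
+
+```{r}
+# Proportion of accepted (i.e. moving) steps for each sampler; the HMC chain
+# should move far more often than the random walk here.
+moved <- function(res) mean(diff(drop(res$pars)[1, ]) != 0)
+c(random_walk = moved(res_rw), hmc = moved(res_hmc))
+```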