From 8fd6b677bede8007a06ea44bfe6e82f2e7d88dce Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Mon, 4 Nov 2024 15:57:37 +0000 Subject: [PATCH 01/34] Starting the monty section --- _quarto.yml | 4 ++ composability.qmd | 1 + model.qmd | 7 +++ monty.qmd | 38 +++++++++++++++- montyDSL.qmd | 109 ++++++++++++++++++++++++++++++++++++++++++++++ samplers.qmd | 96 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 254 insertions(+), 1 deletion(-) create mode 100644 composability.qmd create mode 100644 model.qmd create mode 100644 montyDSL.qmd create mode 100644 samplers.qmd diff --git a/_quarto.yml b/_quarto.yml index db398fc..4cef2c0 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -21,6 +21,10 @@ book: - part: "Monty" chapters: - monty.qmd + - model.qmd + - montyDSL.qmd + - composability.qmd + - samplers.qmd - part: "Odin & Monty" chapters: - inference.qmd diff --git a/composability.qmd b/composability.qmd new file mode 100644 index 0000000..5b4977f --- /dev/null +++ b/composability.qmd @@ -0,0 +1 @@ +# Composability \ No newline at end of file diff --git a/model.qmd b/model.qmd new file mode 100644 index 0000000..38795a1 --- /dev/null +++ b/model.qmd @@ -0,0 +1,7 @@ +# About (monty) models + +We often refers to using models. + +## SimCity vs Statistical models + +## Normalised vs unormalised densities \ No newline at end of file diff --git a/monty.qmd b/monty.qmd index 8ddca01..fdf6fc1 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,4 +1,40 @@ -# Monty +# Getting started with monty + +* the philosophie +* three options + +`monty` is a R package design to work with statistical models in a *modular* way. It uses Monte Carlo methods to facilitate drawing from these statistical models and is currently built to create complex models through four different routes: + +* simple R code +* a dedicated DSL +* odin models +* composing two monty models. + +The simple example in this chapter is built from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through composition of two monty models. monty models obtains from odin models are dealt in the odin & monty chapter. + +## A simple example + +```{r} +library(monty) +``` + +We can define a simple Gaussian mixture model of two sub-population using the `monty_model_function()` from the package + +For this we start by defining + +```{r} +fn <- function(l, p, m1, m2) { + p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2) +} +``` + +```{r} +l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 5, m2=7)) +``` + +## Sampling from our example distribution + +## Further work Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. diff --git a/montyDSL.qmd b/montyDSL.qmd new file mode 100644 index 0000000..6989e89 --- /dev/null +++ b/montyDSL.qmd @@ -0,0 +1,109 @@ +# The monty DSL + +The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. + +`monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by `stan` and [Statistical Rethinking](https://xcelab.net/rm/). 
It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models, but it may become more clever and flexible in the future. In particular, we do not expect the DSL to be useful in writing likelihood functions for comparison to data; if your model is simple enough for that, you would be better off using `stan` or some similarly flexible system.

```{r}
library(monty)
```

## Some simple examples

In chapter 4 of Statistical Rethinking, we build a regression model of height with parameters $\alpha$, $\beta$ and $\sigma$. We can define the model for the prior probability of this model in monty by running

```{r}
prior <- monty_dsl({
  alpha ~ Normal(178, 20)
  beta ~ Normal(0, 10)
  sigma ~ Uniform(0, 50)
})
```

This will define a new `monty_model()` object that represents the prior, with all the bits that we might need depending on how we want to use it.

We have model parameters:

```{r}
prior$parameters
```

These are defined in the order that they appear in your definition (so `alpha` is first and `sigma` is last).

We can compute the domain for your model:

```{r}
prior$domain
```

We can draw samples from the model if we provide a `monty_rng` object:

```{r}
rng <- monty_rng$new()
theta <- prior$direct_sample(rng)
theta
```

We can compute the (log) density at a point in parameter space:

```{r}
prior$density(theta)
```

The computed properties for the model are:

```{r}
prior$properties
```

## Calculations in the DSL

Sometimes it will be useful to perform calculations in the code; you can do this with assignments. Most trivially, giving names to numbers may help make code more understandable:

```{r}
m <- monty_dsl({
  mu <- 10
  sd <- 2
  a ~ Normal(mu, sd)
})
```

You can also use this to do things like:

```{r}
m <- monty_dsl({
  a ~ Normal(0, 1)
  b ~ Normal(0, 1)
  mu <- (a + b) / 2
  c ~ Normal(mu, 1)
})
```

where `c` is drawn from a normal distribution with a mean that is the average of `a` and `b`.

## Pass in fixed data

You can also pass in a list of data with values that should be available in the DSL code. For example, our first example:

```{r}
prior <- monty_dsl({
  alpha ~ Normal(178, 20)
  beta ~ Normal(0, 10)
  sigma ~ Uniform(0, 50)
})
```

might be written as

```{r}
fixed <- list(alpha_mean = 178, alpha_sd = 20,
              beta_mean = 0, beta_sd = 10,
              sigma_max = 50)
prior <- monty_dsl({
  alpha ~ Normal(alpha_mean, alpha_sd)
  beta ~ Normal(beta_mean, beta_sd)
  sigma ~ Uniform(0, sigma_max)
}, fixed = fixed)
```

Values you pass in this way are **fixed** (hence the name!) and cannot be modified after the model object is created.
\ No newline at end of file
diff --git a/samplers.qmd b/samplers.qmd
new file mode 100644
index 0000000..71c6c4f
--- /dev/null
+++ b/samplers.qmd
@@ -0,0 +1,96 @@
+# Samplers
+
+This vignette will describe the samplers available in monty.
+
+```{r}
+library(monty)
+```
+
+## The bendy banana
+
+This example shows HMC outperforming a random walk on a two-dimensional banana-shaped function. Our model takes two parameters `alpha` and `beta`, and is based on two successive simple draws, with one conditional on the other, so $\beta \sim Normal(0, 1)$ and $\alpha \sim Normal(\beta^2, \sigma)$, with $\sigma$ the standard deviation of the conditional draw.
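To connect the definition to code, here is a minimal hand-rolled sketch of the corresponding log-density (our own illustrative function, not part of monty; the packaged version is used below):

```{r}
# Banana log-density: beta ~ N(0, 1), then alpha | beta ~ N(beta^2, sigma)
banana_log_density <- function(alpha, beta, sigma = 0.5) {
  dnorm(beta, mean = 0, sd = 1, log = TRUE) +
    dnorm(alpha, mean = beta^2, sd = sigma, log = TRUE)
}
banana_log_density(1, 1)  # log-density at alpha = 1, beta = 1
```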

We include this example within the package; here we create a model with $\sigma = 0.5$:

```{r}
m <- monty_example("banana", sigma = 0.5)
m
```

We can plot a visualisation of its density by computing the density over a grid. Normally this is not possible, of course, as evaluating the density exhaustively over a grid is only feasible in very low dimension:

```{r}
a <- seq(-2, 6, length.out = 1000)
b <- seq(-2, 2, length.out = 1000)
z <- outer(a, b, function(alpha, beta) {
  exp(monty_model_density(m, rbind(alpha, beta)))
})
image(a, b, z, xlab = "alpha", ylab = "beta")
```

In this particular case we can also easily generate samples, so we know what a good sampler will produce:

```{r}
rng <- monty_rng$new()
s <- vapply(seq(200), function(x) m$direct_sample(rng), numeric(2))
image(a, b, z, xlab = "alpha", ylab = "beta")
points(s[1, ], s[2, ], pch = 19, col = "#00000055")
```

It is also possible to compute the 95% confidence interval of the distribution using the relationship between the standard bivariate normal distribution and the banana-shaped distribution as defined above. We can check that roughly 10 samples (out of 200) fall outside this 95% CI contour.

```{r}
theta <- seq(0, 2 * pi, length.out = 10000)
z95 <- local({
  sigma <- 0.5
  r <- sqrt(qchisq(.95, df = 2))
  x <- r * cos(theta)
  y <- r * sin(theta)
  cbind(x^2 + y * sigma, x)
})
image(a, b, z, xlab = "alpha", ylab = "beta")
lines(z95[, 1], z95[, 2])
points(s[1, ], s[2, ], pch = 19, col = "#00000055")
```

## Sampling with other samplers

It is not generally possible to directly sample from a density (otherwise MCMC and similar methods would not exist!). In these cases we need a sampler that uses the density itself and, where available, the gradient of the density.

We can start with a basic random-walk sampler:

```{r RW_sampling}
sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 0.01)
res_rw <- monty_sample(m, sampler_rw, 1000)
plot(t(drop(res_rw$pars)), type = "l", col = "#0000ff66",
     xlim = range(a), ylim = range(b))
lines(z95[, 1], z95[, 2])
```

As we can see this is not great, exhibiting strong random walk behaviour as it slowly explores the surface (this is over 1,000 steps). Another way to view this is the parameters varying over steps:

```{r}
matplot(t(drop(res_rw$pars)), lty = 1, type = "l", col = c(2, 4),
        xlab = "Step", ylab = "Value")
```

We can probably improve the samples here by finding a better variance-covariance matrix (VCV), but a single VCV will not hold well over the whole surface because the surface is not very similar to a multivariate normal (that is, the appropriate VCV will change depending on the position in parameter space).

Let's try the Hamiltonian Monte Carlo (HMC) sampler, which uses the gradient to move efficiently in parameter space:

```{r HMC_sampling}
sampler_hmc <- monty_sampler_hmc(epsilon = 0.1, n_integration_steps = 10)
res_hmc <- monty_sample(m, sampler_hmc, 1000)
plot(t(drop(res_hmc$pars)), type = "l", col = "#0000ff33",
     xlim = range(a), ylim = range(b))
lines(z95[, 1], z95[, 2])
```

or viewed over steps:

```{r}
matplot(t(drop(res_hmc$pars)), lty = 1, type = "l", col = c(2, 4),
        xlab = "Step", ylab = "Value")
```

Clearly better!
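One way to quantify this (a base-R sketch of our own, reusing the `res_rw` and `res_hmc` objects above) is the lag-1 autocorrelation of each parameter's chain, which is much closer to 0 for a fast-mixing sampler:

```{r}
# Lag-1 autocorrelation per parameter; values near 0 indicate better mixing.
lag1 <- function(res) {
  pars <- t(drop(res$pars))  # steps x parameters, as in the plots above
  apply(pars, 2, function(x) cor(x[-1], x[-length(x)]))
}
rbind(random_walk = lag1(res_rw), hmc = lag1(res_hmc))
```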
From 94d487f3024bbcae6a3b866ffd29d93341177c7d Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 7 Nov 2024 08:43:38 +0000 Subject: [PATCH 02/34] More editing --- _quarto.yml | 2 ++ advanced_monty.qmd | 9 +++++++++ composability.qmd | 25 ++++++++++++++++++++++++- model.qmd | 40 +++++++++++++++++++++++++++++++++++++++- monty.qmd | 38 +++++++++++++++++++++++++++++++------- runners.qmd | 3 +++ samplers.qmd | 2 +- 7 files changed, 109 insertions(+), 10 deletions(-) create mode 100644 advanced_monty.qmd create mode 100644 runners.qmd diff --git a/_quarto.yml b/_quarto.yml index 4cef2c0..7631883 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -25,6 +25,8 @@ book: - montyDSL.qmd - composability.qmd - samplers.qmd + - runners.qmd + - advanced_monty.qmd - part: "Odin & Monty" chapters: - inference.qmd diff --git a/advanced_monty.qmd b/advanced_monty.qmd new file mode 100644 index 0000000..5dbb364 --- /dev/null +++ b/advanced_monty.qmd @@ -0,0 +1,9 @@ +# Advanced monty + +## Observers + +## Working with other packages + +Exporting chains to work with other packages + +## Nested models \ No newline at end of file diff --git a/composability.qmd b/composability.qmd index 5b4977f..196c563 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1 +1,24 @@ -# Composability \ No newline at end of file +# Composability + +```{r} +library(monty) +``` + +```{r} +# A simple example; a model that contains something of interest, +# and a simple prior from monty_dsl +likelihood <- monty_example("banana") +prior <- monty_dsl({ + alpha ~ Normal(0, 1) + beta ~ Normal(0, 10) +}) +posterior <- likelihood + prior +posterior + +# # The same thing, more explicitly: +# monty_model_combine(likelihood, prior) +# +# # Control properties of the combined model: +# monty_model_combine(likelihood, prior, +# monty_model_properties(has_gradient = FALSE)) +``` \ No newline at end of file diff --git a/model.qmd b/model.qmd index 38795a1..6f2c160 100644 --- a/model.qmd +++ b/model.qmd @@ -4,4 +4,42 @@ We often refers to using models. ## SimCity vs Statistical models -## Normalised vs unormalised densities \ No newline at end of file +## Normalised vs unormalised densities + +In Bayesian inference, we often encounter probability densities that can either be normalised or unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. + +### Unnormalised Density + +An **unnormalised density** represents a probability density function that is proportional to the target distribution but has not been scaled to integrate to 1 over the entire parameter space. In Bayesian inference, we are often interested in the posterior distribution of a parameter $\theta$ given data $y$. The posterior density is given by: + +$$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ + +where: + +- $p(y | \theta))$ is the likelihood of the data given the parameter, +- $p(\theta)$ is the prior distribution, and +- $p(y) = \int p(y | \theta) p(\theta) \, d\theta \)$ is the marginal likelihood, or the evidence, which acts as a normalising constant. + +In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). 
The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of \( \theta \). + +### Normalised Density + +A **normalised density** ensures that the total probability over all possible parameter values is 1. Normalisation is achieved by dividing by the integral over the entire density, as shown above with \( p(y) \) in the denominator. This step is essential when absolute probability values are required or when we need to make direct probabilistic interpretations. + +In Bayesian inference, although we often rely on unnormalised densities, there are cases—such as when comparing models using Bayes factors—where we require the fully normalised posterior. + +### Link to Physics: Partition Function and Probability Density + +The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** \( Z \) normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state \( x \) with energy \( E(x) \) at temperature \( T \) is proportional to \( e^{-E(x)/kT} \), where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. + +In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference. + +### Monte Carlo Methods with Unnormalised Densities + +Monte Carlo methods like **Metropolis-Hastings** and **Importance Sampling** often operate with unnormalised densities, since only the relative probabilities affect the algorithm's behaviour. This allows us to bypass the need to calculate the normalising constant, which can be computationally expensive or intractable in high-dimensional spaces. + +In Importance Sampling, for example, we estimate expectations with respect to the target distribution by drawing samples from a simpler proposal distribution and weighting them according to the unnormalised target density. + + + +## Properties of monty models \ No newline at end of file diff --git a/monty.qmd b/monty.qmd index fdf6fc1..218d9e2 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,16 +1,13 @@ # Getting started with monty -* the philosophie -* three options - -`monty` is a R package design to work with statistical models in a *modular* way. It uses Monte Carlo methods to facilitate drawing from these statistical models and is currently built to create complex models through four different routes: +`monty` is a R package designed to work with statistical models in a modular way. It uses Monte Carlo methods to sample from these statistical models. It is currently built to create complex models through four different routes: * simple R code * a dedicated DSL * odin models * composing two monty models. -The simple example in this chapter is built from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. 
Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through composition of two monty models. monty models obtains from odin models are dealt in the odin & monty chapter. +The simple example in this chapter comes from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through the composition of two monty models. monty models obtained from odin models are dealt in the odin & monty chapter. ## A simple example @@ -24,16 +21,43 @@ For this we start by defining ```{r} fn <- function(l, p, m1, m2) { - p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2) + log(p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2)) } ``` ```{r} -l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 5, m2=7)) +l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 3, m2=7)) +``` + +We have just created a `monty` model. + +```{r} +l <- seq(from = 0, to = 10, by = 0.1) +plot(l, + exp(Vectorize(l_distribution$density)(l)), + type = "l", + ylab = "density") ``` + ## Sampling from our example distribution + +```{r} +sampler <- monty_sampler_random_walk(matrix(1)) +samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, + n_chains = 4) +``` + +```{r} +matplot(samples$density, type = "l", lty = 1, + xlab = "log posterior density", ylab = "sample", col = "#00000055") +``` + +```{r} +hist(samples$pars["l",,], breaks=100) +``` + ## Further work Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. diff --git a/runners.qmd b/runners.qmd new file mode 100644 index 0000000..d25e818 --- /dev/null +++ b/runners.qmd @@ -0,0 +1,3 @@ +# Runners + +One of the focus of `monty` is tailoring implementation to optimise usage of available hardware while ensuring reproducible results accross platforms. \ No newline at end of file diff --git a/samplers.qmd b/samplers.qmd index 71c6c4f..318fe05 100644 --- a/samplers.qmd +++ b/samplers.qmd @@ -1,4 +1,4 @@ -# Samplers +# Samplers and inference This vignette will describe the samplers available in monty. 
From ba7c859d25cf4c510c77479203c7fdac93da448a Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 7 Nov 2024 16:11:11 +0000 Subject: [PATCH 03/34] More editing --- 2009fluUK.qmd | 1 + _quarto.yml | 1 + composability.qmd | 2 +- model.qmd | 24 +++++++++++------------- monty.qmd | 42 +++++++++++++++++++++++++++++------------- montyDSL.qmd | 2 +- 6 files changed, 44 insertions(+), 28 deletions(-) create mode 100644 2009fluUK.qmd diff --git a/2009fluUK.qmd b/2009fluUK.qmd new file mode 100644 index 0000000..1959388 --- /dev/null +++ b/2009fluUK.qmd @@ -0,0 +1 @@ +# 2009 pandemic in the UK \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 7631883..f6ddf37 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -31,6 +31,7 @@ book: chapters: - inference.qmd - differentiability.qmd + - 2009fluUK.qmd - references.qmd appendices: - installation.qmd diff --git a/composability.qmd b/composability.qmd index 196c563..24450d4 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1,4 +1,4 @@ -# Composability +# Composability {#sec-monty-combine} ```{r} library(monty) diff --git a/model.qmd b/model.qmd index 6f2c160..457906d 100644 --- a/model.qmd +++ b/model.qmd @@ -1,26 +1,24 @@ -# About (monty) models - -We often refers to using models. +# About monty models {#sec-monty-models} ## SimCity vs Statistical models ## Normalised vs unormalised densities -In Bayesian inference, we often encounter probability densities that can either be normalised or unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. +In Bayesian inference, we often encounter probability densities that are unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. ### Unnormalised Density An **unnormalised density** represents a probability density function that is proportional to the target distribution but has not been scaled to integrate to 1 over the entire parameter space. In Bayesian inference, we are often interested in the posterior distribution of a parameter $\theta$ given data $y$. The posterior density is given by: - -$$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ - -where: - -- $p(y | \theta))$ is the likelihood of the data given the parameter, + + $$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ + + where: + + - $p(y | \theta))$ is the likelihood of the data given the parameter, - $p(\theta)$ is the prior distribution, and - $p(y) = \int p(y | \theta) p(\theta) \, d\theta \)$ is the marginal likelihood, or the evidence, which acts as a normalising constant. -In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of \( \theta \). 
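A tiny numerical check makes this concrete; the function `f` below is an arbitrary unnormalised density of our own devising (not monty code), and any multiplicative constant $k$ cancels from the Metropolis ratio:

```{r}
f <- function(theta) exp(-theta^2 / 2) * (2 + sin(3 * theta))  # unnormalised
k <- 42                                   # pretend unknown normalising constant
theta_now <- 0.3
theta_prop <- 0.9
f(theta_prop) / f(theta_now)              # ratio with unnormalised density
(k * f(theta_prop)) / (k * f(theta_now))  # identical: k cancels
```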
+In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$. ### Normalised Density @@ -30,7 +28,7 @@ In Bayesian inference, although we often rely on unnormalised densities, there a ### Link to Physics: Partition Function and Probability Density -The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** \( Z \) normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state \( x \) with energy \( E(x) \) at temperature \( T \) is proportional to \( e^{-E(x)/kT} \), where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. +The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference. @@ -42,4 +40,4 @@ In Importance Sampling, for example, we estimate expectations with respect to th -## Properties of monty models \ No newline at end of file +## Properties of monty models diff --git a/monty.qmd b/monty.qmd index 218d9e2..3276ffe 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,13 +1,13 @@ # Getting started with monty -`monty` is a R package designed to work with statistical models in a modular way. It uses Monte Carlo methods to sample from these statistical models. It is currently built to create complex models through four different routes: +The `monty` R package is designed for modular work with statistical distributions, leveraging Monte Carlo methods for sampling. It enables the construction of increasingly complex models through four approaches: -* simple R code -* a dedicated DSL -* odin models -* composing two monty models. +1. Simple R code, +2. A dedicated domain-specific language (DSL), +3. Integration with `odin` models, +4. Composition of two `monty` models. -The simple example in this chapter comes from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through the composition of two monty models. monty models obtained from odin models are dealt in the odin & monty chapter. 
+The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in @sec-monty-models, in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL for more versatile model building using a small probabilistic DSL similar to BUGS. @sec-monty-combine explains how to compose new monty models by combining two existing ones. The last part of this book, starting from @sec-inference, demonstrates how to incorporate `odin` models into `monty` workflows.
 
 ## A simple example
 
@@ -15,9 +15,9 @@
 library(monty)
 ```
 
-We can define a simple Gaussian mixture model of two sub-population using the `monty_model_function()` from the package
+We can define a simple Gaussian [mixture model](https://en.wikipedia.org/wiki/Mixture_model) of two sub-populations using the `monty_model_function()` from the package. The model represents the distribution of a quantity $l$, sampled with a probability $p$ from a normal distribution of mean $m_{1}$ and with a probability $1-p$ from another normal distribution of mean $m_{2}$. Both subpopulations have variance equal to 1.
 
-For this we start by defining
+To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function takes as arguments $l$, the quantity of interest, and the three parameters $p$, $m_{1}$ and $m_{2}$ defining the two subpopulations.
 
 ```{r}
 fn <- function(l, p, m1, m2) {
@@ -25,12 +25,21 @@ fn <- function(l, p, m1, m2) {
 }
 ```
 
+Assuming that the population is 75% with mean 3 and 25% with mean 7, we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values.
+
 ```{r}
-l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 3, m2=7))
+l_distribution <- monty_model_function(fn,
+                                       fixed = list(p = 0.75, m1 = 3, m2 = 7))
 ```
 
 We have just created a `monty` model.
 
+```{r}
+l_distribution
+```
+
+We can plot the density of our model for values of $l$ between 0 and 10 and check that it returns the correct value.
+
 ```{r}
 l <- seq(from = 0, to = 10, by = 0.1)
 plot(l,
@@ -42,25 +51,32 @@ plot(l,
 
 ## Sampling from our example distribution
 
+We now want to sample from this model, using the `monty_sample()` function. For this we need to create a sampler object that will be used to explore our distribution. The simplest one is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm, which should work almost out of the box (though not necessarily efficiently). It takes as an argument a variance-covariance (VCV) matrix that describes the random walk exploration of the sampler, jumping from the current point to the next one using a multivariate-normal draw defined by our covariance matrix. Here we have a single parameter $l$, so we take `matrix(1)` as our VCV matrix.
 
 ```{r}
 sampler <- monty_sampler_random_walk(matrix(1))
-samples <- monty_sample(l_distribution, sampler, 2000, initial = 3,
-                        n_chains = 4)
 ```
 
+We are now ready to sample from our distribution using `monty_sample()`, drawing 2000 samples, starting our MCMC chains at the value 3 and running 4 chains simultaneously.
+
+```{r}
+samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4)
+```
+
+We can visualise our 4 chains.
```{r} matplot(samples$density, type = "l", lty = 1, xlab = "log posterior density", ylab = "sample", col = "#00000055") ``` +We can also check that our samples are correctly representing our distribution: ```{r} hist(samples$pars["l",,], breaks=100) ``` -## Further work +## Going further -Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. +We've discussed a little bit about the philosophy of `monty` and built a simple model using an R function with `monty_function`. There are a bunch of things we want to cover in this chapter so likely it will be split into a few: diff --git a/montyDSL.qmd b/montyDSL.qmd index 6989e89..3fd46a8 100644 --- a/montyDSL.qmd +++ b/montyDSL.qmd @@ -1,4 +1,4 @@ -# The monty DSL +# The monty DSL {#sec-monty-dsl} The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. From 995024d5f7977b70eaeb5b28e3859f098afbba40 Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Fri, 8 Nov 2024 07:39:20 +0000 Subject: [PATCH 04/34] Add code at beginning of pages to homogenise content and improve visual appearance of pages --- advanced_monty.qmd | 5 +++++ composability.qmd | 5 +++++ model.qmd | 5 +++++ monty.qmd | 5 +++++ montyDSL.qmd | 5 +++++ runners.qmd | 5 +++++ samplers.qmd | 5 +++++ 7 files changed, 35 insertions(+) diff --git a/advanced_monty.qmd b/advanced_monty.qmd index 5dbb364..9569314 100644 --- a/advanced_monty.qmd +++ b/advanced_monty.qmd @@ -1,5 +1,10 @@ # Advanced monty +```{r} +#| include: false +source("common.R") +``` + ## Observers ## Working with other packages diff --git a/composability.qmd b/composability.qmd index 24450d4..8fb2fbb 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1,5 +1,10 @@ # Composability {#sec-monty-combine} +```{r} +#| include: false +source("common.R") +``` + ```{r} library(monty) ``` diff --git a/model.qmd b/model.qmd index 457906d..4d7f718 100644 --- a/model.qmd +++ b/model.qmd @@ -1,5 +1,10 @@ # About monty models {#sec-monty-models} +```{r} +#| include: false +source("common.R") +``` + ## SimCity vs Statistical models ## Normalised vs unormalised densities diff --git a/monty.qmd b/monty.qmd index 3276ffe..cd1a1c8 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,5 +1,10 @@ # Getting started with monty +```{r} +#| include: false +source("common.R") +``` + The `monty` R package is designed for modular work with statistical distributions, leveraging Monte Carlo methods for sampling. It enables the construction of increasingly complex models through four approaches: 1. Simple R code, diff --git a/montyDSL.qmd b/montyDSL.qmd index 3fd46a8..605afb2 100644 --- a/montyDSL.qmd +++ b/montyDSL.qmd @@ -1,5 +1,10 @@ # The monty DSL {#sec-monty-dsl} +```{r} +#| include: false +source("common.R") +``` + The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. `monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by `stan` and [Statistical Rethinking](https://xcelab.net/rm/). 
It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models but it may get more clever and flexible in the future. In particular we do not expect the DSL to be useful in writing likelihood functions for comparison to data; we expect that if your model is simple enough for this you would be better off using `stan` or some similarly flexible system. diff --git a/runners.qmd b/runners.qmd index d25e818..358d464 100644 --- a/runners.qmd +++ b/runners.qmd @@ -1,3 +1,8 @@ # Runners +```{r} +#| include: false +source("common.R") +``` + One of the focus of `monty` is tailoring implementation to optimise usage of available hardware while ensuring reproducible results accross platforms. \ No newline at end of file diff --git a/samplers.qmd b/samplers.qmd index 318fe05..5f21fd0 100644 --- a/samplers.qmd +++ b/samplers.qmd @@ -1,5 +1,10 @@ # Samplers and inference +```{r} +#| include: false +source("common.R") +``` + This vignette will describe the samplers available in monty. ```{r} From 69c16110c2ddba5a4ae4e02f26783888b4846c5f Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Fri, 8 Nov 2024 09:11:57 +0000 Subject: [PATCH 05/34] Correct plot labelling --- monty.qmd | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/monty.qmd b/monty.qmd index cd1a1c8..47fc7a1 100644 --- a/monty.qmd +++ b/monty.qmd @@ -53,7 +53,6 @@ plot(l, ylab = "density") ``` - ## Sampling from our example distribution We want now to sample from this model, using the `monty_sample()` function. For this we need to create a sampler object that will be used to explore our distribution. The most simple one is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necesseraly efficiently). It takes as an argument a variance-covariance (VCV) matrix that describes the random walk exploration of the sampler by jumping from the current point to the next one using a multivariate-normal draw defined with our covariance matrix. Here we have a single parameter $l$, we thus take matrix(1) as our VCV matrix. @@ -71,7 +70,7 @@ samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4 We can visualise our 4 chains. 
```{r} matplot(samples$density, type = "l", lty = 1, - xlab = "log posterior density", ylab = "sample", col = "#00000055") + xlab = "sample", ylab = "log posterior density", col = "#00000055") ``` We can also check that our samples are correctly representing our distribution: From a7bebc3c5d03b57d09509725e375c93aeb2dfcad Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Mon, 4 Nov 2024 15:57:37 +0000 Subject: [PATCH 06/34] Starting the monty section --- _quarto.yml | 4 ++ composability.qmd | 1 + model.qmd | 7 +++ monty.qmd | 38 +++++++++++++++- montyDSL.qmd | 109 ++++++++++++++++++++++++++++++++++++++++++++++ samplers.qmd | 96 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 254 insertions(+), 1 deletion(-) create mode 100644 composability.qmd create mode 100644 model.qmd create mode 100644 montyDSL.qmd create mode 100644 samplers.qmd diff --git a/_quarto.yml b/_quarto.yml index 7b55dd7..10cc414 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -21,6 +21,10 @@ book: - part: "Monty" chapters: - monty.qmd + - model.qmd + - montyDSL.qmd + - composability.qmd + - samplers.qmd - part: "Odin & Monty" chapters: - inference.qmd diff --git a/composability.qmd b/composability.qmd new file mode 100644 index 0000000..5b4977f --- /dev/null +++ b/composability.qmd @@ -0,0 +1 @@ +# Composability \ No newline at end of file diff --git a/model.qmd b/model.qmd new file mode 100644 index 0000000..38795a1 --- /dev/null +++ b/model.qmd @@ -0,0 +1,7 @@ +# About (monty) models + +We often refers to using models. + +## SimCity vs Statistical models + +## Normalised vs unormalised densities \ No newline at end of file diff --git a/monty.qmd b/monty.qmd index 8ddca01..fdf6fc1 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,4 +1,40 @@ -# Monty +# Getting started with monty + +* the philosophie +* three options + +`monty` is a R package design to work with statistical models in a *modular* way. It uses Monte Carlo methods to facilitate drawing from these statistical models and is currently built to create complex models through four different routes: + +* simple R code +* a dedicated DSL +* odin models +* composing two monty models. + +The simple example in this chapter is built from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through composition of two monty models. monty models obtains from odin models are dealt in the odin & monty chapter. + +## A simple example + +```{r} +library(monty) +``` + +We can define a simple Gaussian mixture model of two sub-population using the `monty_model_function()` from the package + +For this we start by defining + +```{r} +fn <- function(l, p, m1, m2) { + p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2) +} +``` + +```{r} +l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 5, m2=7)) +``` + +## Sampling from our example distribution + +## Further work Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. diff --git a/montyDSL.qmd b/montyDSL.qmd new file mode 100644 index 0000000..6989e89 --- /dev/null +++ b/montyDSL.qmd @@ -0,0 +1,109 @@ +# The monty DSL + +The `monty` DSL provides a more intuitive way to define statistical models with `monty`. 
It is currently relatively basic and focuses on providing support for defining priors in Bayesian models. It fully supports differentiability, allowing gradient-based samplers to be used on these models.

`monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by `stan` and [Statistical Rethinking](https://xcelab.net/rm/). It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models, but it may become more clever and flexible in the future. In particular, we do not expect the DSL to be useful in writing likelihood functions for comparison to data; if your model is simple enough for that, you would be better off using `stan` or some similarly flexible system.

```{r}
library(monty)
```

## Some simple examples

In chapter 4 of Statistical Rethinking, we build a regression model of height with parameters $\alpha$, $\beta$ and $\sigma$. We can define the model for the prior probability of this model in monty by running

```{r}
prior <- monty_dsl({
  alpha ~ Normal(178, 20)
  beta ~ Normal(0, 10)
  sigma ~ Uniform(0, 50)
})
```

This will define a new `monty_model()` object that represents the prior, with all the bits that we might need depending on how we want to use it.

We have model parameters:

```{r}
prior$parameters
```

These are defined in the order that they appear in your definition (so `alpha` is first and `sigma` is last).

We can compute the domain for your model:

```{r}
prior$domain
```

We can draw samples from the model if we provide a `monty_rng` object:

```{r}
rng <- monty_rng$new()
theta <- prior$direct_sample(rng)
theta
```

We can compute the (log) density at a point in parameter space:

```{r}
prior$density(theta)
```

The computed properties for the model are:

```{r}
prior$properties
```

## Calculations in the DSL

Sometimes it will be useful to perform calculations in the code; you can do this with assignments. Most trivially, giving names to numbers may help make code more understandable:

```{r}
m <- monty_dsl({
  mu <- 10
  sd <- 2
  a ~ Normal(mu, sd)
})
```

You can also use this to do things like:

```{r}
m <- monty_dsl({
  a ~ Normal(0, 1)
  b ~ Normal(0, 1)
  mu <- (a + b) / 2
  c ~ Normal(mu, 1)
})
```

where `c` is drawn from a normal distribution with a mean that is the average of `a` and `b`.

## Pass in fixed data

You can also pass in a list of data with values that should be available in the DSL code. For example, our first example:

```{r}
prior <- monty_dsl({
  alpha ~ Normal(178, 20)
  beta ~ Normal(0, 10)
  sigma ~ Uniform(0, 50)
})
```

might be written as

```{r}
fixed <- list(alpha_mean = 178, alpha_sd = 20,
              beta_mean = 0, beta_sd = 10,
              sigma_max = 50)
prior <- monty_dsl({
  alpha ~ Normal(alpha_mean, alpha_sd)
  beta ~ Normal(beta_mean, beta_sd)
  sigma ~ Uniform(0, sigma_max)
}, fixed = fixed)
```

Values you pass in this way are **fixed** (hence the name!) and cannot be modified after the model object is created.
\ No newline at end of file
diff --git a/samplers.qmd b/samplers.qmd
new file mode 100644
index 0000000..71c6c4f
--- /dev/null
+++ b/samplers.qmd
@@ -0,0 +1,96 @@
+# Samplers
+
+This vignette will describe the samplers available in monty.
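Before the worked example, a quick sketch of the shape of the API (a preview of our own, using only constructors that appear later in this chapter): samplers are built as standalone objects, then passed to `monty_sample()` together with a model.

```{r}
library(monty)  # loaded here so this preview chunk is self-contained
# Constructed here only to preview the API; both are exercised properly below.
sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 0.01)
sampler_hmc <- monty_sampler_hmc(epsilon = 0.1, n_integration_steps = 10)
```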
+ +```{r} +library(monty) +``` + +## The bendy banana + +This example shows HMC outperforming a random walk on a two dimensional banana-shaped function. Our model takes two parameters `alpha` and `beta`, and is based on two successive simple draws, with the one conditional on the other, so $\beta \sim Normal(1,0)$ and $\alpha \sim Normal(\beta^2, \sigma)$, with $\sigma$ the standard deviation of the conditional draw. + +We include this example within the package; here we create a model with $\sigma = 0.5$ + +```{r} +m <- monty_example("banana", sigma = 0.5) +m +``` + +We can plot a visualisation of its density by computing the density over a grid. Normally this is not possible of course: + +```{r} +a <- seq(-2, 6, length.out = 1000) +b <- seq(-2, 2, length.out = 1000) +z <- outer(a, b, function(alpha, beta) { + exp(monty_model_density(m, rbind(alpha, beta))) +}) +image(a, b, z, xlab = "alpha", ylab = "beta") +``` + +In this particular case we can also easily generate samples, so we know what a good sampler will produce: + +```{r} +rng <- monty_rng$new() +s <- vapply(seq(200), function(x) m$direct_sample(rng), numeric(2)) +image(a, b, z, xlab = "alpha", ylab = "beta") +points(s[1, ], s[2, ], pch = 19, col = "#00000055") +``` + +It is also possible to compute the 95% confidence interval of the distribution using the relationship between the standard bivariate normal distribution and the banana shaped distribution as defined above. We can check that roughly 10 samples (out of 200) are out of this 95% CI contour. + +```{r} +theta <- seq(0, 2 * pi, length.out = 10000) +z95 <- local({ + sigma <- 0.5 + r <- sqrt(qchisq(.95, df = 2)) + x <- r * cos(theta) + y <- r * sin(theta) + cbind(x^2 + y * sigma, x) +}) +image(a, b, z, xlab = "alpha", ylab = "beta") +lines(z95[, 1], z95[, 2]) +points(s[1, ], s[2, ], pch = 19, col = "#00000055") +``` + +## Sampling with other samplers + +It is not generally possible to directly sample from a density (otherwise MCMC and similar methods would not exist!). In these cases we need to use a sampler based on the density and if available possibly the gradient of the density. + +We can start with a basic random-walk sampler: + +```{r RW_sampling} +sampler_rw <- monty_sampler_random_walk(vcv = diag(2) * 0.01) +res_rw <- monty_sample(m, sampler_rw, 1000) +plot(t(drop(res_rw$pars)), type = "l", col = "#0000ff66", + xlim = range(a), ylim = range(b)) +lines(z95[, 1], z95[, 2]) +``` + +As we can see this is not great, exhibiting strong random walk behaviour as it slowly explores the surface (this is over 1,000 steps). 
Another way to view this is the parameters varying over steps: + +``` +matplot(t(drop(res_rw$pars)), lty = 1, type = "l", col = c(2, 4), + xlab = "Step", ylab = "Value") +``` + +We can probably improve the samples here by finding a better variance covariance matrix (VCV), but a single VCV will not hold well over the whole surface because it is not very similar to a multivariate normal (that is, the appropriate VCV will change depending on the position in parameter space) + +Let's try the Hamiltonian Monte Carlo (HMC) sampler, which uses the gradient to move efficiently in parameter space: + +```{r HMC_sampling} +sampler_hmc <- monty_sampler_hmc(epsilon = 0.1, n_integration_steps = 10) +res_hmc <- monty_sample(m, sampler_hmc, 1000) +plot(t(drop(res_hmc$pars)), type = "l", col = "#0000ff33", + xlim = range(a), ylim = range(b)) +lines(z95[, 1], z95[, 2]) +``` + +or viewed over steps: + +```{r} +matplot(t(drop(res_hmc$pars)), lty = 1, type = "l", col = c(2, 4), + xlab = "Step", ylab = "Value") +``` + +Clearly better! From a2c25f620e486718673d9b4df7dbf20775b734a2 Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 7 Nov 2024 08:43:38 +0000 Subject: [PATCH 07/34] More editing --- _quarto.yml | 2 ++ advanced_monty.qmd | 9 +++++++++ composability.qmd | 25 ++++++++++++++++++++++++- model.qmd | 40 +++++++++++++++++++++++++++++++++++++++- monty.qmd | 38 +++++++++++++++++++++++++++++++------- runners.qmd | 3 +++ samplers.qmd | 2 +- 7 files changed, 109 insertions(+), 10 deletions(-) create mode 100644 advanced_monty.qmd create mode 100644 runners.qmd diff --git a/_quarto.yml b/_quarto.yml index 10cc414..d214d4e 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -25,6 +25,8 @@ book: - montyDSL.qmd - composability.qmd - samplers.qmd + - runners.qmd + - advanced_monty.qmd - part: "Odin & Monty" chapters: - inference.qmd diff --git a/advanced_monty.qmd b/advanced_monty.qmd new file mode 100644 index 0000000..5dbb364 --- /dev/null +++ b/advanced_monty.qmd @@ -0,0 +1,9 @@ +# Advanced monty + +## Observers + +## Working with other packages + +Exporting chains to work with other packages + +## Nested models \ No newline at end of file diff --git a/composability.qmd b/composability.qmd index 5b4977f..196c563 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1 +1,24 @@ -# Composability \ No newline at end of file +# Composability + +```{r} +library(monty) +``` + +```{r} +# A simple example; a model that contains something of interest, +# and a simple prior from monty_dsl +likelihood <- monty_example("banana") +prior <- monty_dsl({ + alpha ~ Normal(0, 1) + beta ~ Normal(0, 10) +}) +posterior <- likelihood + prior +posterior + +# # The same thing, more explicitly: +# monty_model_combine(likelihood, prior) +# +# # Control properties of the combined model: +# monty_model_combine(likelihood, prior, +# monty_model_properties(has_gradient = FALSE)) +``` \ No newline at end of file diff --git a/model.qmd b/model.qmd index 38795a1..6f2c160 100644 --- a/model.qmd +++ b/model.qmd @@ -4,4 +4,42 @@ We often refers to using models. ## SimCity vs Statistical models -## Normalised vs unormalised densities \ No newline at end of file +## Normalised vs unormalised densities + +In Bayesian inference, we often encounter probability densities that can either be normalised or unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. 
Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities.

### Unnormalised Density

An **unnormalised density** represents a probability density function that is proportional to the target distribution but has not been scaled to integrate to 1 over the entire parameter space. In Bayesian inference, we are often interested in the posterior distribution of a parameter $\theta$ given data $y$. The posterior density is given by:

$$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$

where:

- $p(y | \theta)$ is the likelihood of the data given the parameter,
- $p(\theta)$ is the prior distribution, and
- $p(y) = \int p(y | \theta) p(\theta) \, d\theta$ is the marginal likelihood, or the evidence, which acts as a normalising constant.

In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density $p(y | \theta) p(\theta)$. The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$.

### Normalised Density

A **normalised density** ensures that the total probability over all possible parameter values is 1. Normalisation is achieved by dividing by the integral over the entire density, as shown above with $p(y)$ in the denominator. This step is essential when absolute probability values are required or when we need to make direct probabilistic interpretations.

In Bayesian inference, although we often rely on unnormalised densities, there are cases—such as when comparing models using Bayes factors—where we require the fully normalised posterior.

### Link to Physics: Partition Function and Probability Density

The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates $x$ of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where $k$ is Boltzmann's constant. The partition function $Z = \sum_{x} e^{-E(x)/kT}$, or its integral form over a continuous state space, normalises this density, ensuring probabilities are meaningful when summing over all possible states.

In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference.

### Monte Carlo Methods with Unnormalised Densities

Monte Carlo methods like **Metropolis-Hastings** and **Importance Sampling** often operate with unnormalised densities, since only the relative probabilities affect the algorithm's behaviour. This allows us to bypass the need to calculate the normalising constant, which can be computationally expensive or intractable in high-dimensional spaces.
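To ground this, here is a deliberately minimal random-walk Metropolis loop in base R, written against an unnormalised log-density; it is a sketch of the general idea (all names are ours), not of how monty implements its samplers:

```{r}
# Minimal random-walk Metropolis using only an unnormalised log-density.
log_density <- function(theta) -theta^2 / 2  # any additive constant is irrelevant
n_steps <- 1000
theta <- numeric(n_steps)
for (i in seq_len(n_steps - 1)) {
  proposal <- theta[i] + rnorm(1, sd = 0.5)
  log_ratio <- log_density(proposal) - log_density(theta[i])
  theta[i + 1] <- if (log(runif(1)) < log_ratio) proposal else theta[i]
}
c(mean = mean(theta), var = var(theta))  # should be near 0 and 1
```

Because only the difference of log-densities enters the acceptance step, adding any constant to `log_density` leaves the chain unchanged.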
+ +In Importance Sampling, for example, we estimate expectations with respect to the target distribution by drawing samples from a simpler proposal distribution and weighting them according to the unnormalised target density. + + + +## Properties of monty models \ No newline at end of file diff --git a/monty.qmd b/monty.qmd index fdf6fc1..218d9e2 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,16 +1,13 @@ # Getting started with monty -* the philosophie -* three options - -`monty` is a R package design to work with statistical models in a *modular* way. It uses Monte Carlo methods to facilitate drawing from these statistical models and is currently built to create complex models through four different routes: +`monty` is a R package designed to work with statistical models in a modular way. It uses Monte Carlo methods to sample from these statistical models. It is currently built to create complex models through four different routes: * simple R code * a dedicated DSL * odin models * composing two monty models. -The simple example in this chapter is built from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through composition of two monty models. monty models obtains from odin models are dealt in the odin & monty chapter. +The simple example in this chapter comes from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through the composition of two monty models. monty models obtained from odin models are dealt in the odin & monty chapter. ## A simple example @@ -24,16 +21,43 @@ For this we start by defining ```{r} fn <- function(l, p, m1, m2) { - p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2) + log(p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2)) } ``` ```{r} -l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 5, m2=7)) +l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 3, m2=7)) +``` + +We have just created a `monty` model. + +```{r} +l <- seq(from = 0, to = 10, by = 0.1) +plot(l, + exp(Vectorize(l_distribution$density)(l)), + type = "l", + ylab = "density") ``` + ## Sampling from our example distribution + +```{r} +sampler <- monty_sampler_random_walk(matrix(1)) +samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, + n_chains = 4) +``` + +```{r} +matplot(samples$density, type = "l", lty = 1, + xlab = "log posterior density", ylab = "sample", col = "#00000055") +``` + +```{r} +hist(samples$pars["l",,], breaks=100) +``` + ## Further work Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. diff --git a/runners.qmd b/runners.qmd new file mode 100644 index 0000000..d25e818 --- /dev/null +++ b/runners.qmd @@ -0,0 +1,3 @@ +# Runners + +One of the focus of `monty` is tailoring implementation to optimise usage of available hardware while ensuring reproducible results accross platforms. \ No newline at end of file diff --git a/samplers.qmd b/samplers.qmd index 71c6c4f..318fe05 100644 --- a/samplers.qmd +++ b/samplers.qmd @@ -1,4 +1,4 @@ -# Samplers +# Samplers and inference This vignette will describe the samplers available in monty. 
From 44f526fcbbe341e46437c3b877af64ede139b173 Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 7 Nov 2024 16:11:11 +0000 Subject: [PATCH 08/34] More editing --- 2009fluUK.qmd | 1 + _quarto.yml | 1 + composability.qmd | 2 +- model.qmd | 24 +++++++++++------------- monty.qmd | 42 +++++++++++++++++++++++++++++------------- montyDSL.qmd | 2 +- 6 files changed, 44 insertions(+), 28 deletions(-) create mode 100644 2009fluUK.qmd diff --git a/2009fluUK.qmd b/2009fluUK.qmd new file mode 100644 index 0000000..1959388 --- /dev/null +++ b/2009fluUK.qmd @@ -0,0 +1 @@ +# 2009 pandemic in the UK \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index d214d4e..3e8ef1b 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -31,6 +31,7 @@ book: chapters: - inference.qmd - differentiability.qmd + - 2009fluUK.qmd - references.qmd appendices: - installation.qmd diff --git a/composability.qmd b/composability.qmd index 196c563..24450d4 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1,4 +1,4 @@ -# Composability +# Composability {#sec-monty-combine} ```{r} library(monty) diff --git a/model.qmd b/model.qmd index 6f2c160..457906d 100644 --- a/model.qmd +++ b/model.qmd @@ -1,26 +1,24 @@ -# About (monty) models - -We often refers to using models. +# About monty models {#sec-monty-models} ## SimCity vs Statistical models ## Normalised vs unormalised densities -In Bayesian inference, we often encounter probability densities that can either be normalised or unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. +In Bayesian inference, we often encounter probability densities that are unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. ### Unnormalised Density An **unnormalised density** represents a probability density function that is proportional to the target distribution but has not been scaled to integrate to 1 over the entire parameter space. In Bayesian inference, we are often interested in the posterior distribution of a parameter $\theta$ given data $y$. The posterior density is given by: - -$$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ - -where: - -- $p(y | \theta))$ is the likelihood of the data given the parameter, + + $$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ + + where: + + - $p(y | \theta))$ is the likelihood of the data given the parameter, - $p(\theta)$ is the prior distribution, and - $p(y) = \int p(y | \theta) p(\theta) \, d\theta \)$ is the marginal likelihood, or the evidence, which acts as a normalising constant. -In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of \( \theta \). 
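To see the normalising constant concretely (a base-R sketch with our own names), we can integrate an unnormalised density to recover the constant and then rescale:

```{r}
f_unnorm <- function(theta) exp(-theta^2 / 2)  # proportional to a N(0, 1) density
k <- integrate(f_unnorm, -Inf, Inf)$value      # recovers sqrt(2 * pi)
f_norm <- function(theta) f_unnorm(theta) / k
integrate(f_norm, -Inf, Inf)$value             # integrates to 1, as a density must
```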
+In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$. ### Normalised Density @@ -30,7 +28,7 @@ In Bayesian inference, although we often rely on unnormalised densities, there a ### Link to Physics: Partition Function and Probability Density -The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** \( Z \) normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state \( x \) with energy \( E(x) \) at temperature \( T \) is proportional to \( e^{-E(x)/kT} \), where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. +The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference. @@ -42,4 +40,4 @@ In Importance Sampling, for example, we estimate expectations with respect to th -## Properties of monty models \ No newline at end of file +## Properties of monty models diff --git a/monty.qmd b/monty.qmd index 218d9e2..3276ffe 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,13 +1,13 @@ # Getting started with monty -`monty` is a R package designed to work with statistical models in a modular way. It uses Monte Carlo methods to sample from these statistical models. It is currently built to create complex models through four different routes: +The `monty` R package is designed for modular work with statistical distributions, leveraging Monte Carlo methods for sampling. It enables the construction of increasingly complex models through four approaches: -* simple R code -* a dedicated DSL -* odin models -* composing two monty models. +1. Simple R code, +2. A dedicated domain-specific language (DSL), +3. Integration with `odin` models, +4. Composition of two `monty` models. -The simple example in this chapter comes from a simple R function and is the most basic way of defining a monty model. We will cover more complex examples in the model section. Section ** deals with the monty DSL, while section ** explains how to obtain new monty models through the composition of two monty models. monty models obtained from odin models are dealt in the odin & monty chapter. 
+The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in the @sec-monty-models in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL for more versatile model building using a small probabilistic DSL similar to BUGs. @sec-monty-combine explains how to compose new monty models by combining two existing ones. The last part of this book starting from @sec-inference demonstrates how to incorporate `odin` models into `monty` workflows. ## A simple example @@ -15,9 +15,9 @@ The simple example in this chapter comes from a simple R function and is the mos library(monty) ``` -We can define a simple Gaussian mixture model of two sub-population using the `monty_model_function()` from the package +We can define a simple Gaussian [mixture model](https://en.wikipedia.org/wiki/Mixture_model) of two sub-populations using the `monty_model_function()` from the package. The model represents the distribution of a quantity $l$, sampled with a probability $p$ from a normal distribution of mean $m_{1}$ and with a probability $1-p$ from another normal distribution of mean $m_{2}$. Both subpopulations have variance equal to 1. -For this we start by defining +To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function has for arguments $l$, the quantity of interest and the three parameters $p$, $m_{1}$ and $m_{1}$ defining the two subpopulations. ```{r} fn <- function(l, p, m1, m2) { @@ -25,12 +25,21 @@ fn <- function(l, p, m1, m2) { } ``` +Assuming that the population is 75% with mean 3 and 25% with mean 7, we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values. + ```{r} -l_distribution <- monty_model_function(fn, fixed = list(p = 0.75, m1 = 3, m2=7)) +l_distribution <- monty_model_function(fn, + fixed = list(p = 0.75, m1 = 3, m2=7)) ``` We have just created a `monty` model. +```{r} +l_distribution +``` + +We can plot the density of our model for values of $l$ between 0 and 10 and check that it returns the correct value. + ```{r} l <- seq(from = 0, to = 10, by = 0.1) plot(l, @@ -42,25 +51,32 @@ plot(l, ## Sampling from our example distribution +We want now to sample from this model, using the `monty_sample()` function. For this we need to create a sampler object that will be used to explore our distribution. The most simple one is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necesseraly efficiently). It takes as an argument a variance-covariance (VCV) matrix that describes the random walk exploration of the sampler by jumping from the current point to the next one using a multivariate-normal draw defined with our covariance matrix. Here we have a single parameter $l$, we thus take matrix(1) as our VCV matrix. ```{r} sampler <- monty_sampler_random_walk(matrix(1)) -samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, - n_chains = 4) ``` +We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains simultaneously. + +```{r} +samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4) +``` + +We can visualise our 4 chains. 
```{r} matplot(samples$density, type = "l", lty = 1, xlab = "log posterior density", ylab = "sample", col = "#00000055") ``` +We can also check that our samples are correctly representing our distribution: ```{r} hist(samples$pars["l",,], breaks=100) ``` -## Further work +## Going further -Before showing how to fit odin models to data, we'll focus for a bit on monty itself. If you are anxious to get on and fit the model from @sec-data-filter, you might skip ahead to @sec-inference, where we resume this. +We've discussed a little bit about the philosophy of `monty` and built a simple model using an R function with `monty_function`. There are a bunch of things we want to cover in this chapter so likely it will be split into a few: diff --git a/montyDSL.qmd b/montyDSL.qmd index 6989e89..3fd46a8 100644 --- a/montyDSL.qmd +++ b/montyDSL.qmd @@ -1,4 +1,4 @@ -# The monty DSL +# The monty DSL {#sec-monty-dsl} The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. From 668518490c1de97043a290a36c69058827b6b23c Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Fri, 8 Nov 2024 07:39:20 +0000 Subject: [PATCH 09/34] Add code at beginning of pages to homogenise content and improve visual appearance of pages --- advanced_monty.qmd | 5 +++++ composability.qmd | 5 +++++ model.qmd | 5 +++++ monty.qmd | 5 +++++ montyDSL.qmd | 5 +++++ runners.qmd | 5 +++++ samplers.qmd | 5 +++++ 7 files changed, 35 insertions(+) diff --git a/advanced_monty.qmd b/advanced_monty.qmd index 5dbb364..9569314 100644 --- a/advanced_monty.qmd +++ b/advanced_monty.qmd @@ -1,5 +1,10 @@ # Advanced monty +```{r} +#| include: false +source("common.R") +``` + ## Observers ## Working with other packages diff --git a/composability.qmd b/composability.qmd index 24450d4..8fb2fbb 100644 --- a/composability.qmd +++ b/composability.qmd @@ -1,5 +1,10 @@ # Composability {#sec-monty-combine} +```{r} +#| include: false +source("common.R") +``` + ```{r} library(monty) ``` diff --git a/model.qmd b/model.qmd index 457906d..4d7f718 100644 --- a/model.qmd +++ b/model.qmd @@ -1,5 +1,10 @@ # About monty models {#sec-monty-models} +```{r} +#| include: false +source("common.R") +``` + ## SimCity vs Statistical models ## Normalised vs unormalised densities diff --git a/monty.qmd b/monty.qmd index 3276ffe..cd1a1c8 100644 --- a/monty.qmd +++ b/monty.qmd @@ -1,5 +1,10 @@ # Getting started with monty +```{r} +#| include: false +source("common.R") +``` + The `monty` R package is designed for modular work with statistical distributions, leveraging Monte Carlo methods for sampling. It enables the construction of increasingly complex models through four approaches: 1. Simple R code, diff --git a/montyDSL.qmd b/montyDSL.qmd index 3fd46a8..605afb2 100644 --- a/montyDSL.qmd +++ b/montyDSL.qmd @@ -1,5 +1,10 @@ # The monty DSL {#sec-monty-dsl} +```{r} +#| include: false +source("common.R") +``` + The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. `monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by `stan` and [Statistical Rethinking](https://xcelab.net/rm/). 
It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models but it may get more clever and flexible in the future. In particular we do not expect the DSL to be useful in writing likelihood functions for comparison to data; we expect that if your model is simple enough for this you would be better off using `stan` or some similarly flexible system. diff --git a/runners.qmd b/runners.qmd index d25e818..358d464 100644 --- a/runners.qmd +++ b/runners.qmd @@ -1,3 +1,8 @@ # Runners +```{r} +#| include: false +source("common.R") +``` + One of the focus of `monty` is tailoring implementation to optimise usage of available hardware while ensuring reproducible results accross platforms. \ No newline at end of file diff --git a/samplers.qmd b/samplers.qmd index 318fe05..5f21fd0 100644 --- a/samplers.qmd +++ b/samplers.qmd @@ -1,5 +1,10 @@ # Samplers and inference +```{r} +#| include: false +source("common.R") +``` + This vignette will describe the samplers available in monty. ```{r} From de186bdd5b5460c83f1e8ddcf26af0d6244269ed Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Fri, 8 Nov 2024 09:11:57 +0000 Subject: [PATCH 10/34] Correct plot labelling --- monty.qmd | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/monty.qmd b/monty.qmd index cd1a1c8..47fc7a1 100644 --- a/monty.qmd +++ b/monty.qmd @@ -53,7 +53,6 @@ plot(l, ylab = "density") ``` - ## Sampling from our example distribution We want now to sample from this model, using the `monty_sample()` function. For this we need to create a sampler object that will be used to explore our distribution. The most simple one is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necesseraly efficiently). It takes as an argument a variance-covariance (VCV) matrix that describes the random walk exploration of the sampler by jumping from the current point to the next one using a multivariate-normal draw defined with our covariance matrix. Here we have a single parameter $l$, we thus take matrix(1) as our VCV matrix. @@ -71,7 +70,7 @@ samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4 We can visualise our 4 chains. ```{r} matplot(samples$density, type = "l", lty = 1, - xlab = "log posterior density", ylab = "sample", col = "#00000055") + xlab = "sample", ylab = "log posterior density", col = "#00000055") ``` We can also check that our samples are correctly representing our distribution: From 8048ac15f11be2d1c44a06961c23f944030de21e Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Tue, 12 Nov 2024 07:30:11 +0000 Subject: [PATCH 11/34] Expand text --- model.qmd | 41 +++++++++++++++++++++++++++++++++++------ monty.qmd | 33 ++++++++++++++++----------------- montyDSL.qmd | 2 -- samplers.qmd | 18 +++++++++++++++--- 4 files changed, 66 insertions(+), 28 deletions(-) diff --git a/model.qmd b/model.qmd index 4d7f718..bcb80b7 100644 --- a/model.qmd +++ b/model.qmd @@ -1,11 +1,40 @@ -# About monty models {#sec-monty-models} +# About (monty) models {#sec-monty-models} ```{r} #| include: false source("common.R") ``` -## SimCity vs Statistical models +In this book, we explore two fundamentally different types of models, **dynamical systems**-models that make snapshot of "reality" evolve over time and **statistical models** an abstract formalisation of uncertainty and random events. 
To explain the difference between the two we draw on the metaphor of the game *SimCity*.
+
+## Models vs models
+
+### SimCity vs Statistical Models
+
+Imagine a model of a city, like in the game *SimCity*, where you simulate the lives of thousands of virtual inhabitants, each following daily routines and interacting in a virtual world. In *SimCity*, you can track a variety of metrics - population growth, transportation patterns, pollution levels, and more. You can even introduce specific scenarios, like building a new hospital or responding to a disaster. These systems, defined by rules and parameters, evolve over time in response to preset parameters and the changes you make, much like dynamical models in real life.
+
+In a **dynamical system**, we represent changes over time in a simplified version of reality. This model is built by defining a "state," which summarises the status of the system at a particular time point. For instance, in an epidemiological model, this "state" might include the number of susceptible, infected, and recovered individuals in a population. Additional details such as age groups, geographic regions, health risks, and symptoms may also be incorporated, depending on the level of complexity we want to capture.
+
+On the other hand, **statistical models** offer an abstract formalisation of uncertainty, often represented by a density function over an "outcome" space. Statistical models describe uncertain events without necessarily defining how those events evolve over time.
+
+The `odin` package is tailored for building dynamical models, making it easy to design systems using differential and difference equations. In contrast, the `monty` package is designed for statistical modelling, providing tools for working with probabilistic distributions and Monte Carlo methods.
+
+## Parameters and Complexity
+
+The parameters for each type of model vary considerably. In a dynamical system like a *SimCity* simulation, you might have many parameters defining the setup. For example, a detailed epidemiological model could require fine-grained parameters for a specific scenario, such as the schedule of a vaccination campaign, age-specific transmission rates, or region-specific contact patterns. Each additional parameter refines the scenario, allowing you to examine how specific interventions might change the system’s evolution.
+
+In contrast, **statistical models** are typically more abstract and focused on quantifying uncertainty in outcomes rather than tracking specific changes over time. Here, parameters represent distributions over possible outcomes rather than scenario-specific inputs. Statistical models often aim to summarise patterns rather than simulate the day-to-day evolution of events.
+
+With the `odin` package, we can build and experiment with dynamical models, incorporating time-evolving parameters and exploring how changes in one part of the system affect the whole. Meanwhile, the `monty` package allows us to model uncertain outcomes probabilistically, with parameters that define densities rather than scenarios. Together, these packages enable both complex, evolving simulations and statistical analyses of uncertain events.
+
+### Linking the two worlds
+
+* stochastic systems
+* Bayesian statistics
+
+## Bayesian or not Bayesian?
+
+Many tools that inspired `monty` or that support similar functions, such as the BUGS language, `stan`, `mcstate`, `BayesTools`, `drjacoby`, and *Statistical Rethinking*, are rooted in the Bayesian framework.
However, we have designed `monty` to be **Bayesian-agnostic**. While a `monty` model requires a density function, it does not impose a Bayesian interpretation. This flexibility allows users to work with probabilistic models in both Bayesian and non-Bayesian contexts, depending on their needs. ## Normalised vs unormalised densities @@ -21,19 +50,19 @@ An **unnormalised density** represents a probability density function that is pr - $p(y | \theta))$ is the likelihood of the data given the parameter, - $p(\theta)$ is the prior distribution, and -- $p(y) = \int p(y | \theta) p(\theta) \, d\theta \)$ is the marginal likelihood, or the evidence, which acts as a normalising constant. +- $p(y) = \int p(y | \theta) p(\theta) \, d\theta $ is the marginal likelihood, or the evidence, which acts as a normalising constant. -In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density \( p(y \mid \theta) p(\theta) \). The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$. +In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density $p(y | \theta) p(\theta)$. The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$. ### Normalised Density -A **normalised density** ensures that the total probability over all possible parameter values is 1. Normalisation is achieved by dividing by the integral over the entire density, as shown above with \( p(y) \) in the denominator. This step is essential when absolute probability values are required or when we need to make direct probabilistic interpretations. +A **normalised density** ensures that the total probability over all possible parameter values is 1. Normalisation is achieved by dividing by the integral over the entire density, as shown above with $p(y)$ in the denominator. This step is essential when absolute probability values are required or when we need to make direct probabilistic interpretations. In Bayesian inference, although we often rely on unnormalised densities, there are cases—such as when comparing models using Bayes factors—where we require the fully normalised posterior. ### Link to Physics: Partition Function and Probability Density -The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates \( x \) of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where \( k \) is Boltzmann's constant. The partition function \( Z = \sum_{x} e^{-E(x)/kT} \) or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. +The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates $x$ of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where $k$ is Boltzmann's constant. 
The partition function $Z = \sum_{x} e^{-E(x)/kT}$ or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states.
 
 In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference.
diff --git a/monty.qmd b/monty.qmd
index 47fc7a1..1cd0e60 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -9,7 +9,7 @@
 1. Simple R code,
 2. A dedicated domain-specific language (DSL),
-3. Integration with `odin` models,
+3. Integration of `odin` models,
 4. Composition of two `monty` models.
 
 The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in @sec-monty-models, in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL for more versatile model building using a small probabilistic DSL similar to BUGS. @sec-monty-combine explains how to compose new monty models by combining two existing ones. The last part of this book, starting from @sec-inference, demonstrates how to incorporate `odin` models into `monty` workflows.
 
@@ -22,7 +22,7 @@
 We can define a simple Gaussian [mixture model](https://en.wikipedia.org/wiki/Mixture_model) of two sub-populations using the `monty_model_function()` from the package. The model represents the distribution of a quantity $l$, sampled with a probability $p$ from a normal distribution of mean $m_{1}$ and with a probability $1-p$ from another normal distribution of mean $m_{2}$. Both subpopulations have variance equal to 1.
 
-To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function takes as arguments $l$, the quantity of interest, and the three parameters $p$, $m_{1}$ and $m_{2}$ defining the two subpopulations.
+To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function takes as arguments $l$, the quantity of interest, and the three parameters $p$, $m_{1}$ and $m_{2}$ defining the two Normally-distributed subpopulations.
 
 ```{r}
 fn <- function(l, p, m1, m2) {
@@ -25,12 +25,21 @@ fn <- function(l, p, m1, m2) {
 }
 ```
 
-Assuming that the population is 75% with mean 3 and 25% with mean 7, we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values.
+Assuming that the population is 75% from subpopulation 1 (normally distributed with mean 3) and 25% from subpopulation 2 (also normally distributed but with mean 7), we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values.
 
 ```{r}
 l_distribution <- monty_model_function(fn,
                                        fixed = list(p = 0.75, m1 = 3, m2=7))
 ```
 
 We have just created a `monty` model.
 
 ```{r}
 l_distribution
 ```
 
-We can plot the density of our model for values of $l$ between 0 and 10 and check that it returns the correct value.
+We can plot the density of our model for values of $l$ between 0 and 10 and eye-check that we can see the two subpopulations in the correct proportions and with the 3 and 7 as modes.
```{r}
 l <- seq(from = 0, to = 10, by = 0.1)
 plot(l,
      exp(Vectorize(l_distribution$density)(l)),
      type = "l",
      ylab = "density")
 ```
 
 ## Sampling from our example distribution
 
-We want now to sample from this model, using the `monty_sample()` function. For this we need to create a sampler object that will be used to explore our distribution. The simplest one is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necessarily efficiently). It takes as an argument a variance-covariance (VCV) matrix that describes the random walk exploration of the sampler by jumping from the current point to the next one using a multivariate-normal draw defined with our covariance matrix. Here we have a single parameter $l$; we thus take matrix(1) as our VCV matrix.
+We want now to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers available and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necessarily efficiently) in most cases.
+
+The random walk sampler uses a variance-covariance (VCV) matrix to guide its exploration, determining the 'jump' from the current point to the next in a random walk by drawing from a multivariate normal distribution parameterised by this matrix. For a single-parameter model, we use a 1x1 matrix (`matrix(1)`) as our VCV matrix.
+
+The choice of the VCV matrix is critical for the sampler's efficiency, especially in more complex cases where the tuning of this matrix can significantly affect performance. A well-chosen VCV matrix optimises moving across the parameter space, making the random walk sampler more effective in exploring the distribution of interest.
 
 ```{r}
 sampler <- monty_sampler_random_walk(matrix(1))
 ```
 
 We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains simultaneously.
 
 ```{r}
 samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4)
 ```
 
 We can visualise our 4 chains.
 
 ```{r}
 matplot(samples$density, type = "l", lty = 1,
         xlab = "sample", ylab = "log posterior density", col = "#00000055")
 ```
 
 We can also check that our samples are correctly representing our distribution:
 
 ```{r}
 hist(samples$pars["l",,], breaks=100)
 ```
 
-## Going further
+## Connection of `monty` with other software
 
-We've discussed a little bit about the philosophy of `monty` and built a simple model using an R function with `monty_function`.
+`monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by languages of the BUGS family such as `stan` and [Statistical Rethinking](https://xcelab.net/rm/). It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models but it may get more clever and flexible in the future. In particular we do not expect the DSL to be useful in writing likelihood functions for comparison to data; we expect that if your model is simple enough for this you would be better off using `stan` or some similarly flexible system.
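When moving results into other tools, it helps that the sampler output used above is a plain R array. A small sketch, assuming the `samples` object created in the sampling section (parameters x iterations x chains, as shown with `samples$pars["l", , ]` above):

```{r}
# Pool the four chains into one vector of draws and summarise with base R;
# the resulting vector can be handed to any other analysis package.
draws_l <- as.vector(samples$pars["l", , ])
c(mean = mean(draws_l), quantile(draws_l, c(0.025, 0.975)))
```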
-There are a bunch of things we want to cover in this chapter so likely it will be split into a few: +*mention SSM, dr jacoby, mcstate, BayesTools + +## Going further -* defining a model in monty, both easily with `monty_function` and less easily with `monty_model` -* fitting models with MCMC -* writing priors with the `monty_dsl` -* using observers to follow along with the model (if we can come up with a good example) -* working with nested models (once we write the code that would allow this) -* running MCMC chains in parallel -* exporting chains to work with other packages -* different samplers +We've discussed a little bit about the philosophy of `monty` and built a simple model using an R function with `monty_function`. The example introduces the basics of defining and sampling from a `monty` model. -Hmm, there really is quite a bit of ground to cover here! +For more advanced applications, refer to @sec-monty-dsl for constructing models using `monty`'s domain-specific language, which enables greater flexibility for complex structures. To combine multiple models, see @sec-monty-combine, which demonstrates how to compose and integrate existing `monty` models. Lastly, @sec-inference illustrates how to incorporate `odin` models into `monty` workflows, expanding the package’s potential for Bayesian inference with more sophisticated, system-based models. Each section provides detailed examples to guide you in leveraging `monty`’s full capabilities. diff --git a/montyDSL.qmd b/montyDSL.qmd index 605afb2..e32883a 100644 --- a/montyDSL.qmd +++ b/montyDSL.qmd @@ -7,8 +7,6 @@ source("common.R") The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models. -`monty` includes a simple probabilistic domain-specific language (DSL) that is inspired by `stan` and [Statistical Rethinking](https://xcelab.net/rm/). It is designed to make some tasks a bit easier, particularly when defining priors for your model. We expect that this DSL is not sufficiently advanced to represent most interesting models but it may get more clever and flexible in the future. In particular we do not expect the DSL to be useful in writing likelihood functions for comparison to data; we expect that if your model is simple enough for this you would be better off using `stan` or some similarly flexible system. - ```{r} library(monty) ``` diff --git a/samplers.qmd b/samplers.qmd index 5f21fd0..f980835 100644 --- a/samplers.qmd +++ b/samplers.qmd @@ -1,17 +1,29 @@ -# Samplers and inference +# Samplers and inference {#sec-samplers} ```{r} #| include: false source("common.R") ``` -This vignette will describe the samplers available in monty. +Many quantities of interest in uncertain systems can be expressed as integrals that weight outcomes according to their probability. A common approach to computing these integrals is through Monte Carlo estimation, where we draw samples from our distribution of interest and take their mean as an approximation. This method, known as a 'Monte Carlo' estimate, has been central to probabilistic inference and has driven the development of sophisticated sampling algorithms since the advent of modern computing in the 1950s. + +The `monty` package offers a range of sampling algorithms tailored to different types of distributions and varying levels of available information. 
This flexibility allows users to select the most efficient sampler based on their distribution's characteristics and computational resources.
+
+## Samplers supported by `monty`
 
 ```{r}
 library(monty)
 ```
 
+### Random-Walk Metropolis-Hastings
+
+### Adaptive Random Walk
+
+### Nested models
+
+### Hamiltonian Monte Carlo
+
+## A simple example: The bendy banana
 
 This example shows HMC outperforming a random walk on a two dimensional banana-shaped function. Our model takes two parameters `alpha` and `beta`, and is based on two successive simple draws, with one conditional on the other, so $\beta \sim Normal(0,1)$ and $\alpha \sim Normal(\beta^2, \sigma)$, with $\sigma$ the standard deviation of the conditional draw.

From 6343ebfb3099b22b689fbd05cafb499b1dea4e78 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Tue, 12 Nov 2024 17:39:35 +0000
Subject: [PATCH 12/34] Restructuring and adding content + placeholders

---
 model.qmd | 115 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 39 deletions(-)

diff --git a/model.qmd b/model.qmd
index bcb80b7..77ef2a1 100644
--- a/model.qmd
+++ b/model.qmd
@@ -1,77 +1,114 @@
-# About (monty) models {#sec-monty-models}
+# About (monty) Models {#sec-monty-models}
 
 ```{r}
 #| include: false
 source("common.R")
 ```
 
+In this book, we explore two fundamentally different yet interconnected approaches to modelling: **dynamical systems** and **statistical models**. To illustrate these differences, let’s begin with a familiar (but maybe dated!) metaphor: the game [*SimCity*](https://en.wikipedia.org/wiki/SimCity).
+
+## Models vs. Models: dynamical and statistical perspectives
+
+### SimCity and dynamical systems
+
+Imagine a city model, as in *SimCity*, where thousands of virtual inhabitants follow routines, interact, and respond to changes like new buildings or natural disasters. These elements evolve according to predefined rules and parameters, similar to how **dynamical systems** simulate real-world processes over time.
+In a [**dynamical model**](https://en.wikipedia.org/wiki/Dynamical_system), we track changes in a system's "state" over time. The state is a summary of key information at a specific time point; for example, in an epidemiological model, the state might include the numbers of susceptible, infected, and recovered individuals. More detailed models may add variables like age, location, health risks, and symptoms, allowing us to simulate interventions, such as vaccination campaigns, and explore their potential impacts. There exist many formalisms to describe dynamical systems, incorporating e.g. specific input and output functions in control theory but the central element of these systems is this notion of the "state", a mathematical object summarising the system at a timepoint. -On the other hand, **statistical models** offer an abstract formalisation of uncertainty, by using the often represented by a density function over an "outcome" space. Statistical models describe uncertain events without necessarily defining how those events evolve over time. +The `odin` package, which supports differential and difference equations, is an effective tool for building dynamical models, enabling users to simulate time-evolving systems. Dynamical models are well-suited for exploring how specific scenarios or interventions impact a system over time, making them particularly useful for modelling real-world phenomena in a structured way. -The `odin` package is tailored for building dynamical models, making it easy to design systems using differential and difference equations. In contrast, the `monty` package is designed for statistical modelling, providing tools for working with probabilistic distributions and Monte Carlo methods. +TODO: simple odin deterministic model with discussion about results being a time indexed suite of "states" -## Parameters and Complexity +### Statistical models and uncertainty -The parameters for each type of model vary considerably. In a dynamical system like a *SimCity* simulation, you might have many parameters defining the setup. For example, a detailed epidemiological model could require fine-grained parameters for a specific scenario, such as the schedule of a vaccination campaign, age-specific transmission rates, or region-specific contact patterns. Each additional parameter refines the scenario, allowing you to examine how specific interventions might change the system’s evolution. +[**Statistical models**](https://en.wikipedia.org/wiki/Statistical_model), on the other hand, focus on capturing uncertainty in outcomes. Instead of simulating the system’s evolution over time, they represent possible states of the world probabilistically. Statistical models summarise data patterns or help make predictions based on uncertainties, such as estimating the range of future cases of a disease or identifying risk factors in a population. -In contrast, **statistical models** are typically more abstract and focused on quantifying uncertainty in outcomes rather than tracking specific changes over time. Here, parameters represent distributions over possible outcomes rather than scenario-specific inputs. Statistical models often aim to summarise patterns rather than simulate the day-to-day evolution of events. +The `monty` package facilitates statistical modelling, providing tools for working with probability distributions and Monte Carlo methods. 
Where `odin` enables us to explore the dynamics of changing systems, `monty` is ideal for making probabilistic inferences and assessing variability across potential outcomes without necessarily modelling time-evolving dynamics. -With the `odin` package, we can build and experiment with dynamical models, incorporating time-evolving parameters and exploring how changes in one part of the system affect the whole. Meanwhile, the `monty` package allows us to model uncertain outcomes probabilistically, with parameters that define densities rather than scenarios. Together, these packages enable both complex, evolving simulations and statistical analyses of uncertain events. +TODO: simple gaussian model - discussion about parameter being actually the sampling space and explain that it connects with Bayesian inference, show that the important component of the model is the parameters and domain support + density function, maybe shows that the model can be built in different way (R function, monty_model(), DSL). -### Linking the two world +## Bridging dynamical systems and statistical models -* stochastic systems -* Bayesian statistics +While dynamical systems and statistical models serve distinct purposes, they are not strictly separate. In fact, they can be connected through two powerful approaches that bring probabilistic reasoning into dynamical systems: **stochastic processes** and **Bayesian inference**. -## Bayesian or not Bayesian? +### Stochastic processes: adding uncertainty to dynamical models -Many tools that inspired `monty` or that supports similar functions -such as the BUGS language, `stan`, `mcstate`, `BayesTools`, `drjacoby`, and *Statistical Rethinking* - are rooted in the Bayesian framework. However, we have designed `monty` to be **Bayesian-agnostic**. While a `monty` model requires a density function, it does not impose a Bayesian interpretation. This flexibility allows users to work with probabilistic models in both Bayesian and non-Bayesian contexts, depending on their needs. +A natural way to bridge dynamical and statistical models is by introducing [**stochastic processes**](https://en.wikipedia.org/wiki/Stochastic_process). Traditional dynamical models use fixed parameters and deterministic rules to evolve a system’s state over time. However, many real-world systems have inherent randomness or uncertainty that deterministic models cannot capture. -## Normalised vs unormalised densities +In a **stochastic process**, the system’s state is no longer a single deterministic value but a collection of potential outcomes, each weighted by a probability. This probabilistic view enables us to model fluctuations and uncertainties within the dynamical framework, capturing both the system’s evolution and the uncertainty in each state. Stochastic processes are thus a natural extension of dynamical models, adding an extra layer of realism by treating system evolution as inherently uncertain. -In Bayesian inference, we often encounter probability densities that are unnormalised, and understanding the distinction between these is crucial for implementing Monte Carlo methods effectively. Normalisation is particularly important when interpreting probabilities and integrating over parameter spaces, a concept with parallels in physics, where distributions are sometimes normalised to ensure they represent meaningful physical quantities. 
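Before the formal description just below, a tiny illustrative sketch may help. This is plain base R, not an `odin`/`dust` model, and the survival probability and arrival rate are arbitrary values chosen only for illustration:

```{r}
# A discrete-time stochastic process: the next state is drawn at random,
# conditional on the current state, so repeated runs give different paths.
set.seed(2024)
state <- 50
trajectory <- numeric(100)
for (t in seq_along(trajectory)) {
  state <- rbinom(1, size = state, prob = 0.9) + rpois(1, lambda = 5)
  trajectory[t] <- state
}
plot(trajectory, type = "l", xlab = "time", ylab = "state")
```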
+The `odin` package provides an intuitive framework for writing a class of stochastic systems, making it easier to define models that incorporate randomness in their evolution over time. The `dust` package complements `odin` by enabling efficient large-scale simulations, allowing users to capture and analyse the uncertainty inherent to these systems through repeated runs. Together, `odin` and `dust` offer a powerful toolkit for developing and exploring stochastic models that reflect the complexity and variability of real-world dynamics. -### Unnormalised Density +TODO simple dust example or just link with relevant dust section in the book -An **unnormalised density** represents a probability density function that is proportional to the target distribution but has not been scaled to integrate to 1 over the entire parameter space. In Bayesian inference, we are often interested in the posterior distribution of a parameter $\theta$ given data $y$. The posterior density is given by: - - $$p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)}$$ - - where: +### Bayesian inference: statistical modelling of model parameters + +**Bayesian inference** is another approach to linking dynamical and statistical models by treating model parameters as random variables rather than fixed values. This introduces a probability distribution over possible parameter values, making parameter estimation a statistical problem. + +Using Bayes’ theorem, Bayesian inference combines: + + - **The likelihood**: the probability of observing the data given specific parameter values, and + - **The prior distribution**: our initial assumptions about the parameter values before observing the data. - - $p(y | \theta))$ is the likelihood of the data given the parameter, -- $p(\theta)$ is the prior distribution, and -- $p(y) = \int p(y | \theta) p(\theta) \, d\theta $ is the marginal likelihood, or the evidence, which acts as a normalising constant. +The result is the **posterior distribution** of a parameter $\theta$ given data $y$: + +$$ +p(\theta | y) = \frac{p(y|\theta) p(\theta)}{p(y)} +$$ + +where: + + - $p(y|\theta)$ is the likelihood, + - $p(\theta)$ is the prior distribution, and + - $p(y)$ is a normalising constant to ensure the posterior integrates to 1 over all $\theta$ values. + +Bayesian inference updates our beliefs about parameters based on observed data, yielding a posterior distribution that reflects both model assumptions and data influence. This probabilistic framework around a dynamical system’s parameters creates a flexible statistical model that adapts as new data becomes available. + +In this way, **stochastic processes** add an uncertainty dimension to the state of dynamical systems, while **Bayesian inference** applies probabilistic modelling to the parameters, merging dynamical and statistical perspectives. Together, these methods provide a rich toolkit for modelling complex systems that evolve over time and accounting for both randomness and uncertainty. -In many cases, computing $p(y)$ is challenging or even intractable, so we work with the unnormalised posterior density $p(y | \theta) p(\theta)$. The unnormalised density is sufficient for methods like **Markov Chain Monte Carlo (MCMC)**, where the absolute probability values are less relevant than their relative proportions across different values of $\theta$. 
+## Parameters and model complexity -### Normalised Density +### Parameters in dynamical systems -A **normalised density** ensures that the total probability over all possible parameter values is 1. Normalisation is achieved by dividing by the integral over the entire density, as shown above with $p(y)$ in the denominator. This step is essential when absolute probability values are required or when we need to make direct probabilistic interpretations. +In dynamical models, parameters define the structure and dynamics of the scenario. For example, a detailed epidemiological model may need parameters for transmission rates, contact patterns, or intervention schedules. These inputs allow us to examine "what-if" scenarios, enabling decision-makers to predict and manage changes in real-time. -In Bayesian inference, although we often rely on unnormalised densities, there are cases—such as when comparing models using Bayes factors—where we require the fully normalised posterior. +### Parameters in statistical models -### Link to Physics: Partition Function and Probability Density +Statistical models, meanwhile, use parameters to define probability distributions over possible outcomes rather than scenario-specific details. Here, parameters describe uncertainties, often quantifying risks, making predictions, or summarising patterns in data. -The concept of normalisation has parallels in statistical mechanics. In physics, the **partition function** $Z$ normalises the probability density over all microstates $x$ of a system, enabling calculations of thermodynamic properties. Here, the probability density for a state $x$ with energy $E(x)$ at temperature $T$ is proportional to $e^{-E(x)/kT}$, where $k$ is Boltzmann's constant. The partition function $Z = \sum_{x} e^{-E(x)/kT}$ or its integral form over a continuous state space normalises this density, ensuring probabilities are meaningful when summing over all possible states. +The `odin` package allows us to build and explore the behaviours of dynamical models over time, while `monty` enables us to quantify uncertainties and make probabilistic inferences without simulating detailed dynamics. -In Monte Carlo algorithms, this parallel becomes especially useful. For example, in the **Metropolis-Hastings algorithm**, we only need the ratio of probabilities between two states. This mirrors physical systems where absolute energies are less important than energy differences between states, similar to unnormalised densities in Bayesian inference. +## Bayesian or non-Bayesian modelling + +Many statistical modelling tools, such as `stan`, `mcstate`, and `BayesTools`, are inherently Bayesian. However, the `monty` package is **Bayesian-agnostic**: it does not require a Bayesian interpretation. This flexibility allows users to work with probabilistic models either within a Bayesian framework or outside of it, depending on the needs of the analysis. + +## Probability densities: normalised vs. unnormalised + +A key concept in Bayesian inference and Monte Carlo methods is the distinction between **normalised** and **unnormalised** probability densities. + +### Unnormalised density + +In Bayesian inference, rather than using the "full" Bayesian as above, we are often working with defining posterior density as being proportional to a product of densities: + +$$ +p(\theta | y) \propto p(y|\theta) p(\theta) +$$ + +where: + + - $p(y|\theta)$ is the likelihood, and + - $p(\theta)$ is the prior distribution, as above. 
+ +Note that it does not involve anymore $p(y)$ the normalising constant. The reason is that since calculating $p(y)$ can be very difficult, we often work with the **unnormalised posterior density** $p(y|\theta)p(\theta)$. This unnormalised form is sufficient for many Monte Carlo methods, where only relative densities matter. -### Monte Carlo Methods with Unnormalised Densities +### Normalised density -Monte Carlo methods like **Metropolis-Hastings** and **Importance Sampling** often operate with unnormalised densities, since only the relative probabilities affect the algorithm's behaviour. This allows us to bypass the need to calculate the normalising constant, which can be computationally expensive or intractable in high-dimensional spaces. +A **normalised density** integrates to 1 over its entire parameter space. This is necessary for direct probability interpretations and for certain Bayesian methods, like model comparison using [Bayes factors](https://en.wikipedia.org/wiki/Bayes_factor). -In Importance Sampling, for example, we estimate expectations with respect to the target distribution by drawing samples from a simpler proposal distribution and weighting them according to the unnormalised target density. +## Monte Carlo methods with unnormalised densities +Monte Carlo algorithms, such as **Metropolis-Hastings** and **Importance Sampling**, often operate using unnormalised densities, focusing on relative probabilities rather than absolute values. This makes them efficient for high-dimensional problems where calculating a normalising constant is untractable. +## Linking to Physics: partition functions and probability densities -## Properties of monty models +Normalisation in probability densities has parallels in physics. In statistical mechanics, the **partition function** $Z$ normalises probabilities over all possible states, much like normalising densities in Bayesian inference. This connection highlights why algorithms like **Metropolis-Hastings** use unnormalised densities: they mirror physical systems where absolute energies are less relevant than energy differences. \ No newline at end of file From 57ce74bb3f8fe3da5a5084ff7d2356b107b477f6 Mon Sep 17 00:00:00 2001 From: edknock Date: Wed, 13 Nov 2024 13:36:07 +0000 Subject: [PATCH 13/34] some light edits --- model.qmd | 2 +- monty.qmd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/model.qmd b/model.qmd index 77ef2a1..7eda2b9 100644 --- a/model.qmd +++ b/model.qmd @@ -99,7 +99,7 @@ where: - $p(y|\theta)$ is the likelihood, and - $p(\theta)$ is the prior distribution, as above. -Note that it does not involve anymore $p(y)$ the normalising constant. The reason is that since calculating $p(y)$ can be very difficult, we often work with the **unnormalised posterior density** $p(y|\theta)p(\theta)$. This unnormalised form is sufficient for many Monte Carlo methods, where only relative densities matter. +Note that it does not involve the normalising constant $p(y)$ anymore. The reason is that since calculating $p(y)$ can be very difficult, we often work with the **unnormalised posterior density** $p(y|\theta)p(\theta)$. This unnormalised form is sufficient for many Monte Carlo methods, where only relative densities matter. ### Normalised density diff --git a/monty.qmd b/monty.qmd index 1cd0e60..dac8c80 100644 --- a/monty.qmd +++ b/monty.qmd @@ -55,7 +55,7 @@ plot(l, ## Sampling from our example distribution -We want now to sample from this model, using the `monty_sample()` function. 
For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers available and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necessarily efficiently) in most cases.
+We now want to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers available and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necessarily efficiently) in most cases.

From 87a98a9e0fa5f1bc7710ab5272562e4b05141a77 Mon Sep 17 00:00:00 2001
From: MJomaba
Date: Wed, 13 Nov 2024 16:05:43 +0000
Subject: [PATCH 14/34] Spacing of formula

Co-authored-by: edknock <47318334+edknock@users.noreply.github.com>
---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index dac8c80..8e86077 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -26,7 +26,7 @@
 ```{r}
 fn <- function(l, p, m1, m2) {
-  log(p*dnorm(l,mean = m1) + (1-p)*dnorm(l, mean = m2))
+  log(p * dnorm(l, mean = m1) + (1 - p) * dnorm(l, mean = m2))
 }
 ```

From 8a501b17055502143f6e36c7729ac21632f9d84c Mon Sep 17 00:00:00 2001
From: MJomaba
Date: Wed, 13 Nov 2024 16:06:12 +0000
Subject: [PATCH 15/34] Ironing up spacing

Co-authored-by: edknock <47318334+edknock@users.noreply.github.com>
---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index 8e86077..1c5de05 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -34,7 +34,7 @@
 ```{r}
 l_distribution <- monty_model_function(fn,
-                                       fixed = list(p = 0.75, m1 = 3, m2=7))
+                                       fixed = list(p = 0.75, m1 = 3, m2 = 7))
 ```

From c9a44a3f9ebc9ddff3b79d52defe1bb82f7d942a Mon Sep 17 00:00:00 2001
From: MJomaba
Date: Wed, 13 Nov 2024 16:07:02 +0000
Subject: [PATCH 16/34] Amending text

Co-authored-by: edknock <47318334+edknock@users.noreply.github.com>
---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index 1c5de05..8b75bce 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -65,7 +65,7 @@
 sampler <- monty_sampler_random_walk(matrix(1))
 ```
 
-We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains simultaneously.
+We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains sequentially.
```{r} samples <- monty_sample(l_distribution, sampler, 2000, initial = 3, n_chains = 4) From 4a137418fa3849b851bdcca99bacd1243619f531 Mon Sep 17 00:00:00 2001 From: MJomaba Date: Wed, 13 Nov 2024 16:10:42 +0000 Subject: [PATCH 17/34] Consistent spacing Co-authored-by: edknock <47318334+edknock@users.noreply.github.com> --- monty.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/monty.qmd b/monty.qmd index 8b75bce..2ad695f 100644 --- a/monty.qmd +++ b/monty.qmd @@ -79,7 +79,7 @@ matplot(samples$density, type = "l", lty = 1, We can also check that our samples are correctly representing our distribution: ```{r} -hist(samples$pars["l",,], breaks=100) +hist(samples$pars["l", , ], breaks = 100) ``` ## Connection of `monty` with other software From 99e02d20cbf50fc48e37cb0783ed4374c844ed3a Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 14 Nov 2024 07:37:38 +0000 Subject: [PATCH 18/34] Intect to use allow_multiple_parameters to replace Vectorize --- monty.qmd | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/monty.qmd b/monty.qmd index 2ad695f..5e8ee86 100644 --- a/monty.qmd +++ b/monty.qmd @@ -34,7 +34,8 @@ Assuming that the population is 75% from subpopulation 1 (normally distributed w ```{r} l_distribution <- monty_model_function(fn, - fixed = list(p = 0.75, m1 = 3, m2 = 7)) + fixed = list(p = 0.75, m1 = 3, m2 = 7), + allow_multiple_parameters = TRUE) ``` We have just created a `monty` model. @@ -46,7 +47,9 @@ l_distribution We can plot the density of our model for values of $l$ between 0 and 10 and eye-check that we can see the two subpopulations in the correct proportions and with the 3 and 7 as modes. ```{r} +#l <- matrix(seq(from = 0, to = 10, by = 0.1), 1) l <- seq(from = 0, to = 10, by = 0.1) +#monty_model_density(l_distribution, l) plot(l, exp(Vectorize(l_distribution$density)(l)), type = "l", From 093f6091bd439755cb24e277ac69c52b7e753dd2 Mon Sep 17 00:00:00 2001 From: MJomaba Date: Thu, 14 Nov 2024 10:22:02 +0000 Subject: [PATCH 19/34] Update _quarto.yml Co-authored-by: Rich FitzJohn --- _quarto.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/_quarto.yml b/_quarto.yml index 3e8ef1b..887b63d 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -22,16 +22,16 @@ book: chapters: - monty.qmd - model.qmd - - montyDSL.qmd + - monty-dsl.qmd - composability.qmd - samplers.qmd - runners.qmd - - advanced_monty.qmd + - advanced-monty.qmd - part: "Odin & Monty" chapters: - inference.qmd - differentiability.qmd - - 2009fluUK.qmd + - 2009-flu-uk.qmd - references.qmd appendices: - installation.qmd From 737b1399d1f51d7fa8452f5a4cf05b70777d73be Mon Sep 17 00:00:00 2001 From: Marc Baguelin Date: Thu, 14 Nov 2024 10:31:54 +0000 Subject: [PATCH 20/34] Change name of files to match book style --- 2009fluUK.qmd => 2009-flu-uk.qmd | 0 advanced_monty.qmd => advanced-monty.qmd | 0 montyDSL.qmd => monty-dsl.qmd | 0 3 files changed, 0 insertions(+), 0 deletions(-) rename 2009fluUK.qmd => 2009-flu-uk.qmd (100%) rename advanced_monty.qmd => advanced-monty.qmd (100%) rename montyDSL.qmd => monty-dsl.qmd (100%) diff --git a/2009fluUK.qmd b/2009-flu-uk.qmd similarity index 100% rename from 2009fluUK.qmd rename to 2009-flu-uk.qmd diff --git a/advanced_monty.qmd b/advanced-monty.qmd similarity index 100% rename from advanced_monty.qmd rename to advanced-monty.qmd diff --git a/montyDSL.qmd b/monty-dsl.qmd similarity index 100% rename from montyDSL.qmd rename to monty-dsl.qmd From 211104dd14ac19d8abb8804c9b77db52b93b20fd Mon Sep 17 00:00:00 2001 From: 
From 211104dd14ac19d8abb8804c9b77db52b93b20fd Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 10:44:29 +0000
Subject: [PATCH 21/34] Remove mcstate to avoid confusion and add WinBUGS and JAGS as examples of Bayesian-oriented software

---
 model.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/model.qmd b/model.qmd
index 7eda2b9..61f8472 100644
--- a/model.qmd
+++ b/model.qmd
@@ -80,7 +80,7 @@ The `odin` package allows us to build and explore the behaviours of dynamical mo

 ## Bayesian or non-Bayesian modelling

-Many statistical modelling tools, such as `stan`, `mcstate`, and `BayesTools`, are inherently Bayesian. However, the `monty` package is **Bayesian-agnostic**: it does not require a Bayesian interpretation. This flexibility allows users to work with probabilistic models either within a Bayesian framework or outside of it, depending on the needs of the analysis.
+Many statistical modelling tools, such as [WinBUGS](https://en.wikipedia.org/wiki/WinBUGS), [JAGS](https://en.wikipedia.org/wiki/Just_another_Gibbs_sampler), `stan`, and `BayesTools`, are inherently Bayesian. However, the `monty` package is **Bayesian-agnostic**: it does not require a Bayesian interpretation. This flexibility allows users to work with probabilistic models either within a Bayesian framework or outside of it, depending on the needs of the analysis.

 ## Probability densities: normalised vs. unnormalised

From 83a5ca77caebd4d01d76998df4e870c242e258dc Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 10:47:12 +0000
Subject: [PATCH 22/34] Transform sections into paragraphs as suggested in review to improve flow

---
 model.qmd | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/model.qmd b/model.qmd
index 61f8472..f4c0147 100644
--- a/model.qmd
+++ b/model.qmd
@@ -86,8 +86,6 @@ Many statistical modelling tools, such as [WinBUGS](https://en.wikipedia.org/wik

 A key concept in Bayesian inference and Monte Carlo methods is the distinction between **normalised** and **unnormalised** probability densities.

-### Unnormalised density
-
 In Bayesian inference, rather than using the "full" Bayesian formula as above, we often work with a posterior density defined as proportional to a product of densities:

 $$
 p(\theta | y) \propto p(y|\theta)p(\theta)
 $$

 where:

 - $p(y|\theta)$ is the likelihood, and
 - $p(\theta)$ is the prior distribution.

 Note that it does not involve the normalising constant $p(y)$ anymore. The reason is that since calculating $p(y)$ can be very difficult, we often work with the **unnormalised posterior density** $p(y|\theta)p(\theta)$. This unnormalised form is sufficient for many Monte Carlo methods, where only relative densities matter.

-### Normalised density
-
 A **normalised density** integrates to 1 over its entire parameter space. This is necessary for direct probability interpretations and for certain Bayesian methods, like model comparison using [Bayes factors](https://en.wikipedia.org/wiki/Bayes_factor).

-## Monte Carlo methods with unnormalised densities
-
 Monte Carlo algorithms, such as **Metropolis-Hastings** and **Importance Sampling**, often operate using unnormalised densities, focusing on relative probabilities rather than absolute values. This makes them efficient for high-dimensional problems where calculating a normalising constant is intractable.

-## Linking to Physics: partition functions and probability densities
-
 Normalisation in probability densities has parallels in physics. In statistical mechanics, the **partition function** $Z$ normalises probabilities over all possible states, much like normalising densities in Bayesian inference. This connection highlights why algorithms like **Metropolis-Hastings** use unnormalised densities: they mirror physical systems where absolute energies are less relevant than energy differences.
\ No newline at end of file
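The point of the unnormalised-density paragraphs above can be made concrete with a tiny Metropolis-Hastings loop in base R. This is a self-contained illustration, not `monty` code: the arbitrary constant added to the log-density has no effect on the chain, because only density differences enter the acceptance ratio.

```{r}
# Why unnormalised densities suffice for Metropolis-Hastings: the acceptance
# ratio compares two densities, so any normalising constant cancels. The "+ 7"
# below stands in for an unknown log normalising constant.
log_unnorm <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE) + 7
theta <- 0
for (i in 1:1000) {
  candidate <- theta + rnorm(1, sd = 0.5)
  if (log(runif(1)) < log_unnorm(candidate) - log_unnorm(theta)) {
    theta <- candidate  # accept; otherwise stay at the current point
  }
}
```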
From c4c0b72c137ba74c78e1ec2a9410f7e39255db31 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 10:57:14 +0000
Subject: [PATCH 23/34] Removing advanced monty section as too little content for now

---
 _quarto.yml        |  1 -
 advanced-monty.qmd | 14 --------------
 2 files changed, 15 deletions(-)
 delete mode 100644 advanced-monty.qmd

diff --git a/_quarto.yml b/_quarto.yml
index 887b63d..64d391a 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -26,7 +26,6 @@ book:
       - composability.qmd
       - samplers.qmd
       - runners.qmd
-      - advanced-monty.qmd
   - part: "Odin & Monty"
     chapters:
       - inference.qmd

diff --git a/advanced-monty.qmd b/advanced-monty.qmd
deleted file mode 100644
index 9569314..0000000
--- a/advanced-monty.qmd
+++ /dev/null
@@ -1,14 +0,0 @@
-# Advanced monty
-
-```{r}
-#| include: false
-source("common.R")
-```
-
-## Observers
-
-## Working with other packages
-
-Exporting chains to work with other packages
-
-## Nested models
\ No newline at end of file

From 8f5632cd5d46e981692f740669f589003b924269 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 11:03:37 +0000
Subject: [PATCH 24/34] Remove composability section as too little content for now, will be put back with some potential reorganising of the content in a follow-up iteration

---
 _quarto.yml       |  1 -
 composability.qmd | 29 -----------------------------
 2 files changed, 30 deletions(-)
 delete mode 100644 composability.qmd

diff --git a/_quarto.yml b/_quarto.yml
index 64d391a..f1467a2 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -23,7 +23,6 @@ book:
       - monty.qmd
       - model.qmd
       - monty-dsl.qmd
-      - composability.qmd
       - samplers.qmd
       - runners.qmd
   - part: "Odin & Monty"

diff --git a/composability.qmd b/composability.qmd
deleted file mode 100644
index 8fb2fbb..0000000
--- a/composability.qmd
+++ /dev/null
@@ -1,29 +0,0 @@
-# Composability {#sec-monty-combine}
-
-```{r}
-#| include: false
-source("common.R")
-```
-
-```{r}
-library(monty)
-```
-
-```{r}
-# A simple example; a model that contains something of interest,
-# and a simple prior from monty_dsl
-likelihood <- monty_example("banana")
-prior <- monty_dsl({
-  alpha ~ Normal(0, 1)
-  beta ~ Normal(0, 10)
-})
-posterior <- likelihood + prior
-posterior
-
-# # The same thing, more explicitly:
-# monty_model_combine(likelihood, prior)
-#
-# # Control properties of the combined model:
-# monty_model_combine(likelihood, prior,
-#                     monty_model_properties(has_gradient = FALSE))
-```
\ No newline at end of file
From de2fcbad0a43a9f851bcdca99bacd1243619f531 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 11:32:18 +0000
Subject: [PATCH 25/34] Changing 'l_distribution' to more explicit name 'mixture_model'

---
 monty.qmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/monty.qmd b/monty.qmd
index 5e8ee86..bbff64b 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -33,7 +33,7 @@ fn <- function(l, p, m1, m2) {

 Assuming that the population is 75% from subpopulation 1 (normally distributed with mean 3) and 25% from subpopulation 2 (also normally distributed but with mean 7), we can build our `monty` model by indicating that the parameters of the subpopulations are fixed at these given values.

 ```{r}
-l_distribution <- monty_model_function(fn,
+mixture_model <- monty_model_function(fn,
                                        fixed = list(p = 0.75, m1 = 3, m2 = 7),
                                        allow_multiple_parameters = TRUE)
 ```

 We have just created a `monty` model.

 ```{r}
-l_distribution
+mixture_model
 ```

 We can plot the density of our model for values of $l$ between 0 and 10 and eye-check that we can see the two subpopulations in the correct proportions and with the 3 and 7 as modes.

 ```{r}
 #l <- matrix(seq(from = 0, to = 10, by = 0.1), 1)
 l <- seq(from = 0, to = 10, by = 0.1)
 #monty_model_density(l_distribution, l)
 plot(l,
-     exp(Vectorize(l_distribution$density)(l)),
+     exp(Vectorize(mixture_model$density)(l)),
      type = "l",
      ylab = "density")
 ```

From f8543c61f1dfdc2c35b60de315dede54b09a1dc4 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 11:37:00 +0000
Subject: [PATCH 26/34] Change variance of 1-D VCV matrix to avoid confusion between dimension of matrix and value of variance of proposal distribution

---
 monty.qmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/monty.qmd b/monty.qmd
index bbff64b..5c8b0c2 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -60,12 +60,12 @@

 We now want to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers avalaible and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necesseraly efficiently) in most cases.

+The random walk sampler uses a variance-covariance (VCV) matrix to guide its exploration, determining the 'jump' from the current point to the next in a random walk by drawing from a multivariate normal distribution parameterised by this matrix. For our single-parameter model here, we use a 1x1 matrix of variance 2 (`matrix(2)`) as our VCV matrix.

 The choice of the VCV matrix is critical for the sampler's efficiency, especially in more complex cases where the tuning of this matrix can significantly affect performance. A well-chosen VCV matrix optimises moving across the parameter space, making the random walk sampler more effective in exploring the distribution of interest.

 ```{r}
-sampler <- monty_sampler_random_walk(matrix(1))
+sampler <- monty_sampler_random_walk(matrix(2))
 ```

 We are now ready to sample from our distribution using `monty_sample()` with 2000 samples, starting our MCMC chain at the value 3 and running 4 chains sequentially.
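Restating the distinction this patch draws, in code: for a 1x1 VCV matrix the single entry is the proposal variance, not the matrix dimension.

```{r}
# With a 1x1 VCV matrix the entry is the proposal variance, so matrix(2)
# proposes jumps with standard deviation sqrt(2), while the matrix itself is
# still 1x1 because the model has one parameter.
vcv <- matrix(2)
sqrt(vcv[1, 1])  # ~1.41, the sd of each proposed jump
sampler <- monty_sampler_random_walk(vcv)
```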
From 40c0dac682a8519907a992042020efdeef8e22 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 11:41:37 +0000
Subject: [PATCH 27/34] Removing runners section as too little content

---
 _quarto.yml | 1 -
 runners.qmd | 8 --------
 2 files changed, 9 deletions(-)
 delete mode 100644 runners.qmd

diff --git a/_quarto.yml b/_quarto.yml
index f1467a2..130b7d6 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -24,7 +24,6 @@ book:
       - model.qmd
       - monty-dsl.qmd
       - samplers.qmd
-      - runners.qmd
   - part: "Odin & Monty"
     chapters:
       - inference.qmd

diff --git a/runners.qmd b/runners.qmd
deleted file mode 100644
index 358d464..0000000
--- a/runners.qmd
+++ /dev/null
@@ -1,8 +0,0 @@
-# Runners
-
-```{r}
-#| include: false
-source("common.R")
-```
-
-One of the focus of `monty` is tailoring implementation to optimise usage of available hardware while ensuring reproducible results accross platforms.
\ No newline at end of file

From d8b94bc30b1ef041a6fdbea8abbad4a68fa756ee Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 11:59:59 +0000
Subject: [PATCH 28/34] Working on the rewording and deleting a minuscule section that was out of place

---
 model.qmd | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/model.qmd b/model.qmd
index f4c0147..f93ac52 100644
--- a/model.qmd
+++ b/model.qmd
@@ -62,9 +62,11 @@ where:

 - $p(\theta)$ is the prior distribution, and
 - $p(y)$ is a normalising constant to ensure the posterior integrates to 1 over all $\theta$ values.

-Bayesian inference updates our beliefs about parameters based on observed data, yielding a posterior distribution that reflects both model assumptions and data influence. This probabilistic framework around a dynamical system’s parameters creates a flexible statistical model that adapts as new data becomes available.
+Bayesian inference allows us to update our understanding of parameters based on observed data, yielding a posterior distribution that incorporates both model assumptions and the influence of the data. This approach provides a probabilistic framework that can adapt dynamically as new data becomes available, making it ideal for statistical models that require flexibility.

-In this way, **stochastic processes** add an uncertainty dimension to the state of dynamical systems, while **Bayesian inference** applies probabilistic modelling to the parameters, merging dynamical and statistical perspectives. Together, these methods provide a rich toolkit for modelling complex systems that evolve over time and accounting for both randomness and uncertainty.
+Many statistical modelling tools—such as [WinBUGS](https://en.wikipedia.org/wiki/WinBUGS), [JAGS](https://en.wikipedia.org/wiki/Just_another_Gibbs_sampler), `stan`, and `BayesTools`—are fundamentally Bayesian. However, the `monty` package is **Bayesian-agnostic**, allowing users to choose between Bayesian and non-Bayesian approaches to probabilistic modelling, depending on the analytical requirements.
+
+By combining **stochastic processes** with **Bayesian inference**, we add a dual dimension of uncertainty: randomness in the state of dynamical systems through stochastic processes, and probabilistic modelling of parameters through Bayesian methods. Together, these frameworks enable us to build robust models for complex systems that evolve over time, capturing both inherent randomness and uncertainty in our understanding.

 ## Parameters and model complexity
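The factorisation above maps directly onto `monty` objects: a prior plus a likelihood gives the unnormalised posterior. Below is a hedged sketch reusing the DSL and the model composition operator (`+`) seen elsewhere in this patch series; the single datum 2.3 is invented for illustration.

```{r}
# Sketch tying the formula to monty objects: a DSL prior and an R-function
# likelihood are the two factors of the unnormalised posterior, and combining
# them adds their log-densities.
prior <- monty_dsl({
  theta ~ Normal(0, 1)
})
likelihood <- monty_model_function(function(theta) {
  dnorm(2.3, mean = theta, log = TRUE)  # toy likelihood of one observation
})
posterior <- likelihood + prior  # unnormalised posterior, up to a constant
```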
From 037dc7970c2cecdccd734d8a6dc95491975d26c7 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Thu, 14 Nov 2024 12:16:49 +0000
Subject: [PATCH 29/34] Reworking of the parameter section

---
 model.qmd | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/model.qmd b/model.qmd
index f93ac52..b6a91e5 100644
--- a/model.qmd
+++ b/model.qmd
@@ -70,15 +70,13 @@ ## Parameters and model complexity

+In both dynamical and statistical frameworks, the number of parameters can be adjusted as needed to capture the desired level of complexity. In the `monty` package, random variables - termed 'parameters' with a slight simplification of language - are typically used to summarise processes, and so they often form a more compact set than those found in dynamical models. This distinction is especially relevant in Bayesian models constructed from complex `odin` models.

-### Parameters in dynamical systems
+In dynamical systems, parameters define the structure and evolution of a scenario in detail. For instance, an epidemiological model may include parameters for transmission rates, contact patterns, or intervention schedules. These inputs enable "what-if" scenarios, allowing decision-makers to predict and manage changes in real time. The `odin` package, designed to support such dynamical models, provides the flexibility to specify numerous parameters for exploring system behaviours over time.

-### Parameters in statistical models
+Statistical models, by contrast, use parameters to define probability distributions over possible outcomes, capturing uncertainties, predicting risks, or summarising data patterns. In Bayesian models based on a complex `odin` framework, the statistical parameters are usually a subset of those used in the dynamical model itself. Parameters such as those defining a vaccination campaign (e.g. the daily number of doses given to target groups) might be central to shaping the `odin` model but may not necessarily be included in the Bayesian inference (which might focus on just vaccine efficacy). This selective approach allows us to quantify uncertainties and make probabilistic inferences about key aspects of the model without needing to explore details of the underlying dynamics that are "known" given what was actually observed.

+Thus, while dynamical models rely on a broad parameter set for flexibility, statistical parameters summarise uncertainty more compactly, making the combined approach especially effective for realistic, data-driven inferences.
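A short sketch of the parameter-subsetting idea just described, with hypothetical names and numbers: scenario inputs such as a dose schedule are passed as `fixed`, leaving only the quantity of statistical interest as a parameter.

```{r}
# Illustrative only (names and numbers are hypothetical): the dose schedule is
# a fixed scenario input, so the model exposes efficacy as its sole
# statistical parameter.
log_density <- function(efficacy, doses) {
  dbinom(42, size = doses, prob = efficacy, log = TRUE)  # toy observation model
}
vaccine_model <- monty_model_function(log_density,
                                      fixed = list(doses = 1000))
vaccine_model$parameters  # "efficacy" only
```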
From 42291a8855aeb4d5dfac835e2e994f4bec411020 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Tue, 19 Nov 2024 14:06:05 +0100
Subject: [PATCH 30/34] Correct typo

---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index 5e8ee86..61019e5 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -58,7 +58,7 @@ plot(l,

 ## Sampling from our example distribution

-We now want to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers avalaible and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necesseraly efficiently) in most cases.
+We now want to sample from this model, using the `monty_sample()` function. For this we need to tell `monty` which sampler we want to use to explore our distribution. There are a variety of samplers available and you can learn about them in @sec-samplers. One of the simplest is the random walk [Metropolis-Hastings](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm) algorithm that should work almost out of the box (though not necessarily efficiently) in most cases.

From af8ce8d0deb451bbe01ca339f6045b8aede09d8e Mon Sep 17 00:00:00 2001
From: MJomaba
Date: Tue, 19 Nov 2024 13:09:30 +0000
Subject: [PATCH 31/34] Update monty.qmd

Co-authored-by: edknock <47318334+edknock@users.noreply.github.com>
---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index c16210e..4637af7 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -24,7 +24,7 @@ library(monty)

 We can define a simple Gaussian [mixture model](https://en.wikipedia.org/wiki/Mixture_model) of two sub-populations using the `monty_model_function()` from the package. The model represents the distribution of a quantity $l$, sampled with a probability $p$ from a normal distribution of mean $m_{1}$ and with a probability $1-p$ from another normal distribution of mean $m_{2}$. Both subpopulations have variance equal to 1.

-To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function has for arguments $l$, the quantity of interest and the three parameters $p$, $m_{1}$ and $m_{1}$ defining the two Normally-distributed subpopulations.
+To build our `monty` model, we start by defining an R function returning the log-density of our statistical model. This function has four arguments: $l$ (the quantity of interest) and the three parameters $p$, $m_{1}$ and $m_{2}$ defining the two Normally-distributed subpopulations.
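As a quick numerical check of the mixture description above (illustrative only): exponentiating the log-density at the two means should recover modes in roughly the 3:1 ratio implied by $p = 0.75$.

```{r}
# The mixture log-density, exponentiated, peaks near both means, with the
# modes in roughly the 3:1 ratio set by p = 0.75.
fn <- function(l, p, m1, m2) {
  log(p * dnorm(l, mean = m1) + (1 - p) * dnorm(l, mean = m2))
}
exp(fn(3, 0.75, 3, 7)) / exp(fn(7, 0.75, 3, 7))  # close to 3
```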
From d3be774489ded7fb0f267ae9a73843e184868557 Mon Sep 17 00:00:00 2001
From: MJomaba
Date: Tue, 19 Nov 2024 13:09:50 +0000
Subject: [PATCH 32/34] Update monty.qmd

Co-authored-by: edknock <47318334+edknock@users.noreply.github.com>
---
 monty.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty.qmd b/monty.qmd
index 4637af7..6830744 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -46,7 +46,7 @@ We have just created a `monty` model.

 mixture_model
 ```

-We can plot the density of our model for values of $l$ between 0 and 10 and eye-check that we can see the two subpopulations in the correct proportions and with the 3 and 7 as modes.
+We can plot the density of our model for values of $l$ between 0 and 10 and eye-check that we can see the two subpopulations in the correct proportions and with 3 and 7 as modes.

From 7f4042945b130854e5917caaae33d15249ac07e2 Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Tue, 19 Nov 2024 14:14:07 +0100
Subject: [PATCH 33/34] Amend text to remove missing link

---
 monty.qmd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/monty.qmd b/monty.qmd
index c16210e..a495462 100644
--- a/monty.qmd
+++ b/monty.qmd
@@ -14,7 +14,7 @@ The `monty` R package is designed for modular work with statistical distribution

 3. Integration of `odin` models,
 4. Composition of two `monty` models.

-The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in the @sec-monty-models in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL for more versatile model building using a small probabilistic DSL similar to BUGs. @sec-monty-combine explains how to compose new monty models by combining two existing ones. The last part of this book starting from @sec-inference demonstrates how to incorporate `odin` models into `monty` workflows.
+The most basic method to define a `monty` model is through a simple R function, as shown in this chapter. More advanced examples are covered in @sec-monty-models, in particular in the context of Bayesian statistics. @sec-monty-dsl introduces the monty DSL for more versatile model building using a small probabilistic DSL similar to BUGS. The last part of this book, starting from @sec-inference, demonstrates how to incorporate `odin` models into `monty` workflows.

 ## A simple example

@@ -97,4 +97,4 @@

 We've discussed a little bit about the philosophy of `monty` and built a simple model using an R function with `monty_function`. The example introduces the basics of defining and sampling from a `monty` model.

-For more advanced applications, refer to @sec-monty-dsl for constructing models using `monty`'s domain-specific language, which enables greater flexibility for complex structures. To combine multiple models, see @sec-monty-combine, which demonstrates how to compose and integrate existing `monty` models. Lastly, @sec-inference illustrates how to incorporate `odin` models into `monty` workflows, expanding the package’s potential for Bayesian inference with more sophisticated, system-based models. Each section provides detailed examples to guide you in leveraging `monty`’s full capabilities.
+For more advanced applications, refer to @sec-monty-dsl for constructing models using `monty`'s domain-specific language, which enables greater flexibility for complex structures. Lastly, @sec-inference illustrates how to incorporate `odin` models into `monty` workflows, expanding the package’s potential for Bayesian inference with more sophisticated, system-based models. Each section provides detailed examples to guide you in leveraging `monty`’s full capabilities.
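The eye-check mentioned in PATCH 32 can also be done numerically. A hedged sketch, assuming the `samples` object produced earlier in the chapter: about 75% of the draws should fall in the subpopulation around 3, i.e. below the midpoint 5.

```{r}
# Numerical version of the eye-check: the fraction of draws nearer the first
# mode should match the mixing weight p = 0.75.
draws <- samples$pars["l", , ]
mean(draws < 5)  # expect roughly 0.75
```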
From e0e036a6913aa0d66df1273b2ef759b58540e2ce Mon Sep 17 00:00:00 2001
From: Marc Baguelin
Date: Tue, 19 Nov 2024 14:16:42 +0100
Subject: [PATCH 34/34] Correct spelling

---
 monty-dsl.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monty-dsl.qmd b/monty-dsl.qmd
index e32883a..12a431f 100644
--- a/monty-dsl.qmd
+++ b/monty-dsl.qmd
@@ -5,7 +5,7 @@ source("common.R")
 ```

-The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focus on providing support for defining priors in Bayesian models. It fully support differentiability allowing to use gradient based samplers on these models.
+The `monty` DSL provides a more intuitive way to define statistical models with `monty`. It is currently relatively basic and focuses on providing support for defining priors in Bayesian models. It fully supports differentiability, allowing the use of gradient-based samplers on these models.

 ```{r}
 library(monty)