---
title: "S8 Appendix"
output: 
  pdf_document:
    number_sections: true
header-includes:
  - \usepackage{booktabs}
urlcolor: blue
---

This appendix illustrates the comparison analysis performed on the output
generated by the Data Generating Processes studied in this work.

\tableofcontents 

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE,
                      warning = FALSE)

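library(dplyr)
# ggplot2 and patchwork supply scale_*(), ggsave(), and the `|` and `/`
# layout operators used below (harmless if plots.R already attaches them).
library(ggplot2)
library(imputeTS)
library(lubridate)
library(patchwork)
library(pomp)
library(purrr)
library(readr)
library(readxl)
library(stringr)
library(tibble)
library(tidyr)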

source("./R_scripts/data.R")
source("./R_scripts/POMP_models.R")
source("./R_scripts/plots.R")

data_list <- get_data()
```

\newpage

# Comparison of DGP1's candidate measurement models

This section compares the inferences produced by DGP1's candidate Models 2, 5,
and 6. The specification of each model is given in S1.

## Incidence

Recall that Models 5 and 6 are fitted to weekly incidence data, whereas Model 2
is fitted to daily data. Models 5 and 6 differ in that Model 6 links the
relative contact rate to Apple's driving indices. The graph below shows that
both models yield similar fits to the weekly number of new cases. That is,
incorporating mobility data into Model 6 does not compromise its accuracy in
predicting incidence.
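
To make the daily-versus-weekly distinction concrete, the following sketch
aggregates a daily incidence series into weekly totals. It is illustrative
only: the toy counts and the column names `time`, `y1`, and `week` are
assumptions for this example and need not match the output of `get_data()`.

```r
# Illustrative sketch only: toy daily counts, not the study's data.
daily_df <- data.frame(time = 1:14,
                       y1   = c(1, 2, 3, 5, 8, 13, 21,
                                20, 18, 15, 12, 9, 6, 3))

# Days 1-7 map to week 1, days 8-14 to week 2, and so on.
daily_df$week <- ceiling(daily_df$time / 7)

# Sum the daily counts within each week.
weekly_df <- aggregate(y1 ~ week, data = daily_df, FUN = sum)
weekly_df
```

A weekly measurement model then compares one simulated total per week with
one observed total, rather than seven daily pairs.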

\hfill

```{r}
folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_2")
summary_C    <- file.path(folder, "summary_C.rds") |> readRDS()

plot_daily_fit(summary_C, rename(data_list[[1]], y = y1), "C[t]",
               "Model 2", 18, GBM_colour, 0.8) +
  scale_y_continuous(limits = c(0, 5e3)) -> g_daily


ids <- c(5, 6)

lapply(ids, function(model_id) {
  
  folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_{model_id}")
  summary_C    <- file.path(folder, "summary_C.rds") |> readRDS() |> 
    mutate(model = model_id, week = time / 7)
  
  plot_wkl_fit(summary_C, rename(data_list[[2]], y = y1), y_lab = "C[t]",
             shape = 18, plot_colour = GBM_colour, 
             title_lab = paste0("Model ", model_id)) +
    scale_y_continuous(limits = c(0, 5e3))
  
}) -> g_wkl_inc
```


```{r, fig.height = 3, dev = 'cairo_pdf'}
g_daily | g_wkl_inc[[1]] | g_wkl_inc[[2]]
```

\newpage

## Relative effective contact rate

Regarding the relative effective contact rate ($Z_t$), Models 2 and 5 convey
similar uncertainties. This result implies that changing the periodicity in the
measurement model (from daily to weekly) does not translate into a loss of 
information. However, accounting for mobility data in the measurement model has
an effect on the predictions generated by Model 6. Specifically, the uncertainty
in $Z_t$ is smaller than that estimated by Models 2 and 5.
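
One way to quantify such a comparison is the average width of the predictive
interval over time. The sketch below is a minimal illustration under assumed
inputs: the data frame, its columns (`model`, `time`, `lower`, `upper`), and
the numbers in it are hypothetical and need not match the structure or content
of the `summary_Z.rds` objects.

```r
# Hypothetical interval summaries for two models (toy numbers, not results).
summary_df <- data.frame(
  model = rep(c("Model 5", "Model 6"), each = 3),
  time  = rep(c(7, 14, 21), times = 2),
  lower = c(0.4, 0.3, 0.2, 0.6, 0.5, 0.4),
  upper = c(1.6, 1.5, 1.2, 1.2, 1.0, 0.8)
)

# Width of the interval at each time point.
summary_df$width <- summary_df$upper - summary_df$lower

# Mean width per model; a smaller value indicates less uncertainty.
mean_width <- tapply(summary_df$width, summary_df$model, mean)
mean_width
```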

\hfill

```{r}
folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_2")
summary_Z    <- file.path(folder, "summary_Z.rds") |> readRDS()

plot_daily_fit(summary_Z, rename(data_list[[1]], y = y2), "Z[t]",
               "Model 2", 16, GBM_colour, 0.8) +
  scale_y_continuous(limits = c(0, 5)) +
  scale_x_continuous(limits = c(0, 84), breaks = seq(0, 84, 7)) -> g_daily


ids <- c(5, 6)

lapply(ids, function(model_id) {
  
  folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_{model_id}")
  summary_Z    <- file.path(folder, "summary_Z.rds") |> readRDS() |> 
    mutate(model = model_id, week = time / 7)
  
  plot_wkl_fit(summary_Z, rename(data_list[[2]], y = y2), y_lab = "Z[t]",
             shape = 16, plot_colour = GBM_colour, 
             title_lab = paste0("Model ", model_id)) +
    scale_y_continuous(limits = c(0, 5))
  
}) -> g_wkl_inc
```

```{r, dev = 'cairo_pdf', fig.height = 7}
g_daily / g_wkl_inc[[1]] / g_wkl_inc[[2]]
```

\newpage

## Effective reproduction number

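Likewise, the difference in uncertainty carries over to the predicted effective
reproduction number ($\Re_t$): the predictions from Model 6 are less uncertain
than those from Models 2 and 5.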

\hfill

```{r}
folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_2")
summary_Re  <- file.path(folder, "summary_Re.rds") |> readRDS()

plot_daily_re(summary_Re, GBM_colour, "Model 2") +
  scale_y_continuous(limits = c(0, 20)) +
  scale_x_continuous(limits = c(0, 84), breaks = seq(0, 84, 7)) -> g_daily


ids <- c(5, 6)

lapply(ids, function(model_id) {
  
  folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_{model_id}")
  summary_Re    <- file.path(folder, "summary_Re.rds") |> readRDS() |> 
    mutate(model = model_id, week = time / 7)
  
  plot_hidden_re(summary_Re, GBM_colour, paste0("Model ", model_id)) +
    scale_y_continuous(limits = c(0, 20))
  
}) -> g_wkl_inc
```

```{r, dev = 'cairo_pdf', fig.height = 7}
g_daily / g_wkl_inc[[1]] / g_wkl_inc[[2]]
```

\newpage

## Parameter estimates

\hfill

The inference of time-independent parameters across Models 2, 5, and 6 
reinforces the message outlined in the preceding sections: Models 2 
and 5 estimate similar quantities, whereas Model 6 yields more precise 
estimates (due to the incorporation of mobility data).

\hfill


```{r}
pars <- c("zeta", "P_0", "alpha", "Re_0")

map_df(c("2", "5", "6"), function(model_id) {
  
  folder  <- str_glue("./Saved_objects/SEI3R_GBM/mdl_{model_id}")
  fn_par  <- file.path(folder, "par_estimates.csv")
  
  # Keep the estimates labelled "Profile" for the parameters of interest
  read_csv(fn_par) |> 
    filter(name %in% pars, method == "Profile") |> 
    mutate(model = paste0("Model ", model_id))
}) -> par_summary
```

```{r, dev = 'cairo_pdf'}
plot_par_comparison(par_summary)
```

\newpage

# Simulations from MLE

_This section is added for reproducibility purposes._

We simulate DGP1 and DGP2 from their respective Maximum Likelihood Estimates
(obtained in S5 and S6). These simulations produce sample trajectories of 
$\beta_t$, which are shown in the main document.
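
For intuition about what such trajectories look like, the sketch below draws
paths of a driftless geometric Brownian motion in base R and scales them in the
same way as the chunks in this section (`beta = Z * zeta`). It is not the
`pomp` implementation used here: the driftless form, the function name
`simulate_gbm_paths`, and all parameter values are assumptions made for
illustration, and S1 remains the reference for the actual specification.

```r
# Illustrative sketch only; parameter values are arbitrary, not estimates.
simulate_gbm_paths <- function(n_paths, n_steps, dt, alpha, z0 = 1) {
  # Exact log-normal update for dZ = alpha * Z * dW:
  # Z[t + dt] = Z[t] * exp(-0.5 * alpha^2 * dt + alpha * sqrt(dt) * eps)
  eps      <- matrix(rnorm(n_paths * n_steps), nrow = n_paths)
  log_incr <- -0.5 * alpha^2 * dt + alpha * sqrt(dt) * eps
  # Cumulate the log-increments along each path (rows), then prepend Z_0.
  log_z    <- t(apply(log_incr, 1, cumsum))
  cbind(z0, z0 * exp(log_z))
}

set.seed(1)
zeta <- 1.2  # hypothetical value of the initial transmission rate
Z    <- simulate_gbm_paths(n_paths = 5, n_steps = 77, dt = 1, alpha = 0.05)
beta <- zeta * Z  # same scaling as in the chunks of this section
dim(beta)
```

Each row of `beta` is one trajectory that starts at `zeta`, which mirrors the
role of `init_df` in the chunks of this section. The log-normal update keeps
every path strictly positive, which a plain Euler step would not guarantee.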

\hfill

```{r}
obs_df  <- data_list[["Weekly"]]
inc_mdl <- "Pois"
mob_mdl <- TRUE

model_id     <- 6
folder       <- str_glue("./Saved_objects/SEI3R_GBM/mdl_{model_id}")
mdl_filename <- str_glue("GBM_{model_id}")
pomp_obj     <- pomp_GBM(inc_mdl, mob_mdl, obs_df, mdl_filename, 1 / 128)
pomp_mdl     <- pomp_obj$mdl
par_obj      <- pomp_obj$pars
fixed_params <- par_obj$fixed
params       <- par_obj$all

mle_df       <- read_csv(file.path(folder, "mle.csv"))
```

```{r}
mle_vector <- deframe(mle_df[, c("name", "value")])
n_demo <- 500

init_df <- data.frame(.id = 1:n_demo, week = 0, beta = mle_vector[["zeta"]])

set.seed(261483779)

# Simulate n_demo trajectories at the MLE and convert Z_t into beta_t
pomp::simulate(pomp_mdl, params = c(par_obj$fixed, mle_vector),
               nsim = n_demo) |> 
  as.data.frame() |> 
  select(.id, time, Z) |> 
  mutate(beta = Z * mle_vector[["zeta"]],
         week = time / 7) |> 
  select(-Z, -time) |> 
  bind_rows(init_df) |>   # prepend the initial value (week 0)
  arrange(.id, week) -> demo_GBM

g_demo_GBM <- plot_demo(demo_GBM, "A) Geometric Brownian Motion (GBM)", 
                        GBM_colour)
```

```{r}
obs_df  <- data_list[["Weekly"]]
inc_mdl <- "Pois"
mob_mdl <- TRUE

model_id     <- 12
folder       <- str_glue("./Saved_objects/SEI3R_CIR/mdl_{model_id}")
mdl_filename <- str_glue("CIR_{model_id}")
pomp_obj     <- pomp_CIR(inc_mdl, mob_mdl, obs_df, mdl_filename, 1 / 128)
pomp_mdl     <- pomp_obj$mdl
par_obj      <- pomp_obj$pars
fixed_params <- par_obj$fixed
params       <- par_obj$all

mle_df       <- read_csv(file.path(folder, "mle.csv"))
```

```{r}
mle_vector <- deframe(mle_df[, c("name", "value")])
n_demo     <- 500

init_df <- data.frame(.id = 1:n_demo, week = 0, beta = mle_vector[["zeta"]])

set.seed(562808458)

# Simulate n_demo trajectories at the MLE and convert Z_t into beta_t
pomp::simulate(pomp_mdl, params = c(par_obj$fixed, mle_vector),
               nsim = n_demo) |> 
  as.data.frame() |> 
  select(.id, time, Z) |> 
  mutate(beta = Z * mle_vector[["zeta"]],
         week = time / 7) |> 
  select(-Z, -time) |> 
  bind_rows(init_df) |>   # prepend the initial value (week 0)
  mutate(highlight = .id == 100) |>   # flag one trajectory
  arrange(.id, week) -> demo_CIR

g_demo_CIR <- plot_demo(demo_CIR, "B) Cox-Ingersoll-Ross (CIR)", CIR_colour)
```

```{r, fig.height = 7}
# Stack both panels once and reuse the composed figure
g_demo <- g_demo_GBM / g_demo_CIR
g_demo

ggsave("./paper_plots/Fig_03_Simulations.pdf", 
       plot = g_demo, height = 7, width = 5)

ggsave("./paper_plots/Fig_03_Simulations.eps", device = cairo_ps,
       plot = g_demo, height = 7, width = 5)
```

\newpage

# Hidden states by DGP

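_This section is added for reproducibility purposes._

The chunks below gather, for each DGP, the fitted incidence ($C_t$), the
relative transmission rate ($Z_t$), and the effective reproduction number
($\Re_t$), and combine them into a single comparison figure. DGP1 and DGP2 are
represented by Models 6 and 12, respectively.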

```{r}

DGP1 <- readRDS("./Saved_objects/SEI3R_GBM/mdl_6/summary_C.rds") |> 
  mutate(DGP = "1")

DGP2 <- readRDS("./Saved_objects/SEI3R_CIR/mdl_12/summary_C.rds") |> 
  mutate(DGP = "2")

sim_C_df <- bind_rows(DGP1, DGP2) |> 
  mutate(week = time / 7)

SMTH_obj <- readRDS("./Saved_objects/SEI3R_SMTH/predictions.rds")

DGP3 <- SMTH_obj$sim_inc |> mutate(DGP = SMTH_obj$label)

sim_C_df <- bind_rows(sim_C_df, DGP3) 

data_df <- rename(data_list$Weekly, y = y1)
  
plot_fit_comparison(sim_C_df, data_df, "C[t]",
                    "A) Incidence fit per DGP", shape = 18) -> g1
```

```{r}
DGP1 <- readRDS("./Saved_objects/SEI3R_GBM/mdl_6/summary_Z.rds") |> 
  mutate(DGP = "1")

DGP2 <- readRDS("./Saved_objects/SEI3R_CIR/mdl_12/summary_Z.rds") |> 
  mutate(DGP = "2")

sim_Z_df <- bind_rows(DGP1, DGP2) |> 
  mutate(week = time / 7)

DGP3 <- SMTH_obj$sim_mob |> mutate(DGP = SMTH_obj$label) |> 
  filter(week >= 1 & week <= 11)

sim_Z_df <- bind_rows(sim_Z_df, DGP3)

data_df <- rename(data_list$Daily, y = y2) |> 
  mutate(week = time / 7) |>  
  filter(week >= 1 & week <= 11)

plot_fit_comparison(sim_Z_df, data_df, "Z[t]",
                    "B) Relative transmission rate fit per DGP", shape = 16) -> g2
```


```{r}
DGP1 <- readRDS("./Saved_objects/SEI3R_GBM/mdl_6/summary_Re.rds") |> 
  mutate(DGP = "1")

DGP2 <- readRDS("./Saved_objects/SEI3R_CIR/mdl_12/summary_Re.rds") |> 
  mutate(DGP = "2")

sim_Re_df <- bind_rows(DGP1, DGP2) |> 
  mutate(week = time / 7)

DGP3 <- SMTH_obj$Re_t |> mutate(DGP = SMTH_obj$label) |> 
  filter(week >= 1 & week <= 11)

sim_Re_df <- bind_rows(sim_Re_df, DGP3)

plot_re_by_DGP(sim_Re_df) -> g3
```


```{r, fig.height = 6, dev = 'cairo_pdf'}
g1 / g2 / g3 / guide_area() +
  plot_layout(guides = 'collect') -> g
print(g)
```


```{r}
fig_path <- "./paper_plots/Fig_07_Comparison.pdf"
ggsave(fig_path, g, height = 7, width = 5, device = cairo_pdf)

ggsave("./paper_plots/Fig_07_Comparison.eps", g, height = 7, width = 5, 
       device = cairo_ps)
```