add-new-model.qmd

## Adding a new model

If you have read [Section @sec-bmm-architecture] and [Section @sec-example-model], you should have a pretty good idea of how the `bmm` package functions. Now it's time to add your new model.

The good news are, you don't have to add any of the files manually. The `bmm` package includes a function `use_model_template()` that generates all the files with templates for all necessary functions. Thus, you can focus on the important work of filling these templates with the relevant information without worrying about missing something critical. If you type `?use_model_template` you will see the following description:

### Function `use_model_template()` {.unnumbered}

#### Description {.unnumbered}

Create a file with a template for adding a new model (for developers)

#### Usage {.unnumbered}

``` r
use_model_template(
  model_name,
  custom_family = FALSE,
  stanvar_blocks = c("data", "tdata", "parameters", 
                     "tparameters", "model", "likelihood",
                     "genquant", "functions"),
  open_files = TRUE,
  testing = FALSE
)
```

#### Arguments {.unnumbered}

| Argument         | Description                                                                                                                                                                                                                                                                                                                                                                                   |
|----------------------|--------------------------------------------------|
| `model_name`     | A string with the name of the model. The file will be named bmm_model_model_name.R and all necessary functions will be created with the appropriate names and structure. The file will be saved in the ⁠R/⁠ directory                                                                                                                                                                           |
| `custom_family`  | Logical; Do you plan to define a brms::custom_family()? If TRUE the function will add a section for the custom family, placeholders for the stan_vars and corresponding empty .stan files in ⁠inst/stan_chunks/⁠, that you can fill For an example, see the sdmSimple model in ⁠/R/bmm_model_sdmSimple.R⁠. If FALSE (default) the function will not add the custom family section nor stan files. |
| `stanvar_blocks` | A character vector with the names of the blocks that will be added to the custom family section. See brms::stanvar() for more details. The default lists all the possible blocks, but it is unlikely that you will need all of them. You can specify a vector of only those that you need. The function will add a section for each block in the list                                         |
| `open_files`     | Logical; If TRUE (default), the function will open the template files that were created in RStudio                                                                                                                                                                                                                                                                                            |
| `testing`        | Logical; If TRUE, the function will return the file content but will not save the file. If FALSE (default), the function will save the file                                                                                                                                                                                                                                                   |

## Example

Let's add a new model called `gcm`. Let's assume that you have tested the model in Stan and you have the Stan code ready. We want to define a custom family for the `gcm` model, and we want to define the following blocks: `likelihood`, `functions` (see `?brms::stanvar` for an explanation of the blocks).

First you set up your system and git environment as described in [Section @sec-setup]. Then you can run the following code from RStudio in the root directory of the `bmm` package:

``` r
use_model_template("gcm", custom_family = TRUE, stanvar_blocks = c("likelihood", "functions"))
```

This will create the file `bmm_model_gcm.R` in the `R/` directory and the files `gcm_likelihood.stan` and `gcm_functions.stan` in the `inst/stan_chunks/` directory. The function will also open the files in RStudio. You will see the following output in the console:

```         
• Modify 'inst/stan_chunks/gcm_likelihood.stan'
• Modify 'inst/stan_chunks/gcm_functions.stan'
• Modify 'R/model_gcm.R'
```

Now you can fill the files with the appropriate code. The Stan files will be empty, but the R file will have the following structure:

``` r
#############################################################################!
# MODELS                                                                 ####
#############################################################################!
# see file 'R/model_mixture3p.R' for an example

.model_gcm <- function(resp_var1 = NULL, required_arg1 = NULL, required_arg2 = NULL, links = NULL, version = NULL, call = NULL, ...) {
   out <- structure(
     list(
       resp_vars = nlist(resp_var1),
       other_vars = nlist(required_arg1, required_arg2),
       domain = '',
       task = '',
       name = '',
       citation = '',
       version = version,
       requirements = '',
       parameters = list(),
       links = list(),
       fixed_parameters = list(),
       default_priors = list(par1 = list(), par2 = list()),
       void_mu = FALSE
     ),
     class = c('bmmodel', 'gcm'),
     call = call
   )
   if(!is.null(version)) class(out) <- c(class(out), paste0("gcm_",version))
   out$links[names(links)] <- links
   out
}
# user facing alias
# information in the title and details sections will be filled in
# automatically based on the information in the .model_gcm()$info
 
#' @title `r .model_gcm()$name`
#' @name Model Name#' @details `r model_info(.model_gcm())`
#' @param resp_var1 A description of the response variable
#' @param required_arg1 A description of the required argument
#' @param required_arg2 A description of the required argument
#' @param links A list of links for the parameters.
#' @param version A character label for the version of the model. Can be empty or NULL if there is only one version. 
#' @param ... used internally for testing, ignore it
#' @return An object of class `bmmodel`
#' @export
#' @examples
#' \dontrun{
#' # put a full example here (see 'R/model_mixture3p.R' for an example)
#' }
gcm <- function(resp_var1, required_arg1, required_arg2, links = NULL, version = NULL, ...) {
   call <- match.call()
   stop_missing_args()
   .model_gcm(resp_var1 = resp_var1, required_arg1 = required_arg1, required_arg2 = required_arg2,
                links = links, version = version,call = call, ...)
}


#############################################################################!
# CHECK_DATA S3 methods                                                  ####
#############################################################################!
# A check_data.* function should be defined for each class of the model.
# If a model shares methods with other models, the shared methods should be
# defined in helpers-data.R. Put here only the methods that are specific to
# the model. See ?check_data for details.
# (YOU CAN DELETE THIS SECTION IF YOU DO NOT REQUIRE ADDITIONAL DATA CHECKS)

#' @export
check_data.gcm <- function(model, data, formula) {
   # retrieve required arguments
   required_arg1 <- model$other_vars$required_arg1
   required_arg2 <- model$other_vars$required_arg2

   # check the data (required)

   # compute any necessary transformations (optional)

   # save some variables as attributes of the data for later use (optional)

   NextMethod('check_data')
}


#############################################################################!
# Convert bmmformula to brmsformla methods                               ####
#############################################################################!
# A bmf2bf.* function should be defined if the default method for consructing
# the brmsformula from the bmmformula does not apply (e.g if aterms are required).
# The shared method for all `bmmodels` is defined in bmmformula.R.
# See ?bmf2bf for details.
# (YOU CAN DELETE THIS SECTION IF YOUR MODEL USES A STANDARD FORMULA WITH 1 RESPONSE VARIABLE)

#' @export
bmf2bf.gcm <- function(model, formula) {
   # retrieve required response arguments
   resp_var1 <- model$resp_vars$resp_var1
   resp_var2 <- model$resp_vars$resp_arg2

   # set the base brmsformula based 
   brms_formula <- brms::bf(paste0(resp_var1," | ", vreal(resp_var2), " ~ 1" ),)

   # return the brms_formula to add the remaining bmmformulas to it.
   brms_formula
}


#############################################################################!
# CONFIGURE_MODEL S3 METHODS                                             ####
#############################################################################!
# Each model should have a corresponding configure_model.* function. See
# ?configure_model for more information.

#' @export
configure_model.gcm <- function(model, data, formula) {
   # retrieve required arguments
   required_arg1 <- model$other_vars$required_arg1
   required_arg2 <- model$other_vars$required_arg2

   # retrieve arguments from the data check
   my_precomputed_var <- attr(data, 'my_precomputed_var')

   # construct brms formula from the bmm formula
   formula <- bmf2bf(model, formula)

   # construct the family & add to formula object 
   gcm_family <- brms::custom_family(
     'gcm',
     dpars = c(),
     links = c(),
     lb = c(), # upper bounds for parameters
     ub = c(), # lower bounds for parameters
     type = '', # real for continous dv, int for discrete dv
     loop = TRUE, # is the likelihood vectorized
   )
   formula$family <- gcm_family

   # prepare initial stanvars to pass to brms, model formula and priors
   sc_path <- system.file('stan_chunks', package='bmm')
   stan_likelihood <- read_lines2(paste0(sc_path, '/gcm_likelihood.stan'))
   stan_functions <- read_lines2(paste0(sc_path, '/gcm_functions.stan'))

   stanvars <- stanvar(scode = stan_likelihood, block = 'likelihood') +
      stanvar(scode = stan_functions, block = 'functions')

   # return the list
   nlist(formula, data, stanvars)
}


#############################################################################!
# POSTPROCESS METHODS                                                    ####
#############################################################################!
# A postprocess_brm.* function should be defined for the model class. See 
# ?postprocess_brm for details

#' @export
postprocess_brm.gcm <- function(model, fit) {
   # any required postprocessing (if none, delete this section)
   fit
}
```

Now you have to:

1.  Fill the `.model_gcm` function with the appropriate code. This function should return a list with all the variables specified above. The class of the list should be `c('bmmmodel', 'gcm')`. Rename the response arguments and the other required arguments, or delete the other arguments if you do not have any. You can see an example in the `model_sdm.R` file. Specify the parameters of the model, the link functions, what if any parameters are fixed (and to what value). It's crucial that you set default priors for every parameter of the model, which should be informed by knowledge in the field.

2.  Adjust the user-facing alias. Here you should only rename the required arguments and fill in the `@examples` section with a full example. Everything else will be filled in automatically based on the information in the `.model_gcm` function.

3.  Fill the `check_data.gcm` function with the appropriate code. This function should check the data and return the data. You may or may not need to compute any transformations or save some variables as attributes of the data.

4.  If necessary define the `bmf2bf.gcm` method to convert the `bmmformula` to a `brmsformula`. The first step for this is always to specify the response variable and additional response information. Keep in mind that `brms` automatically interprets this formula as the linear model formula for the `mu` parameter of your custom family. Currently, `brms` requires all custom families to have a `mu` parameter. However, we recommend to code this parameter as a `void_mu`, and fix the intercept of this parameter to zero using constant priors. This way, the `bmmformula` can be used to only specify the linear or non-linear model for the parameters of a `bmmmodel`. *If your model has a single response variable, you can delete this section.*

5.  Fill the `configure_model.gcm` function with the appropriate code. This function should construct the formula, the family, the stanvars. You can also retrieve any arguments you saved from the data check. Depending on your model, some of these parts might not be necessary. For example, for the mixture models (e.g. `mixture3p`), we construct a new formula, because we want to rename the arguments to make it easier for the user. For the `sdmSimple` model, we define the family ourselves, so we don't need to change the formula.

    You need to fill information about your custom family, and then fill the STAN files with your STAN code. Conveniently, you don't have to edit lines 134-140: loading the STAN files and setting up the stanvars is set up automatically when calling the `use_model_template` function. Should you need to add more STAN files after you created the template, you can add the files in `init/stan_chunks/` manually and edit those lines to additionally load the manually added files.

6.  Fill the `postprocess_brm.gcm` function with the appropriate code. By post-processing, we mean changes to the fitted brms model - like renaming parameters, etc. If you don't need any post-processing, you can delete this section.

## Testing

Unit testing is extremely important. You should test your model with the `testthat` package. You can use the `use_test()` function to create a test file for your model. See file `tests/testthat/test-bmm.R` for an example of how we test the existing models. BRMS models take a long time to fit, so we don't test the actual fitting process. The `bmm()` function provides an argument `backend="mock"`, which will return a mock object instead of fitting the model. This ensures that the entire pipeline works without errors. For example, here's a test of the `IMMfull` model:

``` r
test_that('Available mock models run without errors',{
  withr::local_options('bmm.silent'=2)
  skip_on_cran()
  dat <- data.frame(
    resp_err = rIMM(n = 5),
    Item2_rel = 2,
    Item3_rel = -1.5,
    spaD2 = 0.5,
    spaD3 = 2
  )

  # two-parameter model mock fit
  f <- bmmformula(kappa ~ 1, c ~ 1, a ~ 1, s ~ 1)
  mock_fit <- bmm(f, dat, 
                  imm(resp_err = "resp_err", 
                      setsize = 3, 
                      nt_features = paste0('Item',2:3,'_rel'), 
                      nt_distance=paste0('spaD',2:3)), 
                  backend = "mock", mock_fit = 1, rename=FALSE)
  expect_equal(mock_fit$fit, 1)
  expect_type(mock_fit$bmm$fit_args, "list")
  expect_equal(names(mock_fit$fit_args[1:4]), c("formula", "data"))
})
```

The tests based on the `testthat` package are run every time you call the `check()` command. Before you submit your changes, make sure that all tests pass.

::: callout-important
Additionally, you should perform a full test of the model by running it in a separate script and ensuring it gives meaningful results. At the very least, you should perform basic parameter recovery simulations for hyper-parameters (i.e. means and standard deviations) as well as subject-level parameters to give users an idea of how much data they need to adequately estimate the model.

We are in the process of establishing guidelines for that.
:::

## Add an example dataset

All new models should come with an example dataset, that can be loaded by users and can be used in the examples section. This should be either:

-   A new dataset that you add to the package
-   A dataset that already exists in the package but that can be used with the new model
-   A dataset that exists in another package that you can load with `data()` and use with the new model

For example, the vignettes for the `mixture2p` and `mixture3p` use an external dataset from the `mixtur` package that can be loaded with `data('bays2009_full', package='mixtur')`. The `IMM` models use a dataset included in the current package. For instructions on how to add a new dataset see [here](https://r-pkgs.org/data.html).

## Add an article

All new models should come with an article that explains some basic information about the model and how to estimate it with `bmm`. You can use the `use_article()` function to create a new article. See [here](https://r-pkgs.org/vignettes.html) for more information. The articles will be published automatically on the package website under "Articles" when the pull request is approved. You can browse the source code for the existing articles in the `vignettes/articles/` directory. You can see the published version of the existing vignettes [here](../articles/).

And that's it! You have added a new model to the `bmm` package. You can now submit your changes to the `bmm` package repository.