-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathadd-new-model.qmd
285 lines (220 loc) · 17.1 KB
/
add-new-model.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
## Adding a new model
If you have read [Section @sec-bmm-architecture] and [Section @sec-example-model], you should have a pretty good idea of how the `bmm` package functions. Now it's time to add your new model.
The good news are, you don't have to add any of the files manually. The `bmm` package includes a function `use_model_template()` that generates all the files with templates for all necessary functions. Thus, you can focus on the important work of filling these templates with the relevant information without worrying about missing something critical. If you type `?use_model_template` you will see the following description:
### Function `use_model_template()` {.unnumbered}
#### Description {.unnumbered}
Create a file with a template for adding a new model (for developers)
#### Usage {.unnumbered}
``` r
use_model_template(
model_name,
custom_family = FALSE,
stanvar_blocks = c("data", "tdata", "parameters",
"tparameters", "model", "likelihood",
"genquant", "functions"),
open_files = TRUE,
testing = FALSE
)
```
#### Arguments {.unnumbered}
| Argument | Description |
|----------------------|--------------------------------------------------|
| `model_name` | A string with the name of the model. The file will be named bmm_model_model_name.R and all necessary functions will be created with the appropriate names and structure. The file will be saved in the R/ directory |
| `custom_family` | Logical; Do you plan to define a brms::custom_family()? If TRUE the function will add a section for the custom family, placeholders for the stan_vars and corresponding empty .stan files in inst/stan_chunks/, that you can fill For an example, see the sdmSimple model in /R/bmm_model_sdmSimple.R. If FALSE (default) the function will not add the custom family section nor stan files. |
| `stanvar_blocks` | A character vector with the names of the blocks that will be added to the custom family section. See brms::stanvar() for more details. The default lists all the possible blocks, but it is unlikely that you will need all of them. You can specify a vector of only those that you need. The function will add a section for each block in the list |
| `open_files` | Logical; If TRUE (default), the function will open the template files that were created in RStudio |
| `testing` | Logical; If TRUE, the function will return the file content but will not save the file. If FALSE (default), the function will save the file |
## Example
Let's add a new model called `gcm`. Let's assume that you have tested the model in Stan and you have the Stan code ready. We want to define a custom family for the `gcm` model, and we want to define the following blocks: `likelihood`, `functions` (see `?brms::stanvar` for an explanation of the blocks).
First you set up your system and git environment as described in [Section @sec-setup]. Then you can run the following code from RStudio in the root directory of the `bmm` package:
``` r
use_model_template("gcm", custom_family = TRUE, stanvar_blocks = c("likelihood", "functions"))
```
This will create the file `bmm_model_gcm.R` in the `R/` directory and the files `gcm_likelihood.stan` and `gcm_functions.stan` in the `inst/stan_chunks/` directory. The function will also open the files in RStudio. You will see the following output in the console:
```
• Modify 'inst/stan_chunks/gcm_likelihood.stan'
• Modify 'inst/stan_chunks/gcm_functions.stan'
• Modify 'R/model_gcm.R'
```
Now you can fill the files with the appropriate code. The Stan files will be empty, but the R file will have the following structure:
``` r
#############################################################################!
# MODELS ####
#############################################################################!
# see file 'R/model_mixture3p.R' for an example
.model_gcm <- function(resp_var1 = NULL, required_arg1 = NULL, required_arg2 = NULL, links = NULL, version = NULL, call = NULL, ...) {
out <- structure(
list(
resp_vars = nlist(resp_var1),
other_vars = nlist(required_arg1, required_arg2),
domain = '',
task = '',
name = '',
citation = '',
version = version,
requirements = '',
parameters = list(),
links = list(),
fixed_parameters = list(),
default_priors = list(par1 = list(), par2 = list()),
void_mu = FALSE
),
class = c('bmmodel', 'gcm'),
call = call
)
if(!is.null(version)) class(out) <- c(class(out), paste0("gcm_",version))
out$links[names(links)] <- links
out
}
# user facing alias
# information in the title and details sections will be filled in
# automatically based on the information in the .model_gcm()$info
#' @title `r .model_gcm()$name`
#' @name Model Name#' @details `r model_info(.model_gcm())`
#' @param resp_var1 A description of the response variable
#' @param required_arg1 A description of the required argument
#' @param required_arg2 A description of the required argument
#' @param links A list of links for the parameters.
#' @param version A character label for the version of the model. Can be empty or NULL if there is only one version.
#' @param ... used internally for testing, ignore it
#' @return An object of class `bmmodel`
#' @export
#' @examples
#' \dontrun{
#' # put a full example here (see 'R/model_mixture3p.R' for an example)
#' }
gcm <- function(resp_var1, required_arg1, required_arg2, links = NULL, version = NULL, ...) {
call <- match.call()
stop_missing_args()
.model_gcm(resp_var1 = resp_var1, required_arg1 = required_arg1, required_arg2 = required_arg2,
links = links, version = version,call = call, ...)
}
#############################################################################!
# CHECK_DATA S3 methods ####
#############################################################################!
# A check_data.* function should be defined for each class of the model.
# If a model shares methods with other models, the shared methods should be
# defined in helpers-data.R. Put here only the methods that are specific to
# the model. See ?check_data for details.
# (YOU CAN DELETE THIS SECTION IF YOU DO NOT REQUIRE ADDITIONAL DATA CHECKS)
#' @export
check_data.gcm <- function(model, data, formula) {
# retrieve required arguments
required_arg1 <- model$other_vars$required_arg1
required_arg2 <- model$other_vars$required_arg2
# check the data (required)
# compute any necessary transformations (optional)
# save some variables as attributes of the data for later use (optional)
NextMethod('check_data')
}
#############################################################################!
# Convert bmmformula to brmsformla methods ####
#############################################################################!
# A bmf2bf.* function should be defined if the default method for consructing
# the brmsformula from the bmmformula does not apply (e.g if aterms are required).
# The shared method for all `bmmodels` is defined in bmmformula.R.
# See ?bmf2bf for details.
# (YOU CAN DELETE THIS SECTION IF YOUR MODEL USES A STANDARD FORMULA WITH 1 RESPONSE VARIABLE)
#' @export
bmf2bf.gcm <- function(model, formula) {
# retrieve required response arguments
resp_var1 <- model$resp_vars$resp_var1
resp_var2 <- model$resp_vars$resp_arg2
# set the base brmsformula based
brms_formula <- brms::bf(paste0(resp_var1," | ", vreal(resp_var2), " ~ 1" ),)
# return the brms_formula to add the remaining bmmformulas to it.
brms_formula
}
#############################################################################!
# CONFIGURE_MODEL S3 METHODS ####
#############################################################################!
# Each model should have a corresponding configure_model.* function. See
# ?configure_model for more information.
#' @export
configure_model.gcm <- function(model, data, formula) {
# retrieve required arguments
required_arg1 <- model$other_vars$required_arg1
required_arg2 <- model$other_vars$required_arg2
# retrieve arguments from the data check
my_precomputed_var <- attr(data, 'my_precomputed_var')
# construct brms formula from the bmm formula
formula <- bmf2bf(model, formula)
# construct the family & add to formula object
gcm_family <- brms::custom_family(
'gcm',
dpars = c(),
links = c(),
lb = c(), # upper bounds for parameters
ub = c(), # lower bounds for parameters
type = '', # real for continous dv, int for discrete dv
loop = TRUE, # is the likelihood vectorized
)
formula$family <- gcm_family
# prepare initial stanvars to pass to brms, model formula and priors
sc_path <- system.file('stan_chunks', package='bmm')
stan_likelihood <- read_lines2(paste0(sc_path, '/gcm_likelihood.stan'))
stan_functions <- read_lines2(paste0(sc_path, '/gcm_functions.stan'))
stanvars <- stanvar(scode = stan_likelihood, block = 'likelihood') +
stanvar(scode = stan_functions, block = 'functions')
# return the list
nlist(formula, data, stanvars)
}
#############################################################################!
# POSTPROCESS METHODS ####
#############################################################################!
# A postprocess_brm.* function should be defined for the model class. See
# ?postprocess_brm for details
#' @export
postprocess_brm.gcm <- function(model, fit) {
# any required postprocessing (if none, delete this section)
fit
}
```
Now you have to:
1. Fill the `.model_gcm` function with the appropriate code. This function should return a list with all the variables specified above. The class of the list should be `c('bmmmodel', 'gcm')`. Rename the response arguments and the other required arguments, or delete the other arguments if you do not have any. You can see an example in the `model_sdm.R` file. Specify the parameters of the model, the link functions, what if any parameters are fixed (and to what value). It's crucial that you set default priors for every parameter of the model, which should be informed by knowledge in the field.
2. Adjust the user-facing alias. Here you should only rename the required arguments and fill in the `@examples` section with a full example. Everything else will be filled in automatically based on the information in the `.model_gcm` function.
3. Fill the `check_data.gcm` function with the appropriate code. This function should check the data and return the data. You may or may not need to compute any transformations or save some variables as attributes of the data.
4. If necessary define the `bmf2bf.gcm` method to convert the `bmmformula` to a `brmsformula`. The first step for this is always to specify the response variable and additional response information. Keep in mind that `brms` automatically interprets this formula as the linear model formula for the `mu` parameter of your custom family. Currently, `brms` requires all custom families to have a `mu` parameter. However, we recommend to code this parameter as a `void_mu`, and fix the intercept of this parameter to zero using constant priors. This way, the `bmmformula` can be used to only specify the linear or non-linear model for the parameters of a `bmmmodel`. *If your model has a single response variable, you can delete this section.*
5. Fill the `configure_model.gcm` function with the appropriate code. This function should construct the formula, the family, the stanvars. You can also retrieve any arguments you saved from the data check. Depending on your model, some of these parts might not be necessary. For example, for the mixture models (e.g. `mixture3p`), we construct a new formula, because we want to rename the arguments to make it easier for the user. For the `sdmSimple` model, we define the family ourselves, so we don't need to change the formula.
You need to fill information about your custom family, and then fill the STAN files with your STAN code. Conveniently, you don't have to edit lines 134-140: loading the STAN files and setting up the stanvars is set up automatically when calling the `use_model_template` function. Should you need to add more STAN files after you created the template, you can add the files in `init/stan_chunks/` manually and edit those lines to additionally load the manually added files.
6. Fill the `postprocess_brm.gcm` function with the appropriate code. By post-processing, we mean changes to the fitted brms model - like renaming parameters, etc. If you don't need any post-processing, you can delete this section.
## Testing
Unit testing is extremely important. You should test your model with the `testthat` package. You can use the `use_test()` function to create a test file for your model. See file `tests/testthat/test-bmm.R` for an example of how we test the existing models. BRMS models take a long time to fit, so we don't test the actual fitting process. The `bmm()` function provides an argument `backend="mock"`, which will return a mock object instead of fitting the model. This ensures that the entire pipeline works without errors. For example, here's a test of the `IMMfull` model:
``` r
test_that('Available mock models run without errors',{
withr::local_options('bmm.silent'=2)
skip_on_cran()
dat <- data.frame(
resp_err = rIMM(n = 5),
Item2_rel = 2,
Item3_rel = -1.5,
spaD2 = 0.5,
spaD3 = 2
)
# two-parameter model mock fit
f <- bmmformula(kappa ~ 1, c ~ 1, a ~ 1, s ~ 1)
mock_fit <- bmm(f, dat,
imm(resp_err = "resp_err",
setsize = 3,
nt_features = paste0('Item',2:3,'_rel'),
nt_distance=paste0('spaD',2:3)),
backend = "mock", mock_fit = 1, rename=FALSE)
expect_equal(mock_fit$fit, 1)
expect_type(mock_fit$bmm$fit_args, "list")
expect_equal(names(mock_fit$fit_args[1:4]), c("formula", "data"))
})
```
The tests based on the `testthat` package are run every time you call the `check()` command. Before you submit your changes, make sure that all tests pass.
::: callout-important
Additionally, you should perform a full test of the model by running it in a separate script and ensuring it gives meaningful results. At the very least, you should perform basic parameter recovery simulations for hyper-parameters (i.e. means and standard deviations) as well as subject-level parameters to give users an idea of how much data they need to adequately estimate the model.
We are in the process of establishing guidelines for that.
:::
## Add an example dataset
All new models should come with an example dataset, that can be loaded by users and can be used in the examples section. This should be either:
- A new dataset that you add to the package
- A dataset that already exists in the package but that can be used with the new model
- A dataset that exists in another package that you can load with `data()` and use with the new model
For example, the vignettes for the `mixture2p` and `mixture3p` use an external dataset from the `mixtur` package that can be loaded with `data('bays2009_full', package='mixtur')`. The `IMM` models use a dataset included in the current package. For instructions on how to add a new dataset see [here](https://r-pkgs.org/data.html).
## Add an article
All new models should come with an article that explains some basic information about the model and how to estimate it with `bmm`. You can use the `use_article()` function to create a new article. See [here](https://r-pkgs.org/vignettes.html) for more information. The articles will be published automatically on the package website under "Articles" when the pull request is approved. You can browse the source code for the existing articles in the `vignettes/articles/` directory. You can see the published version of the existing vignettes [here](../articles/).
And that's it! You have added a new model to the `bmm` package. You can now submit your changes to the `bmm` package repository.