almost final prep for v0.8.0

biodiverse · Oct 19, 2024 · e7a4858
1 parent 950e808
commit e7a4858

Showing 54 changed files with 1,241 additions and 3,806 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,7 +1,7 @@
 # spOccupancy 0.8.0
 
-+ All model fitting functions have a new function `parallel.chains` that allows chains to be run in parallel. If set to `TRUE`, the `n.chains` will be run in parallel. Note that this is different from `n.omp.threads`, which is used for *within-chain* parallelization for spatial models (`n.omp.threads` does not do anything for non-spatial models). Note that I do not recommend using both `parallel.chains` and `n.omp.threads` if fitting a spatial model, as it will actually result in substantial slowing of the model relative to even a model with no paralelization. For the vast majority of users, using `parallel.chains` will be fastest. Generally, using `parallel.chains` will be faster than using `n.omp.threads` for spatial models, however for very large data sets (e.g., tens of thousands of locations) `n.omp.threads` may be faster. The truly fastest way to run a spatial model in `spOccupancy` is to separately run chains in different R sessions, where each chain uses `n.omp.threads` to implement within-chain parallelization. Note that results for a model won't be exactly the same if it's run in parallel vs. sequence as a result of (1) random seeds being different and (2) the sequential runs using the previous tuning value as the starting tuning value for the subsequent run of the model. So, in theory if you give really bad tuning variances to start off, the sequential model may lead to slightly faster convergence (in terms of number of iterations needed and not actual time). Note that generally using `parallel.chains` does not result in computational improvements when using a full Gaussian process, but it will give substantial improvements with all other models (NNGP models or non-spatial models).
-+ New functionality for fitting multi-season, single-species integrated occupancy models. The function `tIntPGOcc()` fits a non-spatial multi-season integrated occupancy model, `stIntPGOcc()` fits a spatial multi-season integrated occupancy model, and `svcTIntPGOcc()` fits a spatially-varying coefficient multi-season occupancy model. Random intercepts are supported in both the occurrence and detection formulas for both model types.  
++ For queries on anything related to `spOccupancy` (and `spAbundance`), please use the new [spOccupancy/spAbundance mailing list](https://groups.google.com/g/spocc-spabund-users). 
++ New functionality for fitting multi-season, single-species integrated occupancy models. The function `tIntPGOcc()` fits a non-spatial multi-season integrated occupancy model, `stIntPGOcc()` fits a spatial multi-season integrated occupancy model, and `svcTIntPGOcc()` fits a spatially-varying coefficient multi-season occupancy model. Random intercepts are supported in both the occurrence and detection formulas for both model types. I am behind on adding vignettes for some of the newer functionality (sorry!), but adding a vignette for this new functionality is on my todo list. If interested in using these functions and you're having problems fitting them, please send your questions to the mailing list. 
 + Added in functionality for both occupancy and detection random intercepts in single-species single-season integrated models (`intPGOcc()` and `spIntPGOcc()`) using `lme4` syntax (e.g., `(1 | observer)` for a random effect of observer).
 + `simTIntPGOcc()` is a new function that allows simulation of single-species multi-season detection-nondetection data from multiple data sources.
 + Updated `simMsIntPGOcc()` to now include simulation of data sets with spatially-varying coefficients and unstructured random effects on both occurrence and detection.
@@ -43,7 +43,7 @@
 # spOccupancy 0.7.2
 
 + Added in functionality for using the `plot()` function to generate simple traceplots using `spOccupancy` model objects. Details can be found in the help page (e.g., for `spPGOcc` models, type `?plot.spPGOcc` in the console).  
-+ Not an update to the package, but a [new vignette](https://www.jeffdoser.com/files/spoccupancy-web/articles/identifiability) has been posted on testing model identifiability using `spOccupancy`. Thanks to Sara Stoudt for writing this!
++ Not an update to the package, but a [new vignette](https://doserlab.com/files/spoccupancy-web/articles/identifiability) has been posted on testing model identifiability using `spOccupancy`. Thanks to Sara Stoudt for writing this!
 + Added in the ability to fit `lfJSDM()` without residual species correlations by setting `n.factors = 0`. This is a model analogous to `msPGOcc()`, but without the detection component. 
 + Added in the `shared.spatial` argument to `sfJSDM()`. If set to `TRUE`, this argument estimates a common spatial process for all species instead of using the default spatial factor modeling approach. 
 + Fixed a bug in `predict.svcTMsPGOcc()` when same variable was used for a fixed and random effect (e.g., if including a linear year trend and also an unstructured random intercept for year). Thanks to Liam Kendall for pointing this out.  
@@ -78,7 +78,7 @@ spOccupancy v0.7.0 contains a variety of substantial updates, most notably funct
 + Updated `getSVCSamples()` to eliminate errors that prevented the function from working under certain circumstances depending on which covariates in the design matrix were modelled as spatially-varying coefficients.
 + Updated `tPGOcc()` and `stPGOcc()` to eliminate an error that occurred when trying to run these models with single-visit data sets.
 + Added in the `mis.spec.type` and `scale.param` arguments to the `simTOcc()` function to simulate multi-season detection-nondetection data under varying forms of model mis-specification. See `simTOcc()` documentation for details. Thanks to Sara Stoudt for her help with this. 
-+ Updated a typo in the MCMC sampler documentation for multi-species occupancy models. Specifically, the **T** in the mean component of Equations 23 and 24 from the [MCMC samplers vignette](https://www.jeffdoser.com/files/spoccupancy-web/articles/mcmcSamplers.pdf) was incorrect, and instead is now correctly **T**$^{-1}$. Similarly, Equations 9 and 10 were updated in the [MCMC samplers for factor models vignette](https://www.jeffdoser.com/files/spoccupancy-web/articles/mcmcFactorModels.pdf). Note that these were just typos in the vignettes, the underlying models are correct.
++ Updated a typo in the MCMC sampler documentation for multi-species occupancy models. Specifically, the **T** in the mean component of Equations 23 and 24 from the [MCMC samplers vignette](https://doserlab.com/files/spoccupancy-web/articles/mcmcSamplers.pdf) was incorrect, and instead is now correctly **T**$^{-1}$. Similarly, Equations 9 and 10 were updated in the [MCMC samplers for factor models vignette](https://doserlab.com/files/spoccupancy-web/articles/mcmcFactorModels.pdf). Note that these were just typos in the vignettes, the underlying models are correct.
 + Fixed a bug that prevented `ppcOcc()` from working when there were only site-level random effects on detection. This also sometimes caused problems with cross-validation functionality as well. Thanks to Jose Luis Mena for bringing this to my attention. 
 
 # spOccupancy 0.5.2

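Reviewer note: the `R/PGOcc.R` hunks in this commit drop the `parallel.chains` branch and keep only the sequential loop over chains, in which chain 1 uses the supplied initial values and later chains redraw them unless `fix.inits` is set. The sketch below is a minimal, self-contained illustration of that pattern, not the package's code. `run_chain()` is a hypothetical stand-in for the compiled sampler, the prior values are made up for the example, and the scalar `&&` is used here where the package code uses `&`.

```r
# Hypothetical stand-in for the compiled sampler (NOT part of spOccupancy):
# it simply returns the initial values it was given.
run_chain <- function(beta.inits, alpha.inits) {
  list(beta.inits = beta.inits, alpha.inits = alpha.inits)
}

set.seed(1)
n.chains <- 3
fix.inits <- FALSE
p.occ <- 2   # number of occurrence coefficients (assumed for the example)
p.det <- 3   # number of detection coefficients (assumed for the example)

# Illustrative prior means and variances (assumed values)
mu.beta <- rep(0, p.occ)
sigma.beta <- rep(2.72, p.occ)
mu.alpha <- rep(0, p.det)
sigma.alpha <- rep(2.72, p.det)

# Initial values used by the first chain
beta.inits <- rep(0, p.occ)
alpha.inits <- rep(0, p.det)

out.tmp <- vector("list", n.chains)
for (i in seq_len(n.chains)) {
  # Redraw initial values from the priors for every chain after the first
  if (i > 1 && !fix.inits) {
    beta.inits <- rnorm(p.occ, mu.beta, sqrt(sigma.beta))
    alpha.inits <- rnorm(p.det, mu.alpha, sqrt(sigma.alpha))
  }
  out.tmp[[i]] <- run_chain(beta.inits, alpha.inits)
}
```

With `fix.inits <- TRUE` every chain in this sketch would start from the same values, which mirrors the condition guarding the redraw in the diff.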
diff --git a/R/PGOcc.R b/R/PGOcc.R
@@ -1,7 +1,7 @@
 PGOcc <- function(occ.formula, det.formula, data, inits, priors, 
                   n.samples, n.omp.threads = 1, verbose = TRUE,
                   n.report = 100, n.burn = round(.10 * n.samples), n.thin = 1, 
-                  n.chains = 1, parallel.chains = FALSE, k.fold, k.fold.threads = 1, 
+                  n.chains = 1, k.fold, k.fold.threads = 1, 
                   k.fold.seed = 100, k.fold.only = FALSE, ...){
 
   ptm <- proc.time()
@@ -652,99 +652,32 @@ if (length(sigma.sq.p.inits) != p.det.re) {
   out.tmp <- list()
   out <- list()
   if (!k.fold.only) {
-    if (parallel.chains) {
-      if (verbose) {
-        cat("----------------------------------------\n");
-        cat("\tRunning the model\n");
-        cat("----------------------------------------\n");
-        message("MCMC chains are running in parallel. Model progress output is suppressed.")
-      }
-      beta.inits.list <- list()
-      beta.inits.list[[1]] <- beta.inits
-      alpha.inits.list <- list()
-      alpha.inits.list[[1]] <- alpha.inits
-      sigma.sq.psi.inits.list <- list()
-      sigma.sq.psi.inits.list[[1]] <- sigma.sq.psi.inits
-      beta.star.inits.list <- list()
-      beta.star.inits.list[[1]] <- beta.star.inits
-      sigma.sq.p.inits.list <- list()
-      sigma.sq.p.inits.list[[1]] <- sigma.sq.p.inits
-      alpha.star.inits.list <- list()
-      alpha.star.inits.list[[1]] <- alpha.star.inits
-      if (!fix.inits) {
-        if (n.chains > 1) {
-          for (i in 2:n.chains) {
-            beta.inits.list[[i]] <- rnorm(p.occ, mu.beta, sqrt(sigma.beta))
-            alpha.inits.list[[i]] <- rnorm(p.det, mu.alpha, sqrt(sigma.alpha))
-            if (p.occ.re > 0) {
-              sigma.sq.psi.inits.list[[i]] <- runif(p.occ.re, 0.5, 10)
-              beta.star.inits.list[[i]] <- rnorm(n.occ.re, 0, sqrt(sigma.sq.psi.inits[beta.star.indx + 1]))
-            } else {
-              sigma.sq.psi.inits.list[[i]] <- 1
-              beta.star.inits.list[[i]] <- 1
-            }
-            if (p.det.re > 0) {
-              sigma.sq.p.inits.list[[i]] <- runif(p.det.re, 0.5, 10)
-              alpha.star.inits.list[[i]] <- rnorm(n.det.re, 0, sqrt(sigma.sq.p.inits[alpha.star.indx + 1]))
-            } else {
-              sigma.sq.p.inits.list[[i]] <- 1
-              alpha.star.inits.list[[i]] <- 1
-            }
-          }
+    for (i in 1:n.chains) {
+      # Change initial values if i > 1
+      if ((i > 1) & (!fix.inits)) {
+        beta.inits <- rnorm(p.occ, mu.beta, sqrt(sigma.beta))
+        alpha.inits <- rnorm(p.det, mu.alpha, sqrt(sigma.alpha))
+        if (p.occ.re > 0) {
+          sigma.sq.psi.inits <- runif(p.occ.re, 0.5, 10)
+          beta.star.inits <- rnorm(n.occ.re, 0, sqrt(sigma.sq.psi.inits[beta.star.indx + 1]))
         }
-      } else {
-        if (n.chains > 1) {
-          for (i in 2:n.chains) {
-            beta.inits.list[[i]] <- beta.inits.list[[1]]
-            alpha.inits.list[[i]] <- alpha.inits.list[[1]]
-            sigma.sq.psi.inits.list[[i]] <- sigma.sq.psi.inits.list[[1]]
-            beta.star.inits.list[[i]] <- beta.star.inits.list[[1]]
-            sigma.sq.p.inits.list[[i]] <- sigma.sq.p.inits.list[[1]]
-            alpha.star.inits.list[[i]] <- alpha.star.inits.list[[1]]
-          }
+        if (p.det.re > 0) {
+          sigma.sq.p.inits <- runif(p.det.re, 0.5, 10)
+          alpha.star.inits <- rnorm(n.det.re, 0, sqrt(sigma.sq.p.inits[alpha.star.indx + 1]))
         }
       }
-      par.cl <- parallel::makePSOCKcluster(n.chains)
-      registerDoParallel(par.cl)
-      out.tmp <- foreach(i = 1:n.chains) %dorng% {
-        .Call("PGOcc", y, X, X.p, X.re, X.p.re, consts, 
-              K, n.occ.re.long, n.det.re.long, beta.inits.list[[i]], alpha.inits.list[[i]], 
-              sigma.sq.psi.inits.list[[i]], sigma.sq.p.inits.list[[i]], beta.star.inits.list[[i]], 
-              alpha.star.inits.list[[i]], z.inits, z.long.indx, beta.star.indx, 
-              beta.level.indx, alpha.star.indx, alpha.level.indx, mu.beta, 
-              mu.alpha, Sigma.beta, Sigma.alpha, sigma.sq.psi.a, sigma.sq.psi.b, 
-              sigma.sq.p.a, sigma.sq.p.b, n.samples, n.omp.threads, verbose, 
-              n.report, samples.info, chain.info)
-      }
-      parallel::stopCluster(par.cl)
-    } else {
-      for (i in 1:n.chains) {
-        # Change initial values if i > 1
-        if ((i > 1) & (!fix.inits)) {
-          beta.inits <- rnorm(p.occ, mu.beta, sqrt(sigma.beta))
-          alpha.inits <- rnorm(p.det, mu.alpha, sqrt(sigma.alpha))
-          if (p.occ.re > 0) {
-            sigma.sq.psi.inits <- runif(p.occ.re, 0.5, 10)
-            beta.star.inits <- rnorm(n.occ.re, 0, sqrt(sigma.sq.psi.inits[beta.star.indx + 1]))
-          }
-          if (p.det.re > 0) {
-            sigma.sq.p.inits <- runif(p.det.re, 0.5, 10)
-            alpha.star.inits <- rnorm(n.det.re, 0, sqrt(sigma.sq.p.inits[alpha.star.indx + 1]))
-          }
-        }
-        storage.mode(chain.info) <- "integer"
-        # Run the model in C
-        out.tmp[[i]] <- .Call("PGOcc", y, X, X.p, X.re, X.p.re, consts, 
-          		    K, n.occ.re.long, n.det.re.long, beta.inits, alpha.inits, 
-          		    sigma.sq.psi.inits, sigma.sq.p.inits, beta.star.inits, 
-          		    alpha.star.inits, z.inits, z.long.indx, beta.star.indx, 
-          		    beta.level.indx, alpha.star.indx, alpha.level.indx, mu.beta, 
-          		    mu.alpha, Sigma.beta, Sigma.alpha, sigma.sq.psi.a, sigma.sq.psi.b, 
-          		    sigma.sq.p.a, sigma.sq.p.b, n.samples, n.omp.threads, verbose, 
-          		    n.report, samples.info, chain.info)
-        chain.info[1] <- chain.info[1] + 1
-      } # i   
-    }
+      storage.mode(chain.info) <- "integer"
+      # Run the model in C
+      out.tmp[[i]] <- .Call("PGOcc", y, X, X.p, X.re, X.p.re, consts, 
+        		    K, n.occ.re.long, n.det.re.long, beta.inits, alpha.inits, 
+        		    sigma.sq.psi.inits, sigma.sq.p.inits, beta.star.inits, 
+        		    alpha.star.inits, z.inits, z.long.indx, beta.star.indx, 
+        		    beta.level.indx, alpha.star.indx, alpha.level.indx, mu.beta, 
+        		    mu.alpha, Sigma.beta, Sigma.alpha, sigma.sq.psi.a, sigma.sq.psi.b, 
+        		    sigma.sq.p.a, sigma.sq.p.b, n.samples, n.omp.threads, verbose, 
+        		    n.report, samples.info, chain.info)
+      chain.info[1] <- chain.info[1] + 1
+    } # i   
     # Calculate R-Hat ---------------
     out <- list()
     out$rhat <- list()