---
title: "Simulation example 1"
author: "Stefano Cabras - UNICA / UC3M"
date: "10/12/2019"
output: md_document
bibliography: biblio-msdpmp.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Set up
```{r}
rm(list=ls())
set.seed(11)
library(BayesVarSel)
library(mixdir)
p=200             # number of covariates
n=50              # number of observations
nsim.gibbs=1000   # Gibbs sampling steps on the model space
m1=3              # number of truly active covariates
signaltonoise=3   # value of the non-zero regression coefficients
```
Suppose we have `r p` covariates and `r n` iid observations from a regression model in which the intercept and `r p-m1` coefficients are zero, while the coefficients of the first `r m1` covariates equal `r signaltonoise` and the errors have variance 1.
```{r}
# Design matrix with a leading column of ones for the intercept
x=cbind(1,array(rnorm(n*p,0,1),dim=c(n,p)))
# True coefficients: zero intercept, first m1 covariates at the signal level, rest zero
betas=c(0,rep(signaltonoise,m1),rep(0,p-m1))
colnames(x)=c("I",paste("x",1:p,sep=""))
true.mod=(betas>0)*1
names(true.mod)=colnames(x)
y=x%*%betas+rnorm(n,0,1)
simdat=data.frame(y,x)
```
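As a quick sanity check, an oracle least-squares fit on just the `r m1` active covariates should recover coefficients close to `r signaltonoise` (the column names `x1`, `x2`, `x3` come from the construction of `simdat` above):
```{r}
# Oracle OLS fit on the truly active covariates only: estimated
# coefficients should be near the signal-to-noise value
coef(lm(y ~ x1 + x2 + x3, data = simdat))
```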
# Usual analysis with Gibbs sampling on model space
Suppose we explore the model space with `r nsim.gibbs` Gibbs sampling steps, starting from the null model and assuming a constant prior on the model space (used only to drive the Gibbs sampling). Bayes factors are calculated using a conventional prior (Zellner's g-prior).
```{r,cache=TRUE}
res.GibbsBvs <- GibbsBvs(formula = y ~ ., data = simdat,
                         prior.betas = "gZellner", prior.models = "Constant",
                         n.iter = nsim.gibbs, init.model = "Null",
                         n.burnin = 100, time.test = FALSE)
```
From the exploration we only need the set of models visited across the Gibbs steps (the Bayes factors are ignored). These visited models are our observations.
```{r}
xx=res.GibbsBvs$modelslogBF
dim(xx)
# The last column holds the log Bayes factors; the rest is the 0/1 inclusion matrix
ww=xx[,ncol(xx)]
xx=xx[,-ncol(xx)]
image(xx[,1:10])
```
This is a matrix of 0/1s, one row per visited model, that concentrates the evidence for the true model in a part of the model space. In this case the explored models have the following sizes:
```{r}
barplot(table(rowSums(xx)),xlab="Model sizes",las=2)
```
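As an illustration of how sparsely `r nsim.gibbs` steps cover such a large model space, we can check how often the sampler lands exactly on the true model (this assumes, as in the construction above, that the columns of `xx` and the entries of `true.mod` are in the same order):
```{r}
# Fraction of Gibbs steps that visit exactly the true model
mean(apply(xx, 1, function(m) all(m == true.mod)))
```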
# A refinement using DPMP with MMDai
Not all models in the model space can be visited: only those with at most $n-1$ covariates (`r n-1` in this case) can be evaluated. The full model space is the hyper contingency table obtained by crossing $p$ binary (0/1) categorical variables, and thus has $2^p$ cells (`r 2^p` in this case). We place a Dirichlet process prior on the probability distribution of this contingency table (this is the prior on the model space), and the data are the models sampled in the Gibbs steps. This model was proposed by @Dunson2009.
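As a toy illustration of this correspondence (with a hypothetical $p = 3$, not the `r p` used here), every cell of the crossed binary table is one model:
```{r}
# With 3 binary inclusion indicators the table has 2^3 = 8 cells, one per model
expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
```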
```{r}
# Recode the 0/1 inclusion indicators as categories 1/2, as MMDai expects
mdat=data.frame(xx)+1
```
The posterior of the cell probabilities (and thus of the models) is obtained using the variational algorithm detailed in @Ahlmann-Eltze2018 (otherwise Gibbs sampling would be used, as originally proposed in @Dunson2009). We assume that the latent space has dimension 2: the set of cells with high probability (in which the true model is supposed to lie) and the set of cells with low probability (in which the true model is not).
```{r,cache=TRUE}
library(MMDai)
# d: number of categories per variable (2 each); k: number of latent spaces
res <- ParEst(mdat,d=rep(2,ncol(mdat)),k=2)
```
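The only component used below is `res$psi`, a list with one matrix per variable; judging from the indexing used in the rest of this document, its rows correspond to the two latent spaces and its columns to the categories (1 = excluded, 2 = included). A quick inspection:
```{r}
# psi holds one conditional-probability matrix per column of mdat
length(res$psi)
# e.g. the matrix for the second column (covariate x1):
# rows = latent spaces, columns = categories (1 = excluded, 2 = included)
res$psi[[2]]
```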
Given one of the latent spaces, each covariate has a probability of being 1 or 0 (i.e. of being included or not in the model). Let's analyse each latent space separately.
The posterior probability of each covariate for the first latent space (the first `r m1` covariates belong to this latent space with high probability):
```{r}
# P(covariate = 1) in the latent space associated with the true model
pls1=unlist(lapply(res$psi,function(x) x[2,2]))
barplot(pls1[-1],col=c(rep(2,m1),rep(1,p-m1)),las=2,cex.axis=0.5,ylim=c(0,1))
```
The posterior probability of each covariate for the second latent space (which represents the null model):
```{r}
# P(covariate = 1) in the latent space representing the null model
pls2=unlist(lapply(res$psi,function(x) x[1,2]))
barplot(pls2[-1],col=c(rep(2,m1),rep(1,p-m1)),las=2,cex.axis=0.5,ylim=c(0,1))
```
# Comparison between the two analyses
Let's compare the posterior inclusion probabilities obtained with plain Gibbs sampling against the posterior probabilities obtained with the DPMP:
```{r}
exdetail=paste("nGibbsBvs=",nsim.gibbs,
               " signaltonoise=",signaltonoise,sep="")
par(mfrow=c(2,1))
# Top: DPMP posterior probabilities; bottom: Gibbs posterior inclusion probabilities
barplot(pls1,col=true.mod+1,
        main=exdetail,
        ylab="Post. Prob. DPMP",ylim=c(0,1))
barplot(res.GibbsBvs$inclprob,col=true.mod+1,
        ylab="Post. Inclusion Prob.",ylim=c(0,1))
```
```{r}
par(mfrow=c(1,1))
# Each point is a covariate; the truly active ones are highlighted in red
plot(res.GibbsBvs$inclprob,pls1,col=true.mod+1,cex=1,pch=19,
     xlab="Post. Inclusion Prob.",ylab="Post. Prob. DPMP",main=exdetail,ylim=c(0,1),xlim=c(0,1))
points(res.GibbsBvs$inclprob[true.mod==1],pls1[true.mod==1],col=2,cex=1,pch=23)
abline(0,1,lty=2)
```
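Finally, a simple numeric summary (an illustrative check using the conventional 0.5 threshold on both sets of probabilities computed above):
```{r}
# Cross-tabulate selections at the 0.5 threshold against the true model
table(DPMP = (pls1 > 0.5) * 1, truth = true.mod)
table(Gibbs = (res.GibbsBvs$inclprob > 0.5) * 1, truth = true.mod)
```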
# References