A new version of the FisherEM package that implements the Bayesian Fisher-EM (BFEM) algorithm.

Installation

R Package installation

CRAN dependencies

FisherEM needs the following R packages, so check that they are installed on your computer (parallel ships with base R; the others are on CRAN).

required_CRAN <- c("MASS", "elasticnet", "parallel", "ggplot2")
not_installed_CRAN <- setdiff(required_CRAN, rownames(installed.packages()))
if (length(not_installed_CRAN) > 0) install.packages(not_installed_CRAN)

Installing FisherEM

  • A CRAN submission is planned for October.
  • For the development version, install from GitHub:
devtools::install_github("FloFloB/FisherEM-1")

New features

Simulation function

We added a function, simu_bfem(), to reproduce the simulations of the BFEM chapter. Three simulation settings are available.

# Chang 1983 setting
n = 300
simu = simu_bfem(n, which = "Chang1983")

# Section 4.2: 
p = 50
noise = 1
simu = simu_bfem(n = 900, which = "section4.2", p = p, noise = noise)

# Section 4.3: 
snr = 3
simu = simu_bfem(n=900, which = "section4.3", snr = snr)

The Bayesian Fisher-EM algorithm

The structure, arguments and output of bfem() are similar to those of fem() and sfem().

Y = iris[,-5]
cl_true = iris[,5]
res.bfem = bfem(Y, K = 3, model="DB", init = 'kmeans', method = 'gs', nstart = 10)

print(fem.ari(res.bfem, cl_true))
## [1] 0.9602777
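
As its name suggests, fem.ari() reports an adjusted Rand index (ARI) between the estimated and the true partition. The following base-R sketch shows how such an index can be computed from a contingency table. The function name adjusted_rand_index and the toy label vectors are hypothetical and used only for illustration; this is not the package's own implementation.

```r
# Adjusted Rand index between two label vectors (base R only).
# Illustrative sketch: the degenerate case where the denominator is zero
# (e.g. both partitions have a single cluster) is not handled.
adjusted_rand_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)                          # contingency table of the two partitions
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both partitions
  sum_rows  <- sum(choose(rowSums(tab), 2))   # pairs grouped together in x
  sum_cols  <- sum(choose(colSums(tab), 2))   # pairs grouped together in y
  expected  <- sum_rows * sum_cols / choose(n, 2)  # value expected under chance
  maximum   <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (maximum - expected)
}

# Toy labels (hypothetical data).
truth <- c(1, 1, 1, 2, 2, 2)

# Same partition with different label names: the index does not depend on naming.
ari_perfect <- adjusted_rand_index(truth, c("b", "b", "b", "a", "a", "a"))

# One misplaced observation lowers the score.
ari_one_error <- adjusted_rand_index(truth, c(1, 1, 2, 2, 2, 2))
```

The index equals 1 when the two partitions agree up to a relabelling of the clusters and is close to 0 when they agree no more than expected by chance.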

Visualisation

ggbound = plot(res.bfem, type = 'elbo')
ggbound

ggspace = plot(res.bfem, type = 'subspace')
ggspace

High-dimensional example

Simulate data from the frequentist DLM model with K=3 clusters. The latent space has dimension d=2, and the remaining (p-d) dimensions are centered Gaussian noise with variance noise.

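library(ggplot2)  # needed for the plot of the true subspace

simu = simu_bfem(n = 900, which = "section4.2", p = 50, noise = 1)
Y = simu$Y
cl_true = simu$cls  # true labels (the slot used in the rest of this example)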

# plot true subspace in 2-d
df.true = data.frame(simu$X, Cluster = factor(simu$cls))
ggtrue = ggplot(df.true, aes(x = X1, y=X2, col=Cluster, shape=Cluster)) +
  geom_point(size = 2) +
  scale_color_brewer(palette="Set2")  # color-blind friendly palette
print(ggtrue)
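
To make the setting above concrete, here is a minimal base-R sketch of data with the same shape: three Gaussian clusters in a 2-d latent space, padded with (p-d) centered Gaussian noise dimensions of variance noise. The cluster means, the seed and all variable names are assumptions chosen for illustration. This is not the code of simu_bfem(), and it leaves the latent coordinates in the first two columns rather than embedding them through a projection matrix.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

n <- 900      # number of observations
K <- 3        # number of clusters
d <- 2        # latent dimension
p <- 50       # observed dimension
noise <- 1    # variance of the non-discriminative dimensions

# Hypothetical cluster means in the latent space (illustrative values).
means <- rbind(c(0, 0), c(4, 0), c(0, 4))

# True labels and latent coordinates with unit-variance Gaussian clusters.
cls_sketch <- sample(K, n, replace = TRUE)
X_sketch <- means[cls_sketch, ] + matrix(rnorm(n * d), nrow = n, ncol = d)

# The remaining p - d dimensions carry no cluster information.
E_sketch <- matrix(rnorm(n * (p - d), sd = sqrt(noise)), nrow = n, ncol = p - d)

# Observed data: latent coordinates followed by the noise dimensions.
Y_sketch <- cbind(X_sketch, E_sketch)
```

Only the first two columns of Y_sketch separate the clusters, which is the situation a subspace clustering method is meant to recover.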

Then cluster the data with the BFEM algorithm.

res.bfem = bfem(simu$Y, K=3, model = 'DB', nstart = 10, method="gs")

# ARI() comes from the aricode package, which is not in the dependency list above
cat('Init ARI : ', aricode::ARI(simu$cls, max.col(res.bfem$Tinit)))
## Init ARI :  0.4052959
cat('Final ARI : ', aricode::ARI(simu$cls, res.bfem$cls))
## Final ARI :  0.9709887

plot(res.bfem, type = "subspace")
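
The initial partition in the example above is obtained with max.col(), which returns, for each row of a matrix, the index of the column holding the largest value. The sketch below applies it to a small hypothetical membership matrix, assuming one row per observation and one column per cluster; the matrix values and the names T_example and hard_labels are illustrative only.

```r
# Hypothetical membership matrix: one row per observation, one column per cluster.
T_example <- rbind(
  c(0.7, 0.2, 0.1),
  c(0.1, 0.3, 0.6),
  c(0.2, 0.5, 0.3)
)

# Hard assignment: the cluster with the largest membership value in each row.
# ties.method = "first" makes the result deterministic
# (the default of max.col() breaks ties at random).
hard_labels <- max.col(T_example, ties.method = "first")
```

Here hard_labels assigns the three observations to clusters 1, 3 and 2 respectively.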

References