
Commit b5807b1

Merge pull request #265 from elfi-dev/dev
Release 0.7.1
2 parents 1476806 + 79c7e9c commit b5807b1

15 files changed: +188 -55 lines changed

.travis.yml (+4 -3)

@@ -12,9 +12,10 @@ matrix:
 - mkdir -p /Users/travis/.matplotlib
 - "echo 'backend: TkAgg' > /Users/travis/.matplotlib/matplotlibrc"
 - brew update
-- brew install python3
-- virtualenv env -p python3
-- source env/bin/activate
+- brew upgrade python
+- pip3 install virtualenv
+- virtualenv py3env -p python3
+- source py3env/bin/activate

 cache: pip

CHANGELOG.rst (+9 -1)

@@ -1,9 +1,17 @@
 Changelog
 =========

+0.7.1 (2018-04-11)
+------------------
+- Implemented model selection (elfi.compare_models). See API documentation.
+- Fix threshold=0 in rejection sampling
+- Set default batch_size to 1 in ParameterInference base class
+
 0.7 (2017-11-30)
 ----------------
-
+- Added new example: the stochastic Lotka-Volterra model
+- Fix methods.bo.utils.minimize to be strictly within bounds
+- Implemented the Two Stage Procedure, a method of summary-statistics diagnostics
 - Added the MaxVar acquisition method
 - Added the RandMaxVar acquisition method
 - Added the ExpIntVar acquisition method

README.md (+3 -1)

@@ -1,4 +1,4 @@
-**Version 0.7 released!** See the CHANGELOG and [notebooks](https://github.com/elfi-dev/notebooks).
+**Version 0.7.1 released!** See the CHANGELOG and [notebooks](https://github.com/elfi-dev/notebooks).

 **NOTE:** For the time being NetworkX 2 is incompatible with ELFI.

@@ -28,6 +28,8 @@ Other notable included algorithms and methods:
 - Bayesian Optimization
 - [No-U-Turn-Sampler](http://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf), a Hamiltonian Monte Carlo MCMC sampler

+ELFI also integrates tools for visualization, model comparison, diagnostics and post-processing.
+
 See examples under [notebooks](https://github.com/elfi-dev/notebooks) to get started. Full
 documentation can be found at http://elfi.readthedocs.io/. Limited user-support may be
 asked from elfi-support.at.hiit.fi, but the

docs/api.rst (+7)

@@ -265,6 +265,13 @@ Inference API classes
 :members:
 :inherited-members:

+**Model selection**
+
+.. currentmodule:: .
+
+.. autofunction:: elfi.compare_models
+
+
 Other
 .....

docs/faq.rst (+28)

@@ -10,3 +10,31 @@ produces outputs from the interval (1, 3).*
 their definitions. There the uniform distribution uses the location/scale definition, so
 the first argument defines the starting point of the interval and the second its length.

+.. _vectorization:
+
+*Q: What is vectorization in ELFI?*
+
+**A**: Looping is relatively inefficient in Python, and so whenever possible, you should *vectorize*
+your operations_. This means that repetitive computations are performed on a batch of data using
+precompiled libraries (typically NumPy_), which effectively runs the loops in faster, compiled C-code.
+ELFI supports vectorized operations, and due to the potentially huge saving in CPU-time it is
+recommended to vectorize all user-code whenever possible.
+
+.. _operations: good-to-know.html#operations
+.. _NumPy: http://www.numpy.org/
+
+For example, imagine you have a simulator that depends on a scalar parameter and produces a vector of 5
+values. When this is used in ELFI with ``batch_size`` set to 1000, ELFI draws 1000 values from the
+parameter's prior distribution and gives this *vector* to the simulator. Ideally, the simulator should
+efficiently process all 1000 parameter cases in one go and output an array of shape (1000, 5). When using
+vectorized operations in ELFI, the length (i.e. the first dimension) of all output arrays should equal
+``batch_size``. Note that because of this the evaluation of summary statistics, distances etc. should
+bypass the first dimension (e.g. with NumPy functions using ``axis=1`` in this case).
+
+See ``elfi.examples`` for tips on how to vectorize simulators and work with ELFI. In case you are
+unable to vectorize your simulator, you can use `elfi.tools.vectorize`_ to mimic
+vectorized behaviour, though without the performance benefits. Finally, for simplicity vectorization
+is not assumed (``batch_size=1`` by default).
+
+.. _`elfi.tools.vectorize`: api.html#elfi.tools.vectorize
+
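
To make the vectorization contract described in this FAQ entry concrete, a vectorized simulator could look roughly like the sketch below. The toy model, its summaries and the observed data are illustrative assumptions and not part of this commit; the node types (``elfi.Prior``, ``elfi.Simulator``, ``elfi.Summary``, ``elfi.Distance``) and the ``batch_size``/``random_state`` keyword arguments follow the standard ELFI operation interface.

.. code:: python

    import numpy as np
    import elfi

    def simulator(t1, batch_size=1, random_state=None):
        """Toy vectorized simulator: one row of 5 values per parameter in the batch."""
        random_state = random_state or np.random
        # t1 arrives as a vector of shape (batch_size,); broadcasting it against
        # noise of shape (batch_size, 5) yields an output of shape (batch_size, 5)
        return t1[:, None] + random_state.normal(size=(batch_size, 5))

    t1 = elfi.Prior('uniform', 0, 1)
    sim = elfi.Simulator(simulator, t1, observed=np.zeros((1, 5)))

    # summaries reduce over axis=1 so that the batch dimension is preserved
    S1 = elfi.Summary(lambda y: np.mean(y, axis=1), sim)
    S2 = elfi.Summary(lambda y: np.var(y, axis=1), sim)
    d = elfi.Distance('euclidean', S1, S2)

If the simulator cannot be written in this form, wrapping it with ``elfi.tools.vectorize`` mimics the same interface, as the FAQ entry notes.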

docs/index.rst (+2)

@@ -34,6 +34,8 @@ ELFI also has the following non LFI methods:

 .. _No-U-Turn-Sampler: http://jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf

+Additionally, ELFI integrates tools for visualization, model comparison, diagnostics and post-processing.
+

 .. toctree::
 :maxdepth: 1

docs/usage/tutorial.rst (+33 -33)

@@ -18,10 +18,9 @@ settings.

 import numpy as np
 import scipy.stats
-import matplotlib
 import matplotlib.pyplot as plt
 import logging
-logging.basicConfig(level=logging.INFO)
+logging.basicConfig(level=logging.INFO)  # sometimes this is required to enable logging inside Jupyter

 %matplotlib inline
 %precision 2

@@ -251,7 +250,7 @@ a DAG.



-.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this. The software is already installed in many unix-like OS.
+.. note:: You will need the Graphviz_ software as well as the graphviz `Python package`_ (https://pypi.python.org/pypi/graphviz) for drawing this.

 .. _Graphviz: http://www.graphviz.org
 .. _`Python package`: https://pypi.python.org/pypi/graphviz

@@ -396,8 +395,8 @@ time is spent in drawing.

 .. parsed-literal::

-CPU times: user 2.28 s, sys: 165 ms, total: 2.45 s
-Wall time: 2.45 s
+CPU times: user 1.6 s, sys: 166 ms, total: 1.77 s
+Wall time: 1.76 s


 The ``sample`` method returns a ``Sample`` object, which contains

@@ -452,8 +451,8 @@ as long as it takes to generate the requested number of samples.

 .. parsed-literal::

-CPU times: user 222 ms, sys: 40.3 ms, total: 263 ms
-Wall time: 261 ms
+CPU times: user 198 ms, sys: 35.5 ms, total: 233 ms
+Wall time: 231 ms
 Method: Rejection
 Number of samples: 1000
 Number of simulations: 40000

@@ -497,9 +496,9 @@ been reached or a maximum of one second of time has been used.

 Method: Rejection
 Number of samples: 1000
-Number of simulations: 190000
-Threshold: 0.0855
-Sample means: t1: 0.561, t2: 0.218
+Number of simulations: 180000
+Threshold: 0.088
+Sample means: t1: 0.561, t2: 0.221



@@ -547,8 +546,8 @@ in our model:

 .. parsed-literal::

-CPU times: user 5.26 s, sys: 37.1 ms, total: 5.3 s
-Wall time: 5.3 s
+CPU times: user 5.01 s, sys: 60.9 ms, total: 5.07 s
+Wall time: 5.09 s



@@ -558,8 +557,8 @@ in our model:
 Method: Rejection
 Number of samples: 1000
 Number of simulations: 1000000
-Threshold: 0.036
-Sample means: t1: 0.561, t2: 0.227
+Threshold: 0.0363
+Sample means: t1: 0.554, t2: 0.216



@@ -580,8 +579,8 @@ anything. Let's do that.

 .. parsed-literal::

-CPU times: user 636 ms, sys: 1.35 ms, total: 638 ms
-Wall time: 638 ms
+CPU times: user 423 ms, sys: 3.35 ms, total: 426 ms
+Wall time: 429 ms



@@ -591,8 +590,8 @@ anything. Let's do that.
 Method: Rejection
 Number of samples: 1000
 Number of simulations: 1000000
-Threshold: 0.0452
-Sample means: t1: 0.56, t2: 0.228
+Threshold: 0.0457
+Sample means: t1: 0.55, t2: 0.216



@@ -610,8 +609,8 @@ simulations and only have to simulate the new ones:

 .. parsed-literal::

-CPU times: user 1.72 s, sys: 10.6 ms, total: 1.73 s
-Wall time: 1.73 s
+CPU times: user 1.44 s, sys: 17.9 ms, total: 1.46 s
+Wall time: 1.47 s



@@ -621,8 +620,8 @@ simulations and only have to simulate the new ones:
 Method: Rejection
 Number of samples: 1000
 Number of simulations: 1200000
-Threshold: 0.0417
-Sample means: t1: 0.561, t2: 0.225
+Threshold: 0.0415
+Sample means: t1: 0.55, t2: 0.215



@@ -640,8 +639,8 @@ standard numpy .npy files:

 .. parsed-literal::

-CPU times: user 25.8 ms, sys: 3.27 ms, total: 29 ms
-Wall time: 28.5 ms
+CPU times: user 28.7 ms, sys: 4.5 ms, total: 33.2 ms
+Wall time: 33.4 ms


 This stores the simulated data in binary ``npy`` format under

@@ -658,7 +657,7 @@ This stores the simulated data in binary ``npy`` format under

 .. parsed-literal::

-Files in pools/arraypool_3521077242 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']
+Files in pools/arraypool_3375867934 are ['d.npy', 't1.npy', 't2.npy', 'Y.npy']


 Now lets load all the parameters ``t1`` that were generated with numpy:

@@ -672,7 +671,7 @@ Now lets load all the parameters ``t1`` that were generated with numpy:

 .. parsed-literal::

-array([ 0.79, -0.01, -1.47, ..., 0.98, 0.18, 0.5 ])
+array([ 0.36, 0.47, -1.66, ..., 0.09, 0.45, 0.2 ])



@@ -687,7 +686,7 @@ We can also close (or save) the whole pool if we wish to continue later:

 .. parsed-literal::

-arraypool_3521077242
+arraypool_3375867934


 And open it up later to continue where we were left. We can open it

@@ -718,12 +717,12 @@ You can delete the files with:
 os.listdir(arraypool.path)

 except FileNotFoundError:
-print("The directry is removed")
+print("The directory is removed")


 .. parsed-literal::

-The directry is removed
+The directory is removed


 Visualizing the results

@@ -820,8 +819,9 @@ sampler:
 smc = elfi.SMC(d, batch_size=10000, seed=seed)

 For sampling, one has to define the number of output samples, the number
-of populations and a *schedule* i.e. a list of quantiles to use for each
-population. In essence, a population is just refined rejection sampling.
+of populations and a *schedule* i.e. a list of thresholds to use for
+each population. In essence, a population is just refined rejection
+sampling.

 .. code:: ipython3


@@ -839,8 +839,8 @@ population. In essence, a population is just refined rejection sampling.

 .. parsed-literal::

-CPU times: user 1.72 s, sys: 154 ms, total: 1.87 s
-Wall time: 1.56 s
+CPU times: user 1.6 s, sys: 156 ms, total: 1.75 s
+Wall time: 1.38 s


 We can have summaries and plots of the results just like above:

elfi/__init__.py (+2 -1)

@@ -12,6 +12,7 @@
 import elfi.model.tools as tools
 from elfi.client import get_client, set_client
 from elfi.methods.diagnostics import TwoStageSelection
+from elfi.methods.model_selection import *
 from elfi.methods.parameter_inference import *
 from elfi.methods.post_processing import adjust_posterior
 from elfi.model.elfi_model import *

@@ -24,4 +25,4 @@
 __email__ = 'elfi-support@hiit.fi'

 # make sure __version_ is on the last non-empty line (read by setup.py)
-__version__ = '0.7'
+__version__ = '0.7.1'

elfi/methods/model_selection.py (+59)

@@ -0,0 +1,59 @@
+"""This module contains methods for model comparison and selection."""
+
+import numpy as np
+
+
+def compare_models(sample_objs, model_priors=None):
+    """Find posterior probabilities for different models.
+
+    The algorithm requires elfi.Sample objects from prerun inference methods. For example the
+    output from elfi.Rejection.sample is valid. The portion of samples for each model in the top
+    discrepancies are adjusted by each models acceptance ratio and prior probability.
+
+    The discrepancies (including summary statistics) must be comparable so that it is
+    meaningful to sort them!
+
+    Parameters
+    ----------
+    sample_objs : list of elfi.Sample
+        Resulting Sample objects from prerun inference models. The objects must include
+        a valid `discrepancies` attribute.
+    model_priors : array_like, optional
+        Prior probability of each model. Defaults to 1 / n_models.
+
+    Returns
+    -------
+    np.array
+        Posterior probabilities for the considered models.
+
+    """
+    n_models = len(sample_objs)
+    n_min = min([s.n_samples for s in sample_objs])
+
+    # concatenate discrepancy vectors
+    try:
+        discrepancies = np.concatenate([s.discrepancies for s in sample_objs])
+    except ValueError:
+        raise ValueError("All Sample objects must include valid discrepancies.")
+
+    # sort and take the smallest n_min
+    inds = np.argsort(discrepancies)[:n_min]
+
+    # calculate the portions of accepted samples for each model in the top discrepancies
+    p_models = np.empty(n_models)
+    up_bound = 0
+    for i in range(n_models):
+        low_bound = up_bound
+        up_bound += sample_objs[i].n_samples
+        p_models[i] = np.logical_and(inds >= low_bound, inds < up_bound).sum()
+
+        # adjust by the number of simulations run
+        p_models[i] /= sample_objs[i].n_sim
+
+        # adjust by the prior model probability
+        if model_priors is not None:
+            p_models[i] *= model_priors[i]
+
+    p_models = p_models / p_models.sum()
+
+    return p_models
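
For context, a minimal usage sketch of the new function follows. The distance nodes ``d1`` and ``d2`` are hypothetical placeholders for two competing ELFI models built with comparable summary statistics and distances; only ``elfi.Rejection`` and ``elfi.compare_models`` are taken from the actual API.

.. code:: python

    import elfi

    # d1 and d2 (hypothetical) are the distance nodes of two competing models
    # that use the same summaries, so their discrepancies can be sorted together
    res1 = elfi.Rejection(d1, batch_size=1000, seed=1).sample(1000)
    res2 = elfi.Rejection(d2, batch_size=1000, seed=1).sample(1000)

    # posterior probabilities of the two models; model priors default to equal
    p = elfi.compare_models([res1, res2])

Because each count is divided by the corresponding run's ``n_sim``, the two runs do not need to use the same number of simulations.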
