Basic usage

pySELFI requires 3 basic ingredients:

  • a blackbox simulator, which is used as the data model. It is represented as a blackbox object. See this page for instructions on how to define a new blackbox.

  • a statistical model, which includes in particular the mean and covariance matrix of the (Gaussian) prior. A statistical model is characterized by a prior object, and (if necessary) an instance of a child class of the pyselfi.selfi class. See this page for instructions on how to define a new model.

  • the core infrastructure, which computes the effective likelihood and the effective posterior. It should not be necessary to modify this part of the code, which is located in the root of the directory src/pyselfi/.

The basic usage then consists of 5 steps:

  1. Setting up the prior, blackbox, and observations

  2. Computing the prior

  3. Running the necessary simulations

  4. Computing the effective likelihood

  5. Computing the effective posterior

Setting up the prior, blackbox, and observations

The first step is to setup

  • a prior object (see this page for how to use the power spectrum prior defined in Leclercq et al. 2019, this page for how to use the Lotka-Volterra prior defined in Leclercq 2022, or this page for how to setup a new prior); if the model has \(S\) parameters, the prior should have dimension \(S\) too),

  • a blackbox object (see this page for how to use Simbelmynë as blackbox, this page for how to use lotkavolterra_simulator as blackbox, or this page for how to setup a new blackbox),

  • and the observed summaries vector phi_obs (a vector of dimension \(P\) if the blackbox returns vectors of dimension \(P\)).

After that, define the selfi object using:

from pyselfi.selfi import selfi
selfi=selfi(fname,pool_prefix,pool_suffix,prior,blackbox,theta_0,Ne,Ns,Delta_theta,phi_obs)

or an instance of a child class, such as:

from pyselfi.power_spectrum.selfi import power_spectrum_selfi
selfi=power_spectrum_selfi(fname,pool_prefix,pool_suffix,prior,blackbox,
                           theta_0,Ne,Ns,Delta_theta,phi_obs)

fname corresponds to the name of an hdf5 file where results will be saved (and from which they can be loaded if they have been precomputed). pool_prefix and pool_suffix are the prefix and suffix of simulation pool filenames, so that a pool is identified by pool_prefix{d}poolsufffix where \(d\) runs from \(0\) to \(S\) (\(0\) for the expansion point and \(1\) to \(S\) for all possible directions in parameter space). theta_0 is the expansion point (a vector of size \(S\)). Ne is the number of required simulations at the expansion point (\(N_0\) in the notations of Leclercq et al. 2019), Ns the number of required simulations at each of the expansion points (\(N_s\) in the notations of Leclercq et al. 2019), and Delta_theta the step size for finite differencing (\(h\) in the notations of Leclercq et al. 2019).

Child objects of the base selfi class may require additional attributes. For more details, see the API documentation.

Computing the prior

The second step is to compute and save the prior (mean, covariance matrix \(\textbf{S}\) and inverse of the covariance matrix \(\textbf{S}^{-1}\)). This is achieved using:

selfi.compute_prior()
selfi.save_prior()

If the prior has already been computed and saved to the pySELFI output file, it can instead simply be loaded using:

selfi.load_prior()

The prior mean and covariance matrix can then be accessed using respectively selfi.prior.mean and selfi.prior.covariance.

Running the necessary simulations

The third step is to run or load the necessary simulations, it is typically the expensive step. This is achieved using:

selfi.run_simulations()

if one wants to run or load all the simulations at once. To take advantage of the parallelization potential, it is also possible to use:

selfi.run_simulations(d)

where \(d\) is an integer from \(0\) to \(S\) or a list of integers from \(0\) to \(S\).

Since pySELFI uses the concept of simulation pool, it will first try to load as many simulations as possible from the specified pool files, then run any additional simulation required. Therefore, if all simulations have been precomputed, this step only consists in loading the results.

Computing the effective likelihood

The fourth step is to compute and save the effective likelihood (covariance matrix of the summaries at the expansion point \(\textbf{C}_0\) and its inverse \(\textbf{C}_0^{-1}\), and gradient of the average blackbox \(\boldsymbol{\nabla}\textbf{f}_0\)). This is achieved using:

selfi.compute_likelihood()
selfi.save_likelihood()

This step assumes that the necessary simulations are accessible.

If the effective likelihood has already been computed and saved to the pySELFI output file, it can instead simply be loaded using:

selfi.load_likelihood()

This covariance matrix at the expansion point and the gradient of the blackbox are then accessed using respectively selfi.likelihood.C_0 and selfi.likelihood.grad_f.

Computing the effective posterior

The fifth step is to compute and save the effective posterior (mean \(\boldsymbol{\gamma}\) and covariance matrix \(\boldsymbol{\Gamma}\)). This is achieved using:

selfi.compute_posterior()
selfi.save_posterior()

This step assumes that the prior and effective likelihood have been computed.

If the effective likelihood has already been computed and saved to the pySELFI output file, it can instead simply be loaded using:

selfi.load_posterior()

The effective posterior mean and covariance matrix can then be accessed using respectively selfi.posterior.mean and selfi.posterior.covariance: these are the end products of pySELFI.