fgivenx package¶

Module contents¶

The main driving routines for this package are:

plot_contours
plot_lines
plot_dkl
samples_from_getdist_chains

Example import and usage:

>>> import numpy
>>> from fgivenx import plot_contours, plot_lines, ...                     plot_dkl, samples_from_getdist_chains
>>>
>>> file_root = '/my/getdist/file/root'
>>> params = ['m', 'c']
>>> samples = samples_from_getdist_chains(params, file_root)
>>> x = numpy.linspace(-1, 1, 100)
>>>
>>> def f(x, theta):
>>>     m, c = params
>>>     y = m * x + c
>>>     return y
>>>
>>> plot_contours(f, x, samples)

Submodules¶

fgivenx.drivers module¶

This module provides utilities for computing the grid for contours of a function reconstruction plot.

Required ingredients:

sampled posterior probability distribution \(P(\theta)\)
independent variable \(x\)
dependent variable \(y\)
functional form \(y = f(x;\theta)\) parameterised by \(\theta\)

Assuming that you have obtained samples of \(\theta\) from an MCMC process, we aim to compute the density:

\[\begin{split}P(y|x) &= \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta \\ &= \int \delta(y-f(x;\theta)) P(\theta) d\theta\end{split}\]

which gives our degree of knowledge for each \(y=f(x;\theta)\) value given an \(x\) value.

In fact, for a more representative plot, we are not actually interested in the value of the probability density above, but in fact require the “iso-probablity posterior mass”

\[\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\]

We thus need to compute this function on a rectangular grid of \(x\) and \(y\).

fgivenx.drivers.compute_dkl(f, x, samples, prior_samples, **kwargs)[source]¶

Compute the Kullback-Leibler divergence at each value of x for the prior and posterior defined by prior_samples and samples.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples, prior_samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights, prior_weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache, prior_cache: str, optional: File roots for saving previous calculations for re-use
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()
kwargs: further keyword arguments: Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().

Returns:

1D numpy array:: dkl values at each value of x.

fgivenx.drivers.compute_pmf(f, x, samples, **kwargs)[source]¶

Compute the probability mass function given x at a range of x values for \(y = f(x|\theta)\)

\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)

\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)

Additionally, if a list of log-evidences are passed, along with list of functions, samples and optional weights it marginalises over the models according to the evidences.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ny: int, optional: Resolution of y axis. Default: 100
y: array-like, optional: Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache: str, optional: File root for saving previous calculations for re-use
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()

Returns:

1D numpy.array:: y values pmf is computed at shape=(len(y)) or ny
2D numpy.array:: pmf values at each x and y shape=(len(x),len(y))

fgivenx.drivers.compute_samples(f, x, samples, **kwargs)[source]¶

Apply the function(s) \(f(x;\theta)\) to the arrays defined in x and samples. Has options for weighting, trimming, cacheing & parallelising.

Additionally, if a list of log-evidences are passed, along with list of functions, samples and optional weights it marginalises over the models according to the evidences.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache: str, optional: File root for saving previous calculations for re-use. Default: None
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()

Returns:

2D numpy.array: Evaluate the function f at each x value and each theta. Equivalent to [[f(x_i,theta) for theta in samples] for x_i in x]

fgivenx.drivers.plot_contours(f, x, samples, ax=None, **kwargs)[source]¶

Plot the probability mass function given x at a range of \(y\) values for \(y = f(x|\theta)\)

\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)

\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)

Additionally, if a list of log-evidences are passed, along with list of functions, and list of samples, this function plots the probability mass function for all models marginalised according to the evidences.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
ax: axes object, optional: matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ny: int, optional: Resolution of y axis. Default: 100
y: array-like, optional: Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache: str, optional: File root for saving previous calculations for re-use
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()
kwargs: further keyword arguments: Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().

Returns:

cbar: color bar: matplotlib.contour.QuadContourSet Colors to create a global colour bar

fgivenx.drivers.plot_dkl(f, x, samples, prior_samples, ax=None, **kwargs)[source]¶

Plot the Kullback-Leibler divergence at each value of \(x\) for the prior and posterior defined by prior_samples and samples.

Let the posterior be:

\(P(y|x) = \int P(y=f(x;\theta)|x,\theta)P(\theta) d\theta\)

and the prior be:

\(Q(y|x) = \int P(y=f(x;\theta)|x,\theta)Q(\theta) d\theta\)

then the Kullback-Leibler divergence at each x is defined by

\(D_\mathrm{KL}(x)=\int P(y|x)\ln\left[\frac{Q(y|x)}{P(y|x)}\right]dy\)

Additionally, if a list of log-evidences are passed, along with list of functions, and list of samples, this function plots the Kullback-Leibler divergence for all models marginalised according to the evidences.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples, prior_samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
ax: axes object, optional: matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights, prior_weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache, prior_cache: str, optional: File roots for saving previous calculations for re-use
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()
kwargs: further keyword arguments: Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().

fgivenx.drivers.plot_lines(f, x, samples, ax=None, **kwargs)[source]¶

Plot a representative set of functions to sample

Additionally, if a list of log-evidences are passed, along with list of functions, and list of samples, this function plots the probability mass function for all models marginalised according to the evidences.

Parameters:

f: function: function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples: 2D array-like: \(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
ax: axes object, optional: matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
logZ: 1D array-like, optional: log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
weights: 1D array-like, optional: sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
ntrim: int, optional: Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
cache: str, optional: File root for saving previous calculations for re-use
parallel, tqdm_args:: see docstring for fgivenx.parallel.parallel_apply()
kwargs: further keyword arguments: Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot_lines().

fgivenx.dkl module¶

fgivenx.dkl.DKL(arrays)[source]¶

Compute the Kullback-Leibler divergence from one distribution Q to another P, where Q and P are represented by a set of samples.

Parameters:	arrays: tuple(1D numpy.array,1D numpy.array) samples defining distributions P & Q respectively
Returns:	float: Kullback Leibler divergence.

fgivenx.dkl.compute_dkl(fsamps, prior_fsamps, **kwargs)[source]¶

Compute the Kullback Leibler divergence for function samples for posterior and prior pre-calculated at a range of x values.

Parameters:	fsamps: 2D numpy.array Posterior function samples, as computed by `fgivenx.compute_samples()` prior_fsamps: 2D numpy.array Prior function samples, as computed by `fgivenx.compute_samples()` parallel, tqdm_kwargs: optional see docstring for `fgivenx.parallel.parallel_apply()`. cache: str, optional File root for saving previous calculations for re-use.
Returns:	1D numpy.array: Kullback-Leibler divergences at each value of x. shape=(len(fsamps))

fgivenx.io module¶

class fgivenx.io.Cache(file_root)[source]¶

Bases: object

Cacheing tool for saving recomputation.

Parameters:	file_root: str cached values are saved in file_root.pkl

Methods

`check`(self, \*args)	Check that the arguments haven’t changed since the last call.
`load`(self)	Load cache from file using pickle.
`save`(self, \*args)	Save cache to file using pickle.

check(self, *args)[source]¶

Check that the arguments haven’t changed since the last call.

Parameters:	*args: All but the last argument are inputs to the cached function. The last is the actual value of the function.
Returns:	If arguments unchanged: return the cached answer else: indicate recomputation required by throwing a `CacheException`.

load(self)[source]¶: Load cache from file using pickle.

save(self, *args)[source]¶

Save cache to file using pickle.

Parameters:	*args: All but the last argument are inputs to the cached function. The last is the actual value of the function.

exception fgivenx.io.CacheChanged(file_root)[source]¶

Bases: fgivenx.io.CacheException

Exception to indicate the cache has changed.

exception fgivenx.io.CacheException[source]¶

Bases: exceptions.Exception

Base exception to indicate cache errors

calling_function(self)[source]¶: Get the name of the function calling this cache.

exception fgivenx.io.CacheMissing(file_root)[source]¶

Bases: fgivenx.io.CacheException

Exception to indicate the cache does not exist.

exception fgivenx.io.CacheOK(file_root)[source]¶

Bases: fgivenx.io.CacheException

Exception to indicate the cache can be used.

fgivenx.mass module¶

Utilities for computing the probability mass function.

fgivenx.mass.PMF(samples, y)[source]¶

Compute the probability mass function.

The set of samples defines a probability density P(y), which is computed using a kernel density estimator.

From \(P(y)\) we define:

\(\mathrm{pmf}(p) = \int_{P(y)<p} P(y) dy\)

This is the cumulative distribution function expressed as a function of the probability

We aim to compute \(M(y)\), which indicates the amount of probability contained outside the iso-probability contour passing through \(y\):

 ^ P(y)                   ...
 |                     | .   .
 |                     |       .
p|- - - - - - - - - - .+- - - - . - - - - - - - - - - -
 |                   .#|        #.
 |                  .##|        ##.
 |                  .##|        ##.
 |                 .###|        ###.     M(p)
 |                 .###|        ###.     is the
 |                 .###|        ###.     shaded area
 |                .####|        ####.
 |                .####|        ####.
 |              ..#####|        #####..
 |          ....#######|        #######....
 |         .###########|        ###########.
 +---------------------+-------------------------------> y
                      t

 ^ M(p)                        ^ M(y)
 |                             |
1|                +++         1|         +
 |               +             |        + +
 |       ++++++++              |       +   +
 |     ++                      |     ++     ++
 |   ++                        |   ++         ++
 |+++                          |+++             +++
 +---------------------> p     +---------------------> y
0

Parameters:	samples: array-like Array of samples from a probability density P(y). y: array-like (optional) Array to evaluate the PDF at
Returns:	1D numpy.array: PMF evaluated at each y value

fgivenx.mass.compute_pmf(fsamps, y, **kwargs)[source]¶

Compute the pmf defined by fsamps at each x for each y.

Parameters:	fsamps: 2D array-like array of function samples, as returned by `fgivenx.compute_samples()` y: 1D array-like y values to evaluate the PMF at parallel, tqdm_kwargs: optional see docstring for `fgivenx.parallel.parallel_apply()`.
Returns:	2D numpy.array probability mass function at each x for each y shape=(len(fsamps),len(y)

fgivenx.parallel module¶

fgivenx.parallel.parallel_apply(f, array, **kwargs)[source]¶

Apply a function to an array with openmp parallelisation.

Equivalent to [f(x) for x in array], but parallelised if required.

Parameters:

f: function

Univariate function to apply to each element of array

array: array-like

Array to apply f to

parallel: int or bool, optional

int > 0: number of processes to parallelise over

int < 0 or bool=True: use OMP_NUM_THREADS to choose parallelisation

bool=False or int=0: do not parallelise

tqdm_kwargs: dict, optional

additional kwargs for tqdm progress bars.

precurry: tuple, optional

immutable arguments to pass to f before x, i.e. [f(precurry,x) for x in array]

postcurry: tuple, optional

immutable arguments to pass to f after x i.e. [f(x,postcurry) for x in array]

Returns:

list:: [f(precurry,x,postcurry) for x in array] parallelised according to parallel

fgivenx.plot module¶

fgivenx.plot.plot(x, y, z, ax=None, **kwargs)[source]¶

Plot iso-probability mass function, converted to sigmas.

Parameters:

x, y, z : numpy arrays: Same as arguments to matplotlib.pyplot.contour()
ax: axes object, optional: matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
colors: color scheme, optional: matplotlib.colors.LinearSegmentedColormap Color scheme to plot with. Recommend plotting in reverse (Default: matplotlib.pyplot.cm.Reds_r)
smooth: float, optional: Percentage by which to smooth the contours. (Default: no smoothing)
contour_line_levels: List[float], optional: Contour lines to be plotted. (Default: [1,2])
linewidths: float, optional: Thickness of contour lines. (Default: 0.3)
contour_color_levels: List[float], optional: Contour color levels. (Default: numpy.arange(0, contour_line_levels[-1] + 1, fineness))
fineness: float, optional: Spacing of contour color levels. (Default: 0.1)
lines: bool, optional: (Default: True)
rasterize_contours: bool, optional: Rasterize the contours while keeping the lines, text etc in vector format. Useful for reducing file size bloat and making printing easier when you have dense contours. (Default: False)

Returns:

cbar: color bar: matplotlib.contour.QuadContourSet Colors to create a global colour bar

fgivenx.plot.plot_lines(x, fsamps, ax=None, downsample=100, **kwargs)[source]¶

Plot function samples as a set of line plots.

Parameters:	x: 1D array-like x values to plot fsamps: 2D array-like set of functions to plot at each x. As returned by `fgivenx.compute_samples()` ax: axes object `matplotlib.pyplot.ax` to plot on. downsample: int, optional Reduce the number of samples to a viewable quantity. (Default 100) any other keywords are passed to :meth:`matplotlib.pyplot.ax.plot`

fgivenx.samples module¶

fgivenx.samples.compute_samples(f, x, samples, **kwargs)[source]¶

Apply f(x,theta) to x array and theta in samples.

Parameters:

f: function: list of functions \(f(x;\theta)\) with dependent variable \(x\), parameterised by \(\theta\).
x: 1D array-like: x values to evaluate \(f(x;\theta)\) at.
samples: 2D array-like: list of theta samples to evaluate \(f(x;\theta)\) at. shape = (nfunc, nsamples, npars)
parallel, tqdm_kwargs: optional: see docstring for fgivenx.parallel.parallel_apply()
cache: str, optional: File root for saving previous calculations for re-use default None

Returns:

2D numpy.array:: samples at each x. shape=(len(x),len(samples),)

fgivenx.samples.samples_from_getdist_chains(params, file_root, latex=False, **kwargs)[source]¶

Extract samples and weights from getdist chains.

Parameters:

params: list(str)

Names of parameters to be supplied to second argument of f(x|theta).

file_root: str, optional

Root name for getdist chains files. This variable automatically defines: - chains_file = file_root.txt - paramnames_file = file_root.paramnames but can be overidden by chains_file or paramnames_file.

latex: bool, optional

Also return an array of latex strings for those paramnames.

Any additional keyword arguments are forwarded onto getdist, e.g:

samples_from_getdist_chains(params, file_root,: settings={‘ignore_rows’:0.5})

Returns:

samples: numpy.array: 2D Array of samples. shape=(len(samples), len(params))
weights: numpy.array: Array of weights. shape = (len(params),)
latex: list(str), optional: list of latex strings for each parameter (if latex is provided as an argument)