fgivenx package
The main driving routines for this package are fgivenx.plot_contours(), fgivenx.plot_lines(), fgivenx.plot_dkl() and fgivenx.samples_from_getdist_chains().
Example import and usage:
>>> import numpy
>>> from fgivenx import (plot_contours, plot_lines,
...                      plot_dkl, samples_from_getdist_chains)
>>>
>>> file_root = '/my/getdist/file/root'
>>> params = ['m', 'c']
>>> samples, weights = samples_from_getdist_chains(params, file_root)
>>> x = numpy.linspace(-1, 1, 100)
>>>
>>> def f(x, theta):
...     m, c = theta
...     y = m * x + c
...     return y
>>>
>>> plot_contours(f, x, samples, weights=weights)
Submodules
fgivenx.dkl module
- fgivenx.dkl.DKL(arrays)[source]
Compute the Kullback-Leibler divergence from one distribution Q to another P, where Q and P are represented by a set of samples.
- Parameters:
- arrays: tuple(1D numpy.array,1D numpy.array)
samples defining distributions P & Q respectively
- Returns:
- float:
Kullback-Leibler divergence.
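The sample-based estimate can be sketched with plain numpy (a conceptual illustration only; fgivenx's own implementation differs, e.g. by using kernel density estimates): bin both sample sets on a shared grid and integrate \(P\ln(P/Q)\).

```python
import numpy as np

def dkl_from_samples(p_samples, q_samples, bins=100):
    """Crude estimate of D_KL(P||Q) from two 1D sample sets via shared histograms."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(p_samples, bins=edges, density=True)
    q, _ = np.histogram(q_samples, bins=edges, density=True)
    dy = edges[1] - edges[0]
    mask = (p > 0) & (q > 0)          # skip empty bins to avoid log(0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dy

rng = np.random.default_rng(0)
# P = N(0, 1) against Q = N(0, 2): analytically D_KL = ln 2 - 3/8, about 0.318
dkl = dkl_from_samples(rng.normal(0, 1, 100000), rng.normal(0, 2, 100000))
```

With 100,000 samples the histogram estimate lands close to the analytic value; the residual scatter comes from binning and Poisson noise.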
- fgivenx.dkl.compute_dkl(fsamps, prior_fsamps, **kwargs)[source]
Compute the Kullback-Leibler divergence between posterior and prior function samples pre-calculated at a range of x values.
- Parameters:
- fsamps: 2D numpy.array
Posterior function samples, as computed by
fgivenx.compute_samples()
- prior_fsamps: 2D numpy.array
Prior function samples, as computed by
fgivenx.compute_samples()
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- cache: str, optional
File root for saving previous calculations for re-use.
- Returns:
- 1D numpy.array:
Kullback-Leibler divergence at each value of x. shape=(len(fsamps),)
fgivenx.drivers module
This module provides utilities for computing the grid for contours of a function reconstruction plot.
- Required ingredients:
sampled posterior probability distribution \(P(\theta)\)
independent variable \(x\)
dependent variable \(y\)
functional form \(y = f(x;\theta)\) parameterised by \(\theta\)
Assuming that you have obtained samples of \(\theta\) from an MCMC process, we aim to compute the density:
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
which gives our degree of knowledge for each \(y=f(x;\theta)\) value given an \(x\) value.
In fact, for a more representative plot, we are not interested in the value of the probability density above, but in the "iso-probability posterior mass":
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
We thus need to compute this function on a rectangular grid of \(x\) and \(y\).
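The pipeline can be sketched end to end with plain numpy for a linear model \(y = mx + c\) (a minimal illustration with assumed Gaussian posterior samples; the drivers below add weighting, trimming, caching and parallelisation on top):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal([1.0, 0.0], 0.1, size=(1000, 2))  # posterior samples of (m, c)
x = np.linspace(-1, 1, 50)

# function samples f(x_i; theta_j) on the x grid
fsamps = np.array([[m * xi + c for m, c in theta] for xi in x])

# P(y|x): a histogram estimate of the density of function samples at each x
ny, ylim = 100, 2.0
pdf = np.array([np.histogram(fs, bins=ny, range=(-ylim, ylim), density=True)[0]
                for fs in fsamps])

# pmf(y|x): at each x, the probability mass in regions less probable than y
dy = 2 * ylim / ny
pmf = np.array([[p[p <= pk].sum() * dy for pk in p] for p in pdf])
```

The resulting `pmf` array is exactly the rectangular grid described above, ready to be passed to a contouring routine.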
- fgivenx.drivers.compute_dkl(f, x, samples, prior_samples, **kwargs)[source]
Compute the Kullback-Leibler divergence at each value of x for the prior and posterior defined by prior_samples and samples.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples, prior_samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights, prior_weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache, prior_cache: str, optional
File roots for saving previous calculations for re-use
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to
fgivenx.plot.plot().
- Returns:
- 1D numpy array:
dkl values at each value of x.
- fgivenx.drivers.compute_pmf(f, x, samples, **kwargs)[source]
Compute the probability mass function given x at a range of x values for \(y = f(x;\theta)\)
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
Additionally, if a list of log-evidences is passed, along with lists of functions, samples and optional weights, it marginalises over the models according to the evidences.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ny: int, optional
Resolution of y axis. Default: 100
- y: array-like, optional
Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- Returns:
- 1D numpy.array:
y values the pmf is computed at. shape=(len(y),) or (ny,)
- 2D numpy.array:
pmf values at each x and y shape=(len(x),len(y))
- fgivenx.drivers.compute_samples(f, x, samples, **kwargs)[source]
Apply the function(s) \(f(x;\theta)\) to the arrays defined in x and samples, with options for weighting, trimming, caching and parallelising.
Additionally, if a list of log-evidences is passed, along with lists of functions, samples and optional weights, it marginalises over the models according to the evidences.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use. Default: None
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- Returns:
- 2D numpy.array
The function f evaluated at each x value and each theta. Equivalent to [[f(x_i, theta) for theta in samples] for x_i in x]
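The list-comprehension equivalence above can be checked directly (with a hypothetical straight-line f; fgivenx.drivers.compute_samples adds weighting, trimming and caching on top):

```python
import numpy as np

def f(x, theta):
    m, c = theta        # hypothetical straight-line model
    return m * x + c

x = np.linspace(-1, 1, 5)
samples = np.array([[1.0, 0.0], [2.0, 0.5], [0.5, -1.0]])  # (nsamples, npars)

# the documented equivalent of compute_samples(f, x, samples)
fsamps = np.array([[f(xi, theta) for theta in samples] for xi in x])
# fsamps has shape (len(x), len(samples)) == (5, 3)
```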
- fgivenx.drivers.plot_contours(f, x, samples, ax=None, **kwargs)[source]
Plot the probability mass function given x at a range of \(y\) values for \(y = f(x;\theta)\)
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
Additionally, if a list of log-evidences is passed, along with lists of functions and samples, this function plots the probability mass function for all models, marginalised according to the evidences.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot
to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or to create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ny: int, optional
Resolution of y axis. Default: 100
- y: array-like, optional
Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to
fgivenx.plot.plot().
- Returns:
- cbar: color bar
matplotlib.contour.QuadContourSet
Colors to create a global colour bar
- fgivenx.drivers.plot_dkl(f, x, samples, prior_samples, ax=None, **kwargs)[source]
Plot the Kullback-Leibler divergence at each value of \(x\) for the prior and posterior defined by prior_samples and samples.
Let the posterior be:
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta)P(\theta) d\theta\)
and the prior be:
\(Q(y|x) = \int P(y=f(x;\theta)|x,\theta)Q(\theta) d\theta\)
then the Kullback-Leibler divergence at each x is defined by
\(D_\mathrm{KL}(x)=\int P(y|x)\ln\left[\frac{P(y|x)}{Q(y|x)}\right]dy\)
Additionally, if a list of log-evidences is passed, along with lists of functions and samples, this function plots the Kullback-Leibler divergence for all models, marginalised according to the evidences.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples, prior_samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot
to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or to create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights, prior_weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache, prior_cache: str, optional
File roots for saving previous calculations for re-use
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to
fgivenx.plot.plot().
- fgivenx.drivers.plot_lines(f, x, samples, ax=None, **kwargs)[source]
Plot a representative set of the sampled functions as lines.
Additionally, if a list of log-evidences is passed, along with lists of functions and samples, this function plots lines for all models, marginalised according to the evidences.
- Parameters:
- f: function
function \(f(x;\theta)\) (or list of functions for each model) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot
to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or to create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones(samples.shape[0])
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to
fgivenx.plot.plot_lines().
fgivenx.io module
- class fgivenx.io.Cache(file_root)[source]
Bases:
object
Caching tool to avoid recomputation.
- Parameters:
- file_root: str
cached values are saved in file_root.pkl
- check(*args)[source]
Check that the arguments haven’t changed since the last call.
- Parameters:
- *args:
All but the last argument are inputs to the cached function. The last is the actual value of the function.
- Returns:
- If arguments unchanged:
return the cached answer
- else:
indicate recomputation required by throwing a CacheException.
- exception fgivenx.io.CacheChanged(file_root)[source]
Bases:
CacheException
Exception to indicate the cache has changed.
- exception fgivenx.io.CacheException[source]
Bases:
Exception
Base exception to indicate cache errors
- exception fgivenx.io.CacheMissing(file_root)[source]
Bases:
CacheException
Exception to indicate the cache does not exist.
- exception fgivenx.io.CacheOK(file_root)[source]
Bases:
CacheException
Exception to indicate the cache can be used.
fgivenx.mass module
Utilities for computing the probability mass function.
- fgivenx.mass.PMF(samples, y)[source]
Compute the probability mass function.
The set of samples defines a probability density P(y), which is computed using a kernel density estimator.
From \(P(y)\) we define:
\(\mathrm{pmf}(p) = \int_{P(y)<p} P(y) dy\)
This is the cumulative distribution function expressed as a function of the probability density.
We aim to compute \(M(y)\), which indicates the amount of probability contained outside the iso-probability contour passing through \(y\):
[ASCII diagram from the original docstring: the top panel shows the density P(y) cut by a horizontal line at level p, with M(p) the shaded area under P(y) restricted to the region where P(y) < p; the lower panels sketch M as a function of p and as a function of y.]
- Parameters:
- samples: array-like
Array of samples from a probability density P(y).
- y: array-like (optional)
Array to evaluate the PDF at
- Returns:
- 1D numpy.array:
PMF evaluated at each y value
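For a unimodal density the region where \(P(y') < P(y)\) is just the two tails beyond \(\pm|y|\), which gives an easy sanity check. Below is a histogram-based sketch of the idea (fgivenx.mass.PMF itself uses a kernel density estimator; the numbers here are noisy estimates):

```python
import numpy as np

def pmf_sketch(samples, y, bins=200):
    """Probability mass lying in regions less dense than the density at each y."""
    p, edges = np.histogram(samples, bins=bins, density=True)
    width = edges[1] - edges[0]
    # nearest-bin lookup of the density at each requested y
    idx = np.clip(np.searchsorted(edges, y) - 1, 0, bins - 1)
    return np.array([p[p <= pk].sum() * width for pk in p[idx]])

rng = np.random.default_rng(2)
samples = rng.normal(0, 1, 200000)
m = pmf_sketch(samples, np.array([0.0, 1.0, 2.0]))
# analytic values for a standard normal are erfc(|y|/sqrt(2)):
# roughly 1.0, 0.317 and 0.046; the histogram estimate is close but noisy
```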
- fgivenx.mass.compute_pmf(fsamps, y, **kwargs)[source]
Compute the pmf defined by fsamps at each x for each y.
- Parameters:
- fsamps: 2D array-like
array of function samples, as returned by
fgivenx.compute_samples()
- y: 1D array-like
y values to evaluate the PMF at
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
.
- Returns:
- 2D numpy.array
probability mass function at each x for each y. shape=(len(fsamps), len(y))
fgivenx.parallel module
- fgivenx.parallel.parallel_apply(f, array, **kwargs)[source]
Apply a function to an array with OpenMP parallelisation.
Equivalent to [f(x) for x in array], but parallelised if required.
- Parameters:
- f: function
Univariate function to apply to each element of array
- array: array-like
Array to apply f to
- parallel: int or bool, optional
int > 0: number of processes to parallelise over
int < 0 or bool=True: use OMP_NUM_THREADS to choose parallelisation
bool=False or int=0: do not parallelise
- tqdm_kwargs: dict, optional
additional kwargs for tqdm progress bars.
- precurry: tuple, optional
immutable arguments to pass to f before x, i.e. [f(precurry,x) for x in array]
- postcurry: tuple, optional
immutable arguments to pass to f after x i.e. [f(x,postcurry) for x in array]
- Returns:
- list:
[f(precurry,x,postcurry) for x in array] parallelised according to parallel
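The contract above amounts to a parallel list comprehension. A minimal stand-in using the standard library (fgivenx's version adds tqdm progress bars and the precurry/postcurry arguments) might look like:

```python
import os
from multiprocessing import Pool

def parallel_apply_sketch(f, array, parallel=False):
    """[f(x) for x in array], optionally fanned out over a process pool."""
    if not parallel:                         # parallel=False or 0: serial
        return [f(x) for x in array]
    if parallel is True or parallel < 0:     # defer to OMP_NUM_THREADS
        nprocs = int(os.environ.get('OMP_NUM_THREADS', os.cpu_count() or 1))
    else:                                    # parallel=n > 0: n processes
        nprocs = parallel
    with Pool(nprocs) as pool:
        return pool.map(f, list(array))

def square(x):
    return x * x

result = parallel_apply_sketch(square, range(5))     # serial path
# result == [0, 1, 4, 9, 16]; parallel=2 gives the same answer via two processes
```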
fgivenx.plot module
- fgivenx.plot.plot(x, y, z, ax=None, **kwargs)[source]
Plot iso-probability mass function, converted to sigmas.
- Parameters:
- x, y, z: numpy arrays
Same as arguments to
matplotlib.pyplot.contour()
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot
to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or to create a new one.
- colors: color scheme, optional
matplotlib.colors.LinearSegmentedColormap
Color scheme to plot with. Recommended to plot in reverse. (Default: matplotlib.pyplot.cm.Reds_r)
- alpha: float, optional
Transparency of filled contours, given as an alpha blending value between 0 (transparent) and 1 (opaque).
- smooth: float, optional
Percentage by which to smooth the contours. (Default: no smoothing)
- contour_line_levels: List[float], optional
Contour lines to be plotted. (Default: [1,2])
- linewidths: float, optional
Thickness of contour lines. (Default: 0.3)
- contour_color_levels: List[float], optional
Contour color levels. (Default: numpy.arange(0, contour_line_levels[-1] + fineness / 2, fineness))
- fineness: float, optional
Spacing of contour color levels. (Default: 0.5)
- lines: bool, optional
(Default: True)
- rasterize_contours: bool, optional
Rasterize the contours while keeping the lines, text etc in vector format. Useful for reducing file size bloat and making printing easier when you have dense contours. (Default: False)
- Returns:
- cbar: color bar
matplotlib.contour.QuadContourSet
Colors to create a global colour bar
- fgivenx.plot.plot_lines(x, fsamps, ax=None, downsample=100, **kwargs)[source]
Plot function samples as a set of line plots.
- Parameters:
- x: 1D array-like
x values to plot
- fsamps: 2D array-like
set of functions to plot at each x. As returned by
fgivenx.compute_samples()
- ax: axes object
matplotlib.pyplot.ax
to plot on.
- downsample: int, optional
Reduce the number of samples to a viewable quantity. (Default 100)
- Any other keyword arguments are passed to matplotlib.pyplot.ax.plot().
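The downsample behaviour can be sketched as a random column selection over the function-sample array (illustrative only; fgivenx may choose the subset differently):

```python
import numpy as np

rng = np.random.default_rng(3)
fsamps = rng.normal(size=(50, 5000))   # (len(x), nsamples) function samples
downsample = 100

# keep a random, viewable subset of the sampled curves
idx = rng.choice(fsamps.shape[1], size=downsample, replace=False)
lines = fsamps[:, idx]                 # the curves that would actually be drawn
# lines.shape == (50, 100)
```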
fgivenx.samples module
- fgivenx.samples.compute_samples(f, x, samples, **kwargs)[source]
Apply f(x,theta) to x array and theta in samples.
- Parameters:
- f: function
list of functions \(f(x;\theta)\) with dependent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
list of theta samples to evaluate \(f(x;\theta)\) at. shape = (nfunc, nsamples, npars)
- parallel, tqdm_kwargs: optional
see docstring for
fgivenx.parallel.parallel_apply()
- cache: str, optional
File root for saving previous calculations for re-use. Default: None
- Returns:
- 2D numpy.array:
samples at each x. shape=(len(x), len(samples))
- fgivenx.samples.samples_from_getdist_chains(params, file_root, latex=False, **kwargs)[source]
Extract samples and weights from getdist chains.
- Parameters:
- params: list(str)
Names of parameters to be supplied to the second argument of f(x;theta).
- file_root: str, optional
Root name for getdist chains files. This variable automatically defines:
- chains_file = file_root.txt
- paramnames_file = file_root.paramnames
but either can be overridden by passing chains_file or paramnames_file explicitly.
- latex: bool, optional
Also return an array of latex strings for those paramnames.
- Any additional keyword arguments are forwarded on to getdist, e.g.:
samples_from_getdist_chains(params, file_root, settings={'ignore_rows': 0.5})
- Returns:
- samples: numpy.array
2D Array of samples. shape=(len(samples), len(params))
- weights: numpy.array
Array of weights. shape = (len(samples),)
- latex: list(str), optional
list of latex strings for each parameter (if latex is provided as an argument)
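The chain-file layout assumed above can be illustrated without getdist installed: a getdist .txt chain has one row per sample, with the weight in the first column, a likelihood column second, and parameter values from the third column on, while the .paramnames file lists one name (and optional latex label) per line. A minimal hand-rolled reader (a sketch; in practice use the real getdist-backed function) looks like:

```python
import os
import tempfile
import numpy as np

# build a toy two-sample chain: columns are weight, likelihood column, m, c
root = os.path.join(tempfile.mkdtemp(), 'chain')
np.savetxt(root + '.txt', [[1.0, 0.5, 0.9, 0.1],
                           [2.0, 0.7, 1.1, -0.1]])
with open(root + '.paramnames', 'w') as pf:
    pf.write('m\tm\nc\tc\n')           # name <tab> latex label

# minimal stand-in for samples_from_getdist_chains(['m', 'c'], root)
with open(root + '.paramnames') as pf:
    names = [line.split()[0] for line in pf]
data = np.loadtxt(root + '.txt')
weights = data[:, 0]
samples = data[:, [2 + names.index(p) for p in ['m', 'c']]]
# samples.shape == (2, 2); weights == [1.0, 2.0]
```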