fgivenx package
Module contents
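The main driving routines for this package are the plotting and computation functions of fgivenx.drivers: plot_contours, plot_lines, plot_dkl, compute_samples, compute_pmf and compute_dkl.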
Example import and usage:
>>> import numpy
>>> from fgivenx import plot_contours, plot_lines, \
...     plot_dkl, samples_from_getdist_chains
>>>
>>> file_root = '/my/getdist/file/root'
>>> params = ['m', 'c']
>>> # samples_from_getdist_chains returns both samples and weights
>>> samples, weights = samples_from_getdist_chains(params, file_root)
>>> x = numpy.linspace(-1, 1, 100)
>>>
>>> def f(x, theta):
...     # theta holds one sample of the parameters (m, c)
...     m, c = theta
...     y = m * x + c
...     return y
...
>>> plot_contours(f, x, samples, weights=weights)
Submodules
fgivenx.drivers module
This module provides utilities for computing the grid for contours of a function reconstruction plot.
- Required ingredients:
- sampled posterior probability distribution \(P(\theta)\)
- independent variable \(x\)
- dependent variable \(y\)
- functional form \(y = f(x;\theta)\) parameterised by \(\theta\)
Assuming that you have obtained samples of \(\theta\) from an MCMC process, we aim to compute the density:
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
which gives our degree of knowledge for each \(y=f(x;\theta)\) value given an \(x\) value.
For a more representative plot, we are not actually interested in the value of the probability density above, but instead require the “iso-probability posterior mass”:
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
We thus need to compute this function on a rectangular grid of \(x\) and \(y\).
fgivenx.drivers.compute_dkl(f, x, samples, prior_samples, **kwargs)
Compute the Kullback-Leibler divergence at each value of x for the prior and posterior defined by prior_samples and samples.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples, prior_samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights, prior_weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache, prior_cache: str, optional
File roots for saving previous calculations for re-use
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().
Returns: - 1D numpy array:
dkl values at each value of x.
fgivenx.drivers.compute_pmf(f, x, samples, **kwargs)
Compute the probability mass function given x at a range of x values for \(y = f(x;\theta)\)
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
Additionally, if a list of log-evidences is passed, along with a list of functions, samples and optional weights, it marginalises over the models according to the evidences.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ny: int, optional
Resolution of y axis. Default: 100
- y: array-like, optional
Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
Returns: - 1D numpy.array:
y values pmf is computed at shape=(len(y)) or ny
- 2D numpy.array:
pmf values at each x and y shape=(len(x),len(y))
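The marginalisation over models mentioned above relies on turning unnormalised log-evidences into model weights. The following is a minimal sketch of that step, assuming each model is weighted in proportion to its evidence (equal model priors); the helper name model_probabilities is hypothetical and not part of fgivenx.

```python
import math


def model_probabilities(logZ):
    """Normalised model probabilities from unnormalised log-evidences."""
    shift = max(logZ)  # subtract the maximum so exp() cannot overflow
    unnormalised = [math.exp(lz - shift) for lz in logZ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Two hypothetical models; the second has three times the evidence of the first.
probs = model_probabilities([-1000.0, -1000.0 + math.log(3.0)])
```

Because only differences of log-evidences enter, adding a constant to every entry of logZ leaves the result unchanged, which is why the values need not be normalised.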
fgivenx.drivers.compute_samples(f, x, samples, **kwargs)
Apply the function(s) \(f(x;\theta)\) to the arrays defined in x and samples. Has options for weighting, trimming, caching & parallelising.
Additionally, if a list of log-evidences is passed, along with a list of functions, samples and optional weights, it marginalises over the models according to the evidences.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use. Default: None
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
Returns: - 2D numpy.array
Evaluate the function f at each x value and each theta. Equivalent to [[f(x_i,theta) for theta in samples] for x_i in x]
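The equivalence stated above can be written out in plain Python. The straight-line model and the three \(\theta\) samples below are made up for illustration; the sketch leaves out the weighting, trimming, caching and parallelisation options.

```python
# Hypothetical straight-line model y = m*x + c with theta = (m, c).
def f(x, theta):
    m, c = theta
    return m * x + c


x = [-1.0, 0.0, 1.0]
samples = [(1.0, 0.0), (2.0, 1.0), (0.5, -1.0)]  # shape (nsamples, npars)

# One row per x value, one column per theta sample.
fsamps = [[f(x_i, theta) for theta in samples] for x_i in x]
```

The result has one row for each entry of x and one column for each sample.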
fgivenx.drivers.plot_contours(f, x, samples, ax=None, **kwargs)
Plot the probability mass function given x at a range of \(y\) values for \(y = f(x;\theta)\)
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta) P(\theta) d\theta\)
\(\mathrm{pmf}(y|x) = \int_{P(y'|x) < P(y|x)} P(y'|x) dy'\)
Additionally, if a list of log-evidences is passed, along with a list of functions and a list of samples, this function plots the probability mass function for all models marginalised according to the evidences.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ny: int, optional
Resolution of y axis. Default: 100
- y: array-like, optional
Explicit descriptor of y values to evaluate. Default: numpy.linspace(min(f), max(f), ny)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().
Returns: - cbar: color bar
matplotlib.contour.QuadContourSet
Colors to create a global colour bar
fgivenx.drivers.plot_dkl(f, x, samples, prior_samples, ax=None, **kwargs)
Plot the Kullback-Leibler divergence at each value of \(x\) for the prior and posterior defined by prior_samples and samples.
Let the posterior be:
\(P(y|x) = \int P(y=f(x;\theta)|x,\theta)P(\theta) d\theta\)
and the prior be:
\(Q(y|x) = \int P(y=f(x;\theta)|x,\theta)Q(\theta) d\theta\)
then the Kullback-Leibler divergence at each x is defined by
\(D_\mathrm{KL}(x)=\int P(y|x)\ln\left[\frac{P(y|x)}{Q(y|x)}\right]dy\)
Additionally, if a list of log-evidences is passed, along with a list of functions and a list of samples, this function plots the Kullback-Leibler divergence for all models marginalised according to the evidences.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples, prior_samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) from posterior and prior to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights, prior_weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache, prior_cache: str, optional
File roots for saving previous calculations for re-use
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot().
fgivenx.drivers.plot_lines(f, x, samples, ax=None, **kwargs)
Plot a representative set of sampled functions \(f(x;\theta)\).
Additionally, if a list of log-evidences is passed, along with a list of functions and a list of samples, this function plots the sampled functions for all models marginalised according to the evidences.
Parameters: - f: function
function \(f(x;\theta)\) (or list of functions for each model) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
\(\theta\) samples (or list of \(\theta\) samples) to evaluate \(f(x;\theta)\) at. shape = (nsamples, npars)
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
- logZ: 1D array-like, optional
log-evidences of each model if multiple models are passed. Should be same length as the list f, and need not be normalised. Default: numpy.ones_like(f)
- weights: 1D array-like, optional
sample weights (or list of weights), if desired. Should have length same as samples.shape[0]. Default: numpy.ones_like(samples)
- ntrim: int, optional
Approximate number of samples to trim down to, if desired. Useful if the posterior is dramatically oversampled. Default: None
- cache: str, optional
File root for saving previous calculations for re-use
- parallel, tqdm_args:
see docstring for fgivenx.parallel.parallel_apply()
- kwargs: further keyword arguments
Any further keyword arguments are plotting keywords that are passed to fgivenx.plot.plot_lines().
fgivenx.dkl module
fgivenx.dkl.DKL(arrays)
Compute the Kullback-Leibler divergence from one distribution Q to another P, where Q and P are represented by a set of samples.
Parameters: - arrays: tuple(1D numpy.array,1D numpy.array)
samples defining distributions P & Q respectively
Returns: - float:
Kullback Leibler divergence.
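One way a sample-based estimate of this divergence could be formed is to build a kernel density estimate of each distribution and average \(\ln P - \ln Q\) over the samples of P. The sketch below does this in plain Python; the hand-rolled Gaussian kernel density estimator, the Scott's-rule bandwidth and the helper names are assumptions made for illustration, not a description of how fgivenx.dkl.DKL is implemented.

```python
import math
import random


def gaussian_kde_logpdf(data, points):
    """Log-density at `points` of a Gaussian KDE built from 1D `data`."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in data) / (n - 1))
    h = std * n ** (-0.2)  # Scott's rule bandwidth (an assumed choice)
    norm = n * h * math.sqrt(2.0 * math.pi)
    return [
        math.log(sum(math.exp(-0.5 * ((p - d) / h) ** 2) for d in data) / norm)
        for p in points
    ]


def dkl_estimate(samples, prior_samples):
    """Monte Carlo estimate of the integral of P ln(P/Q), averaging over samples of P."""
    log_p = gaussian_kde_logpdf(samples, samples)
    log_q = gaussian_kde_logpdf(prior_samples, samples)
    return sum(lp - lq for lp, lq in zip(log_p, log_q)) / len(samples)


# Synthetic example: a narrow "posterior" inside a broad "prior".
rng = random.Random(0)
posterior = [rng.gauss(0.0, 0.5) for _ in range(300)]
prior = [rng.gauss(0.0, 2.0) for _ in range(300)]
dkl = dkl_estimate(posterior, prior)
```

The estimate is positive when the posterior is more concentrated than the prior and is exactly zero when both sets of samples are identical.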
fgivenx.dkl.compute_dkl(fsamps, prior_fsamps, **kwargs)
Compute the Kullback-Leibler divergence for function samples for posterior and prior pre-calculated at a range of x values.
Parameters: - fsamps: 2D numpy.array
Posterior function samples, as computed by fgivenx.compute_samples()
- prior_fsamps: 2D numpy.array
Prior function samples, as computed by fgivenx.compute_samples()
- parallel, tqdm_kwargs: optional
see docstring for fgivenx.parallel.parallel_apply().
- cache: str, optional
File root for saving previous calculations for re-use.
Returns: - 1D numpy.array:
Kullback-Leibler divergences at each value of x. shape=(len(fsamps))
fgivenx.io module
class fgivenx.io.Cache(file_root)
Bases: object
Caching tool for saving recomputation.
Parameters: - file_root: str
cached values are saved in file_root.pkl
Methods
- check(self, *args): Check that the arguments haven’t changed since the last call.
- load(self): Load cache from file using pickle.
- save(self, *args): Save cache to file using pickle.
check(self, *args)
Check that the arguments haven’t changed since the last call.
Parameters: - *args:
All but the last argument are inputs to the cached function. The last is the actual value of the function.
Returns: - If arguments unchanged:
return the cached answer
- else:
indicate recomputation required by throwing a CacheException.
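The check-then-recompute pattern described above can be illustrated with a small pickle-based stand-in. Everything in this sketch (the SimpleCache class, the CacheMiss exception, the cached_square_sum helper) is hypothetical and only mimics the documented behaviour; it is not the fgivenx.io.Cache implementation, and its check method takes only the inputs rather than the inputs plus the value.

```python
import os
import pickle
import tempfile


class CacheMiss(Exception):
    """Hypothetical exception: no usable cached value is available."""


class SimpleCache:
    """Illustrative stand-in, not the fgivenx.io.Cache implementation."""

    def __init__(self, file_root):
        self.file_name = file_root + '.pkl'

    def save(self, *args):
        # All but the last argument are inputs; the last is the result.
        with open(self.file_name, 'wb') as fh:
            pickle.dump(args, fh)

    def check(self, *inputs):
        # Return the stored result only if the stored inputs match.
        if not os.path.exists(self.file_name):
            raise CacheMiss('no cache file at ' + self.file_name)
        with open(self.file_name, 'rb') as fh:
            *cached_inputs, result = pickle.load(fh)
        if tuple(cached_inputs) != inputs:
            raise CacheMiss('inputs differ from the cached call')
        return result


def cached_square_sum(values, cache):
    """Recompute only when `values` differs from the cached call."""
    try:
        return cache.check(values)
    except CacheMiss:
        result = sum(v * v for v in values)
        cache.save(values, result)
        return result


with tempfile.TemporaryDirectory() as tmp:
    cache = SimpleCache(os.path.join(tmp, 'demo'))
    first = cached_square_sum([1, 2, 3], cache)   # computed, then saved
    second = cached_square_sum([1, 2, 3], cache)  # read back from demo.pkl
    hit_after_change = True
    try:
        cache.check([4, 5, 6])  # different inputs: recomputation required
    except CacheMiss:
        hit_after_change = False
```

Signalling a miss with an exception keeps the calling code to a single try/except around the expensive computation.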
exception fgivenx.io.CacheChanged(file_root)
Bases: fgivenx.io.CacheException
Exception to indicate the cache has changed.
exception fgivenx.io.CacheException
Bases: exceptions.Exception
Base exception to indicate cache errors.
exception fgivenx.io.CacheMissing(file_root)
Bases: fgivenx.io.CacheException
Exception to indicate the cache does not exist.
exception fgivenx.io.CacheOK(file_root)
Bases: fgivenx.io.CacheException
Exception to indicate the cache can be used.
fgivenx.mass module
Utilities for computing the probability mass function.
fgivenx.mass.PMF(samples, y)
Compute the probability mass function.
The set of samples defines a probability density P(y), which is computed using a kernel density estimator.
From \(P(y)\) we define:
\(\mathrm{pmf}(p) = \int_{P(y)<p} P(y) dy\)
This is the cumulative distribution function expressed as a function of the probability.
We aim to compute \(M(y)\), which indicates the amount of probability contained outside the iso-probability contour passing through \(y\):
 ^ P(y)                   ...
 |                     | .   .
 |                     |       .
p|- - - - - - - - - - .+- - - - . - - - - - - - - - - -
 |                   .#|        #.
 |                  .##|        ##.
 |                  .##|        ##.
 |                 .###|        ###.     M(p)
 |                 .###|        ###.     is the
 |                 .###|        ###.     shaded area
 |                .####|        ####.
 |                .####|        ####.
 |              ..#####|        #####..
 |          ....#######|        #######....
 |         .###########|        ###########.
 +---------------------+-------------------------------> y
                       t

 ^ M(p)                        ^ M(y)
 |                             |
1|                +++         1|         +
 |               +             |        + +
 |       ++++++++              |       +   +
 |     ++                      |     ++     ++
 |   ++                        |   ++         ++
 |+++                          |+++             +++
 +---------------------> p     +---------------------> y
0
Parameters: - samples: array-like
Array of samples from a probability density P(y).
- y: array-like (optional)
Array to evaluate the PDF at
Returns: - 1D numpy.array:
PMF evaluated at each y value
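The definition above can be sketched with a kernel density estimate and a Monte Carlo integral: the mass of the region where \(P(y') < P(y)\) is approximated by the fraction of samples whose estimated density is lower than the density at \(y\). The hand-rolled Gaussian kernel density estimator, the Scott's-rule bandwidth and the helper names are assumptions for illustration, not the fgivenx.mass.PMF implementation.

```python
import bisect
import math
import random


def kde_pdf(data, points):
    """Gaussian kernel density estimate of 1D `data` evaluated at `points`."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in data) / (n - 1))
    h = std * n ** (-0.2)  # Scott's rule bandwidth (an assumed choice)
    norm = n * h * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((p - d) / h) ** 2) for d in data) / norm
        for p in points
    ]


def pmf_estimate(samples, y):
    """Estimate the probability mass lying at lower density than each y."""
    density_at_samples = sorted(kde_pdf(samples, samples))
    density_at_y = kde_pdf(samples, y)
    n = len(samples)
    # bisect_left counts the samples with density strictly below P(y); that
    # fraction is a Monte Carlo estimate of the integral of P over the
    # region {y' : P(y') < P(y)}.
    return [bisect.bisect_left(density_at_samples, p) / n for p in density_at_y]


# Synthetic example: unit Gaussian samples, evaluated in the tails and the core.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [-6.0, -1.0, 0.0, 1.0, 6.0]
pmf = pmf_estimate(samples, y)
```

Values of y near the mode receive a mass close to one, while values far in the tails receive a mass close to zero.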
fgivenx.mass.compute_pmf(fsamps, y, **kwargs)
Compute the pmf defined by fsamps at each x for each y.
Parameters: - fsamps: 2D array-like
array of function samples, as returned by fgivenx.compute_samples()
- y: 1D array-like
y values to evaluate the PMF at
- parallel, tqdm_kwargs: optional
see docstring for fgivenx.parallel.parallel_apply().
Returns: - 2D numpy.array
probability mass function at each x for each y. shape=(len(fsamps),len(y))
fgivenx.parallel module
fgivenx.parallel.parallel_apply(f, array, **kwargs)
Apply a function to an array with openmp parallelisation.
Equivalent to [f(x) for x in array], but parallelised if required.
Parameters: - f: function
Univariate function to apply to each element of array
- array: array-like
Array to apply f to
- parallel: int or bool, optional
int > 0: number of processes to parallelise over
int < 0 or bool=True: use OMP_NUM_THREADS to choose parallelisation
bool=False or int=0: do not parallelise
- tqdm_kwargs: dict, optional
additional kwargs for tqdm progress bars.
- precurry: tuple, optional
immutable arguments to pass to f before x, i.e. [f(precurry,x) for x in array]
- postcurry: tuple, optional
immutable arguments to pass to f after x i.e. [f(x,postcurry) for x in array]
Returns: - list:
[f(precurry,x,postcurry) for x in array] parallelised according to parallel
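The currying arguments can be illustrated with a short stand-in. This sketch assumes that the precurry and postcurry tuples are unpacked around x, and it swaps in a thread pool from the standard library for the parallel branch; the function apply_sketch is hypothetical and is not the fgivenx implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_sketch(f, array, parallel=False, precurry=(), postcurry=()):
    """Illustrative stand-in for the documented behaviour, using threads."""
    def call(x):
        # Assumption: the curried tuples are unpacked around x.
        return f(*precurry, x, *postcurry)

    if not parallel:  # False or 0: do not parallelise
        return [call(x) for x in array]
    # A positive int fixes the worker count; True or a negative int
    # leaves the choice to the executor's default.
    workers = parallel if (parallel is not True and parallel > 0) else None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call, array))  # map preserves input order


# Hypothetical function with one fixed argument before x and one after.
def line(m, x, c):
    return m * x + c


serial = apply_sketch(line, [0, 1, 2], precurry=(2,), postcurry=(1,))
threaded = apply_sketch(line, [0, 1, 2], parallel=2, precurry=(2,), postcurry=(1,))
```

Both calls return the results in the order of the input array, so the parallel and serial branches are interchangeable.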
fgivenx.plot module
fgivenx.plot.plot(x, y, z, ax=None, **kwargs)
Plot iso-probability mass function, converted to sigmas.
Parameters: - x, y, z : numpy arrays
Same as arguments to matplotlib.pyplot.contour()
- ax: axes object, optional
matplotlib.axes._subplots.AxesSubplot to plot the contours onto. If unsupplied, then matplotlib.pyplot.gca() is used to get the last axis used, or create a new one.
- colors: color scheme, optional
matplotlib.colors.LinearSegmentedColormap
Color scheme to plot with. Recommend plotting in reverse (Default: matplotlib.pyplot.cm.Reds_r)
- smooth: float, optional
Percentage by which to smooth the contours. (Default: no smoothing)
- contour_line_levels: List[float], optional
Contour lines to be plotted. (Default: [1,2])
- linewidths: float, optional
Thickness of contour lines. (Default: 0.3)
- contour_color_levels: List[float], optional
Contour color levels. (Default: numpy.arange(0, contour_line_levels[-1] + 1, fineness))
- fineness: float, optional
Spacing of contour color levels. (Default: 0.1)
- lines: bool, optional
(Default: True)
- rasterize_contours: bool, optional
Rasterize the contours while keeping the lines, text etc in vector format. Useful for reducing file size bloat and making printing easier when you have dense contours. (Default: False)
Returns: - cbar: color bar
matplotlib.contour.QuadContourSet
Colors to create a global colour bar
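The phrase "converted to sigmas" can be illustrated with the usual two-sided Gaussian convention, in which a probability mass of about 0.3173 outside a contour corresponds to 1 sigma and about 0.0455 to 2 sigma. The sketch below assumes that convention; the helper pmf_to_sigma and its max_sigma clipping are hypothetical and not taken from fgivenx.

```python
from statistics import NormalDist


def pmf_to_sigma(pmf, max_sigma=5.0):
    """Number of Gaussian sigmas whose two-sided tail mass equals `pmf`."""
    normal = NormalDist()
    # Clip so that pmf = 0 maps to max_sigma instead of infinity.
    floor = 2.0 * (1.0 - normal.cdf(max_sigma))
    pmf = min(max(pmf, floor), 1.0)
    return normal.inv_cdf(1.0 - 0.5 * pmf)


one_sigma = pmf_to_sigma(0.3173)
two_sigma = pmf_to_sigma(0.0455)
```

Under this convention the default contour_line_levels of [1, 2] would mark the contours enclosing roughly 68% and 95% of the probability.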
fgivenx.plot.plot_lines(x, fsamps, ax=None, downsample=100, **kwargs)
Plot function samples as a set of line plots.
Parameters: - x: 1D array-like
x values to plot
- fsamps: 2D array-like
set of functions to plot at each x. As returned by fgivenx.compute_samples()
- ax: axes object
matplotlib.pyplot.ax to plot on.
- downsample: int, optional
Reduce the number of samples to a viewable quantity. (Default: 100)
- any other keywords are passed to matplotlib.pyplot.ax.plot()
fgivenx.samples module
fgivenx.samples.compute_samples(f, x, samples, **kwargs)
Apply f(x,theta) to x array and theta in samples.
Parameters: - f: function
list of functions \(f(x;\theta)\) with independent variable \(x\), parameterised by \(\theta\).
- x: 1D array-like
x values to evaluate \(f(x;\theta)\) at.
- samples: 2D array-like
list of theta samples to evaluate \(f(x;\theta)\) at. shape = (nfunc, nsamples, npars)
- parallel, tqdm_kwargs: optional
see docstring for fgivenx.parallel.parallel_apply()
- cache: str, optional
File root for saving previous calculations for re-use. Default: None
Returns: - 2D numpy.array:
samples at each x. shape=(len(x),len(samples),)
fgivenx.samples.samples_from_getdist_chains(params, file_root, latex=False, **kwargs)
Extract samples and weights from getdist chains.
Parameters: - params: list(str)
Names of parameters to be supplied to second argument of f(x|theta).
- file_root: str, optional
Root name for getdist chains files. This variable automatically defines chains_file = file_root.txt and paramnames_file = file_root.paramnames, but these can be overridden by chains_file or paramnames_file.
- latex: bool, optional
Also return an array of latex strings for those paramnames.
- Any additional keyword arguments are forwarded onto getdist, e.g.:
samples_from_getdist_chains(params, file_root, settings={'ignore_rows': 0.5})
Returns: - samples: numpy.array
2D Array of samples. shape=(len(samples), len(params))
- weights: numpy.array
Array of weights, one per sample. shape = (len(samples),)
- latex: list(str), optional
list of latex strings for each parameter (if latex is provided as an argument)
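The returned weights can be passed straight to the weights keyword of the driver routines. For readers who need equally weighted samples instead, one common approach is rejection resampling, sketched below with made-up samples and weights. The helper equally_weighted is hypothetical, and this is not necessarily how fgivenx treats weights internally.

```python
import random


def equally_weighted(samples, weights, rng):
    """Keep each sample with probability weight / max(weights)."""
    w_max = max(weights)
    return [s for s, w in zip(samples, weights) if rng.random() < w / w_max]


rng = random.Random(0)
# Made-up (m, c) samples; the last one carries zero posterior weight.
samples = [(0.9, 0.1), (1.0, 0.0), (1.1, -0.1), (5.0, 3.0)]
weights = [2.0, 4.0, 2.0, 0.0]
kept = equally_weighted(samples, weights, rng)
```

The sample with the largest weight is always kept and zero-weight samples are always dropped; the surviving samples can then be treated as having unit weight.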