Numerics

This page gives an overview of the functions which both run the importance sampling simulations and perform the data analysis.

The simulations are run using Importance Sampling Simulation, which is the most important function. It acts as an interface between Python and the background Cython code, and also runs the data analysis to create the returned probability density. This is the only function the user needs to interact with directly: the others are used by Importance Sampling Simulation.

Importance Sampling Simulation

This is the main module of the PyFPT code, as it runs the simulations, post-processes the data and exports it ready for plotting.

numerics.is_simulation.is_simulation(drift, diffusion, x_in, x_end, num_runs, bias, time_step, bins=50, min_bin_size=400, num_sub_samples=20, estimator='lognormal', save_data=False, t_in=0.0, t_f=100, x_r=None, display=True)

Executes the simulation runs, then returns the histogram bin centres, heights and errors.

Parameters
  • drift (function) – The drift term of the simulated Langevin equation. Must take both x and t as arguments in the format (x, t).

  • diffusion (function) – The diffusion term of the simulated Langevin equation. Must take both x and t as arguments in the format (x, t).

  • x_in (float) – The initial position value.

  • x_end (float) – The end position value, i.e. the threshold which defines the FPT problem.

  • num_runs (int) – The number of simulation runs.

  • bias (scalar or function) – The bias used in the simulated Langevin equation to achieve importance sampling.

    If a scalar (float or int), this is the bias amplitude, i.e. a coefficient which multiplies the diffusion to define the bias.

    If a function, this simply defines the bias used. Must take arguments for both position and time in the format (x, t).

  • bins (int or sequence, optional) – If bins is an integer, it defines the number of equal-width bins for the first-passage times. If bins is a list or numpy array, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin. The widths can vary. Defaults to 50 evenly spaced bins.

  • time_step (float or int) – The time step. As a minimum requirement, this should be smaller than the standard deviation of the FPTs.

  • min_bin_size (int, optional) – The minimum number of runs per bin included in the data analysis. If a bin has fewer runs than this, it is truncated. Defaults to 400.

  • estimator (string, optional) – The estimator used to reconstruct the target distribution probability density from the importance sample. If 'lognormal', it assumes the weights in each bin follow a lognormal distribution. If 'naive', no assumption is made but more runs are required for convergence. Defaults to 'lognormal'.

  • num_sub_samples (int, optional) – The number of subsamples used in jackknife estimation of the errors used for the 'naive' estimator. Defaults to 20 when estimator is 'naive'.

  • save_data (bool, optional) – If True, the first-passage times and the associated weights for each run are saved to a file. Defaults to False.

  • t_in (float, optional) – The initial time value of the simulation. Defaults to 0.

  • t_f (float, optional) – The maximum FPT allowed per run. If this is exceeded, the simulation run ends and returns t_f, which can then be truncated. Defaults to 100.

  • x_r (float, optional) – The value of the reflective boundary. Must be compatible with the x_in and x_end chosen. Defaults to None, in which case an unreachable value is used, so there is effectively no boundary.

  • display (bool, optional) – If True, p-value plots of both the real data, and the theoretical expectation if the underlying distribution is truly lognormal, are displayed using fpt.numerics.lognormality_check if a p-value is below the specified threshold.

Returns

  • bin_centres (list) – The centres of the histogram bins.

  • heights (list) – The heights of the normalised histogram bars.

  • errors (list) – The errors in estimating the heights.
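A call following the signature above might look like the sketch below. The import path (import pyfpt as fpt) is an assumption based on the fpt.numerics prefix used elsewhere on this page, and the drift, diffusion and parameter values are hypothetical placeholders chosen only for illustration. The simulation is wrapped in a function so that nothing runs until it is called.

```python
def drift(x, t):
    # Hypothetical drift term: constant, written in the required (x, t) format.
    return -1.0


def diffusion(x, t):
    # Hypothetical diffusion term: constant noise amplitude, (x, t) format.
    return 0.5


def run_example():
    # Assumed import path; adjust it to match your installation.
    import pyfpt as fpt

    # The seven required arguments are passed positionally, in the order
    # given in the signature: drift, diffusion, x_in, x_end, num_runs,
    # bias, time_step.
    bin_centres, heights, errors = fpt.numerics.is_simulation(
        drift, diffusion,
        1.0,        # x_in: initial position (illustrative value)
        0.0,        # x_end: threshold defining the FPT problem
        10**5,      # num_runs
        0.5,        # bias: scalar amplitude multiplying the diffusion
        1e-3,       # time_step
        bins=50,
        min_bin_size=400,
        estimator='lognormal',
        t_f=100,
        display=False,
    )
    return bin_centres, heights, errors
```

Calling run_example() returns the three lists described under Returns, which can then be passed to a plotting routine with errors as the error bars.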


Probability Density of the Data

This module post-processes the first-passage times and weights to estimate the probability density of the target distribution.

numerics.data_points_pdf.data_points_pdf(data, weights, estimator, bins=50, min_bin_size=400, num_sub_samples=20, display=True)

Returns the (truncated) histogram bin centres, heights and errors, using the provided estimator method.

Parameters
  • data (numpy.ndarray) – Input first-passage time data.

  • weights (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • estimator (string) – The estimator used to reconstruct the target distribution probability density from the importance sample. If 'lognormal', it assumes the weights in each bin follow a lognormal distribution. If 'naive', no assumption is made but more runs are required for convergence.

  • bins (int or list, optional) – If bins is an integer, it defines the number of equal-width bins for the first-passage times. If bins is a list or numpy array, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin. The widths can vary. Defaults to 50 evenly spaced bins.

  • min_bin_size (int, optional) – The minimum number of runs per bin included in the data analysis. If a bin has fewer runs than this, it is truncated. Defaults to 400.

  • num_sub_samples (int, optional) – The number of subsamples used in jackknife estimation of the errors used for the 'naive' estimator. Defaults to 20 when estimator is 'naive'.

  • display (bool, optional) – If True, p-value plots of both the real data, and the theoretical expectation if the underlying distribution is truly lognormal, are displayed using fpt.numerics.lognormality_check if a p-value is below the specified threshold.

Returns

  • bin_centres (numpy.ndarray) – The centres of the histogram bins (after truncation of underfilled bins).

  • heights (numpy.ndarray) – The heights of the normalised histogram bars (after truncation of underfilled bins).

  • errors (numpy.ndarray) – The errors in estimating the heights (after truncation of underfilled bins).

  • num_runs_used (int) – The number of runs used (after truncation of underfilled bins).

  • bins (numpy.ndarray) – The untruncated bin edges.


Re-Processing

This module runs the same post-processing of data as the main simulation modules. It is intended to be used if the simulation is run directly or to re-analyse saved raw data.

numerics.re_processing.re_processing(data, weights=None, bins=50, min_bin_size=400, num_sub_samples=20, estimator='lognormal', t_f=100, display=True)

Runs the post-processing on the provided data and returns the histogram bin centres, heights and errors.

Parameters
  • data (list or numpy.ndarray) – Input first-passage time data.

  • weights (list or numpy.ndarray, optional) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them. Defaults to None.

  • bins (int or sequence, optional) – If bins is an integer, it defines the number of equal-width bins for the first-passage times. If bins is a list or numpy array, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin. The widths can vary. Defaults to 50 evenly spaced bins.

  • min_bin_size (int, optional) – The minimum number of runs per bin included in the data analysis. If a bin has fewer runs than this, it is truncated. Defaults to 400.

  • estimator (string, optional) – The estimator used to reconstruct the target distribution probability density from the importance sample. If 'lognormal', it assumes the weights in each bin follow a lognormal distribution. If 'naive', no assumption is made but more runs are required for convergence. Defaults to 'lognormal'.

  • num_sub_samples (int, optional) – The number of subsamples used in jackknife estimation of the errors used for the 'naive' estimator. Defaults to 20 when estimator is 'naive'.

  • t_f (float, optional) – The maximum FPT allowed per run. If this is exceeded, the simulation run ends and returns t_f, which can then be truncated. Defaults to 100.

  • display (bool, optional) – If True, p-value plots of both the real data, and the theoretical expectation if the underlying distribution is truly lognormal, are displayed using fpt.numerics.lognormality_check if a p-value is below the specified threshold.

Returns

  • bin_centres (list) – The centres of the histogram bins.

  • heights (list) – The heights of the normalised histogram bars.

  • errors (list) – The errors in estimating the heights.


Histogram Normalisation

This module calculates the histogram normalisation using the formula num_runs*bin_width. Therefore, the total area of the histogram may not be 1. Instead, each bin is normalised.

numerics.histogram_normalisation.histogram_normalisation(bins, num_runs)

Returns histogram normalisation. If evenly spaced bins are used, then a scalar is returned. Otherwise, the correct normalisation for each bin is returned.

Parameters
  • bins (int or sequence of scalars) – Either the number of evenly spaced bins used in the histogram or the bin edges used if a sequence.

  • num_runs (int) – The number of simulation runs used in the histogram.

Returns

normalisation – If bins was an int, then the normalisation as a float is returned. If bins was a sequence, then the normalisation per bin is returned.

Return type

float or sequence of scalars
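The num_runs*bin_width rule can be sketched for the case where explicit bin edges are given. This is a minimal illustration using NumPy, not the PyFPT source: the helper name normalisation_from_edges and the sample data are hypothetical, and only the sequence-of-edges case is covered.

```python
import numpy as np


def normalisation_from_edges(bin_edges, num_runs):
    # Hypothetical helper: width of every bin from consecutive edges.
    widths = np.diff(np.asarray(bin_edges, dtype=float))
    # Evenly spaced bins share one width, so a scalar is enough.
    if np.allclose(widths, widths[0]):
        return float(widths[0] * num_runs)
    # Otherwise return one normalisation value per bin.
    return widths * num_runs


# Illustrative data: four first-passage times with importance-sampling weights.
edges = [0.0, 1.0, 2.0, 4.0]
fpts = np.array([0.5, 1.5, 1.7, 3.0])
ws = np.array([1.0, 0.5, 0.5, 2.0])

norm = normalisation_from_edges(edges, num_runs=len(fpts))
# Sum of the weights falling in each bin.
weighted_counts, _ = np.histogram(fpts, bins=edges, weights=ws)
# Dividing by num_runs * bin_width turns each weighted count into a density.
heights = weighted_counts / norm
```

In this example the weights happen to sum to num_runs, so the area under the histogram is 1. When bins are truncated or the weights do not average to 1, the area differs from 1 even though each bin is normalised correctly.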


Data in Histogram Bins

This module subdivides the first-passage time data and its associated weights according to the first-passage time bins used in the estimation of the target distribution.

numerics.data_in_histogram_bins.data_in_histogram_bins(data, weights, bin_edges)

Returns first-passage time data and the associated weights in columns corresponding to the provided bin edges. The number of rows corresponds to the largest bin, with empty elements filled with zeros.

Parameters
  • data (numpy.ndarray) – Input first-passage time data.

  • weights (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • bin_edges (sequence of scalars) – The bin edges of the histogram used in the estimation of the target distribution.

Returns

  • data_columned (numpy.ndarray) – The data separated into columns, with the data in each column corresponding to a particular bin.

  • weights_columned (numpy.ndarray) – The weights separated into columns, with the weights in each column corresponding to the associated data in a particular bin.
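The zero-padded column layout described above can be illustrated with a short NumPy sketch. The helper name columns_by_bin is hypothetical, and the half-open bin convention [left, right) used here is an assumption rather than a statement about the PyFPT implementation.

```python
import numpy as np


def columns_by_bin(data, weights, bin_edges):
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    num_bins = len(edges) - 1

    # Bin index of each point (assumed convention: half-open bins).
    idx = np.digitize(data, edges) - 1
    in_range = (idx >= 0) & (idx < num_bins)

    # The fullest bin sets the number of rows; shorter columns stay zero.
    counts = np.bincount(idx[in_range], minlength=num_bins)
    data_columned = np.zeros((counts.max(), num_bins))
    weights_columned = np.zeros_like(data_columned)
    for j in range(num_bins):
        mask = in_range & (idx == j)
        data_columned[:counts[j], j] = data[mask]
        weights_columned[:counts[j], j] = weights[mask]
    return data_columned, weights_columned


d_cols, w_cols = columns_by_bin(
    [0.5, 1.5, 1.7, 3.0], [1.0, 0.5, 0.6, 2.0], [0.0, 1.0, 2.0, 4.0]
)
```

Because the padding value is zero, downstream code that works on the columns needs to ignore the padded entries, for example by counting the runs per bin separately.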


Histogram Data Truncation

This module truncates the first-passage time data (and its associated weights) above the specified threshold. The main purpose is to truncate runs which exceeded the maximum time.

numerics.histogram_data_truncation.histogram_data_truncation(data, threshold, weights=0, num_sub_samples=None)

Returns truncated first-passage time data and the associated weights if provided.

Parameters
  • data (numpy.ndarray) – Input first-passage time data.

  • threshold (scalar) – Data below the threshold is kept; data above it is truncated.

  • weights (numpy.ndarray, optional) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • num_sub_samples (int, optional) – The number of subsamples if the naive estimator error estimation is used. This means the truncated data will always be an integer multiple of num_sub_samples, such that jackknife resampling can be done.

Returns

  • truncated_data (numpy.ndarray) – The truncated data.

  • truncated_weights (numpy.ndarray, optional) – The truncated weights, if provided.
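The truncation step, including the trimming to an integer multiple of num_sub_samples, can be sketched as follows. The helper name truncate_above is hypothetical. The sketch uses None rather than 0 as the "no weights" marker, and it assumes that values exactly equal to the threshold are removed, since runs that exceed the maximum time return t_f itself.

```python
import numpy as np


def truncate_above(data, threshold, weights=None, num_sub_samples=None):
    data = np.asarray(data, dtype=float)
    # Assumed convention: keep strictly below the threshold.
    keep = data < threshold
    kept_data = data[keep]
    kept_weights = None
    if weights is not None:
        kept_weights = np.asarray(weights, dtype=float)[keep]

    if num_sub_samples is not None:
        # Drop trailing points so the length divides evenly into subsamples.
        usable = (len(kept_data) // num_sub_samples) * num_sub_samples
        kept_data = kept_data[:usable]
        if kept_weights is not None:
            kept_weights = kept_weights[:usable]
    return kept_data, kept_weights


fpts = [1.0, 2.0, 100.0, 3.0, 4.0, 100.0, 5.0]
ws = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t_data, t_weights = truncate_above(fpts, 100.0, weights=ws, num_sub_samples=2)
```

Here five runs survive the threshold, and one more is dropped so that the remaining four split evenly into two subsamples.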


Jackknife Errors

This module calculates the errors of the histogram bars by using a simplified jackknife resampling method. The data is sub-sampled into many histograms with the same bins, so that a distribution of heights can be built for each bin. By the central limit theorem, the standard deviation of this distribution of heights for a bin, divided by the square root of the number of subsamples, then gives the error.

numerics.jackknife_errors.jackknife_errors(data_input, weights_input, bins, num_sub_samps)

Returns the jackknife resampling errors for the estimation of histogram bar height, for the provided weighted data and bin edges.

Parameters
  • data_input (numpy.ndarray) – Input first-passage time data.

  • weights_input (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • bins (sequence) – Defines the bin edges of the histogram, including the left edge of the first bin and the right edge of the last bin. The widths can vary.

  • num_sub_samps (int) – The number of subsamples used in jackknife estimation of the errors used for the 'naive' estimator. Must divide into the number of data points with no remainder.

Returns

  • errors (numpy.ndarray) – The jackknife errors.
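The sub-sampling procedure described for this module can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the PyFPT source: the helper name subsample_errors is hypothetical, the subsamples are taken as contiguous chunks of the input, and the population standard deviation (ddof=0) is used.

```python
import numpy as np


def subsample_errors(data, weights, bin_edges, num_sub_samps):
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    if len(data) % num_sub_samps != 0:
        raise ValueError("num_sub_samps must divide the number of data points")

    # Split into equally sized subsamples (assumed: contiguous chunks).
    sub_data = data.reshape(num_sub_samps, -1)
    sub_weights = weights.reshape(num_sub_samps, -1)
    runs_per_sub = sub_data.shape[1]
    widths = np.diff(edges)

    # One weighted, normalised histogram per subsample, all on the same bins.
    heights = np.empty((num_sub_samps, len(widths)))
    for i in range(num_sub_samps):
        counts, _ = np.histogram(sub_data[i], bins=edges, weights=sub_weights[i])
        heights[i] = counts / (runs_per_sub * widths)

    # Spread of heights per bin, scaled down by sqrt(number of subsamples).
    return heights.std(axis=0) / np.sqrt(num_sub_samps)


errs = subsample_errors([0.5, 0.5, 0.5, 1.5], [1.0, 1.0, 1.0, 1.0],
                        [0.0, 1.0, 2.0], num_sub_samps=2)
```

With so few subsamples the estimate is very rough; the default of 20 subsamples quoted above gives a more stable spread.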


Save Data to File

This module saves the raw first-passage time data and its associated weights to a comma-separated values (CSV) file using pandas, in the directory from which PyFPT is run.

numerics.save_data_to_file.save_data_to_file(data, weights, x_in, num_runs, bias, extra_label=None)

Saves the provided data and the associated weights to a file, titled “IS_data_x_in_<x_in>_iterations_<num_runs>_bias_<bias>(<extra_label>).csv”. The first-passage time data is stored as ‘FPTs’ and the associated weights as ‘ws’.

Parameters
  • data (numpy.ndarray) – Input first-passage time data.

  • weights (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • x_in (float) – The initial position value.

  • num_runs (int) – The number of simulation runs.

  • bias (float) – The coefficient of the diffusion used to define the bias.

  • extra_label (string, optional) – Optional extra string to label file.


Multiprocessing Error

This module alerts the user to a possible multiprocessing error. This occurs when the data from different cores is incorrectly combined, with weights not corresponding to the data.

numerics.multi_processing_error.multi_processing_error(data, weights)

Alerts the user to a possible multiprocessing error if the data and the log of the weights are sufficiently uncorrelated.

Parameters
  • data (numpy.ndarray) – Input first-passage time data.

  • weights (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.
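The idea behind this check can be sketched with a Pearson correlation between the first-passage times and the log of the weights. The helper name looks_mismatched and the threshold min_abs_corr=0.1 are hypothetical choices for illustration; the threshold PyFPT actually uses is not stated on this page.

```python
import numpy as np


def looks_mismatched(data, weights, min_abs_corr=0.1):
    # min_abs_corr is a hypothetical threshold, not the PyFPT value.
    x = np.asarray(data, dtype=float)
    log_w = np.log(np.asarray(weights, dtype=float))
    # Pearson correlation between FPTs and log-weights.
    r = np.corrcoef(x, log_w)[0, 1]
    return bool(abs(r) < min_abs_corr)


# Correctly paired data: log-weights vary linearly with the FPTs.
fpts = np.linspace(1.0, 10.0, 50)
paired = looks_mismatched(fpts, np.exp(-0.5 * fpts))

# Deliberately uncorrelated pairing, as might result from a mix-up.
mixed = looks_mismatched([1.0, 2.0, 3.0, 4.0], np.exp([1.0, -1.0, -1.0, 1.0]))
```

A low correlation is only a warning sign: it does not prove that the data was combined incorrectly.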


Lognormal Error

This module calculates the unnormalized error of the estimation of the probability density of the target distribution from the sample distribution using the lognormal method. This calculation is taken from Zhou–Gao 1997.

numerics.log_normal_error.log_normal_error(weights, z_alpha=1)

Returns the unnormalized errors on the estimation of the probability density using the lognormal method.

Parameters
  • weights (numpy.ndarray) – Associated weights to the first-passage time data. There must be a one-to-one correspondence between them.

  • z_alpha (int or float, optional) – The number of standard errors in the returned quantity. Defaults to 1 standard error.

Returns

errors – The asymmetric error, as lower and upper bounds.

Return type

numpy.ndarray


Lognormal Height

This module estimates the unnormalized histogram bar height for a bin of first-passage times of weighted data, assuming the weights are drawn from an underlying lognormal distribution.

numerics.log_normal_height.log_normal_height(weights)

Returns the unnormalized height of the histogram bar.

Parameters

weights (numpy.ndarray) – The distribution of weights whose height is desired.

Returns

height – Lognormal estimate for the histogram bar height of this bin.

Return type

float


Lognormal Mean

This module estimates the mean of a lognormal distribution using the maximum likelihood method from Shen–Brown–Zhi 2006.

numerics.log_normal_mean.log_normal_mean(weights)

Returns the mean of the lognormal distribution.

Parameters

weights (numpy.ndarray) – The distribution of weights whose mean is desired.

Returns

mean – The lognormal estimate for the mean of the provided array.

Return type

float
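For orientation, the textbook maximum-likelihood estimate of a lognormal mean is exp(mu + sigma^2/2), where mu and sigma^2 are the sample mean and the maximum-likelihood variance of the log of the data. The sketch below implements only this standard form. The helper name lognormal_mean_mle is hypothetical, and whether PyFPT uses exactly this expression or a modified estimator from the cited paper is not stated on this page.

```python
import numpy as np


def lognormal_mean_mle(weights):
    log_w = np.log(np.asarray(weights, dtype=float))
    # Sample mean of the log-weights.
    mu_hat = log_w.mean()
    # Maximum-likelihood variance: divides by n (NumPy's default ddof=0).
    sigma2_hat = log_w.var()
    return float(np.exp(mu_hat + 0.5 * sigma2_hat))


# Log-weights 0 and 2 have mean 1 and ML variance 1, giving exp(1.5).
estimate = lognormal_mean_mle(np.exp([0.0, 2.0]))
```

When the weights really are lognormal, this estimate is typically less noisy than the plain arithmetic mean, which is dominated by rare large weights.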


Lognormality Check

This module checks whether it is valid to assume that the weights within each first-passage time bin are drawn from an underlying lognormal distribution, by calculating the p-values using D’Agostino & Pearson’s method. If any p-value is below the 0.5% threshold, plots comparing the p-values of the data and the theoretical predictions are given. If many p-values are less than 0.5%, or some are far below this value, it is likely the assumption is incorrect.

numerics.lognormality_check.lognormality_check(bin_centres, weights_in_bins, display=True)

Checks if the distribution of weights within each first-passage time bin is drawn from an underlying lognormal distribution.

Parameters
  • bin_centres (numpy.ndarray) – The centres of the histogram bins.

  • weights_in_bins (numpy.ndarray) – The associated weights to these bins separated into columns, with each column containing the weights of that bin.

  • display (bool, optional) – If True, p-value plots of both the real data, and the theoretical expectation if the underlying distribution is truly lognormal, are displayed.