
2023/07
Parameter inference for degenerate diffusion processes
Yuga Iguchi, Alexandros Beskos and Matthew M. Graham
We study parametric inference for ergodic diffusion processes with a degenerate diffusion matrix. Existing research focuses on a particular class of hypoelliptic SDEs, with components split into ‘rough’/‘smooth’ and noise from rough components propagating directly onto smooth ones, but some critical model classes arising in applications have yet to be explored. We aim to cover this gap, and thus analyse the highly degenerate class of SDEs, where components split into further subgroups. Such models include, e.g., the notable case of generalised Langevin equations. We propose a tailored time-discretisation scheme and provide asymptotic results supporting our scheme in the context of high-frequency, full observations. The proposed discretisation scheme is applicable in much more general data regimes and is shown via simulation studies to overcome biases also in the practical case when only a smooth component is observed. Joint consideration of our study for highly degenerate SDEs and existing research provides a general ‘recipe’ for the development of time-discretisation schemes to be used within statistical methods for general classes of hypoelliptic SDEs.
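To make the role of the time-discretisation concrete, here is a minimal sketch of a locally Gaussian transition density for the underdamped Langevin SDE, a simple hypoelliptic model with one rough component (velocity p) and one smooth component (position q). It is not the scheme proposed in the paper, which targets the more degenerate case with further subgroups; the function names and the low-order truncation of the drift are illustrative assumptions. The point it shows is that an Euler–Maruyama step gives the smooth component zero variance, so its Gaussian density is degenerate, whereas keeping the h³ variance and h² covariance terms yields a well-defined proxy likelihood.

```python
import numpy as np


def langevin_local_gaussian_logpdf(x_next, x, h, grad_U, gamma, sigma):
    """Log-density of a hypothetical locally Gaussian transition over step h for
    dq = p dt,  dp = (-grad_U(q) - gamma * p) dt + sigma dW.

    Only the velocity p is driven by noise (a hypoelliptic SDE), so the
    position q picks up noise indirectly by integrating p.
    """
    q, p = x
    a = -grad_U(q) - gamma * p                     # drift of the rough component p
    mean = np.array([q + h * p + 0.5 * h**2 * a,   # smooth component q
                     p + h * a])                   # rough component p
    # Leading-order covariance: Var(q) ~ h^3, Cov(q, p) ~ h^2, Var(p) ~ h.
    cov = sigma**2 * np.array([[h**3 / 3.0, h**2 / 2.0],
                               [h**2 / 2.0, h]])
    diff = np.asarray(x_next, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + 2.0 * np.log(2.0 * np.pi))


def path_log_likelihood(path, h, grad_U, gamma, sigma):
    """Sum of one-step log-densities along an observed path of (q, p) pairs."""
    return sum(langevin_local_gaussian_logpdf(path[k + 1], path[k], h, grad_U, gamma, sigma)
               for k in range(len(path) - 1))
```

Summing the one-step terms over consecutive full observations, as `path_log_likelihood` does, gives a proxy log-likelihood that could be maximised over `(gamma, sigma)` or used inside an MCMC sampler.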

2019/06
A scalable optimal-transport based local particle filter
Matthew M. Graham and Alexandre H. Thiery
Filtering in spatially-extended dynamical systems is a challenging problem with significant practical applications such as numerical weather prediction. Particle filters allow asymptotically consistent inference but require infeasibly large ensemble sizes for accurate estimates in complex spatial models. Localisation approaches, which perform local state updates by exploiting low dependence between variables at distant points, have been suggested as a potential resolution to this issue. Naively applying the resampling step of the particle filter locally, however, produces implausible spatially discontinuous states. The ensemble transform particle filter replaces resampling with an optimal-transport map and can be localised by computing maps for every spatial mesh node. The resulting local ensemble transport particle filter is, however, computationally intensive for dense meshes. We propose a new optimal-transport based local particle filter which computes a fixed number of maps independent of the mesh resolution and interpolates these maps across space, reducing the computation required and allowing it to be ensured that particles remain spatially smooth. We numerically illustrate that, at a reduced computational cost, we are able to achieve the same accuracy as the local ensemble transport particle filter, and retain its improved robustness to non-Gaussianity and ability to quantify uncertainty when compared to local ensemble Kalman filters.
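The core operation of an ensemble transform particle filter, replacing resampling by an optimal-transport map, can be sketched for scalar particles. This is a simplified illustration under stated assumptions, not the paper's localised, interpolated filter: for one-dimensional states and squared-distance cost the optimal coupling between the weighted and the equally weighted ensemble is the monotone one, so it is computed here with the north-west corner rule on sorted particles rather than a general transport solver. The function name `etpf_transform_1d` is hypothetical.

```python
import numpy as np


def etpf_transform_1d(particles, weights, tol=1e-12):
    """Deterministic ensemble transform for scalar particles (hypothetical sketch).

    Couples the weighted ensemble sum_i w_i delta(x_i) to the uniform ensemble
    (1/n) sum_j delta(x_j) with the monotone (optimal in 1D) coupling T, then
    moves particle j to x_new_j = n * sum_i T[i, j] * x_i.
    """
    x = np.asarray(particles, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = x.size
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    supply = ws.copy()                  # mass still to send from each source particle
    demand = np.full(n, 1.0 / n)        # mass still needed by each target particle
    T = np.zeros((n, n))
    i = j = 0
    while i < n and j < n:              # north-west corner rule on sorted particles
        m = min(supply[i], demand[j])
        T[i, j] = m
        supply[i] -= m
        demand[j] -= m
        if supply[i] <= tol:
            i += 1
        if demand[j] <= tol:
            j += 1
    x_new_sorted = n * (T.T @ xs)       # each new particle is a convex combination
    out = np.empty(n)
    out[order] = x_new_sorted           # return particles in their original order
    return out
```

Unlike multinomial resampling, the update is deterministic and preserves the weighted ensemble mean, which is the property that lets localised versions avoid introducing random spatial discontinuities.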

(in press)
Parameter estimation with increased precision for elliptic and hypoelliptic diffusions
Yuga Iguchi, Alexandros Beskos and Matthew M. Graham
Bernoulli
This work aims at making a comprehensive contribution in the general area of parametric inference for discretely observed diffusion processes. Established approaches for likelihood-based estimation invoke a time-discretisation scheme for the approximation of the intractable transition dynamics of the Stochastic Differential Equation (SDE) model over finite time periods. The scheme is applied for a step-size that is either user-selected or determined by the data. Recent research has highlighted the critical effect of the choice of numerical scheme on the behaviour of derived parameter estimates in the setting of hypoelliptic SDEs. In brief, in our work we first develop two weak second-order sampling schemes (to cover both hypoelliptic and elliptic SDEs) and produce a small-time expansion for the density of the schemes to form a proxy for the true intractable SDE transition density. Then, we establish a collection of analytic results for likelihood-based parameter estimates obtained via the formed proxies, thus providing a theoretical framework that showcases advantages from the use of the developed methodology for SDE calibration. We present numerical results from carrying out classical or Bayesian inference, for both elliptic and hypoelliptic SDEs.

2024/04
ParticleDA.jl v.1.0: a distributed particle-filtering data assimilation package
Daniel Giles, Matthew M. Graham, Mosè Giordano, Tuomas Koskela, Alexandros Beskos and Serge Guillas
Geoscientific Model Development
Digital twins of physical and human systems informed by real-time data are becoming ubiquitous across weather forecasting, disaster preparedness, and urban planning, but researchers lack the tools to run these models effectively and efficiently, limiting progress. One of the current challenges is to assimilate observations in highly nonlinear dynamical systems, as the practical need is often to detect abrupt changes. We have developed a software platform to improve the use of real-time data in nonlinear system representations where non-Gaussianity limits the applicability of data assimilation algorithms such as the ensemble Kalman filter and variational methods. Particle-filter-based data assimilation algorithms have been implemented within a user-friendly open-source software platform in Julia – ParticleDA.jl. To ensure the applicability of the developed platform in realistic scenarios, emphasis has been placed on numerical efficiency and scalability on high-performance computing systems. Furthermore, the platform has been developed to be forward-model agnostic, ensuring that it is applicable to a wide range of modelling settings, for instance unstructured and non-uniform meshes in the spatial domain or even state spaces that are not spatially organized. Applications to tsunami and numerical weather prediction demonstrate the computational benefits and ease of using the high-level Julia interface with the package to perform filtering in a variety of complex models.

2023/04
Manifold lifting: scaling Markov chain Monte Carlo to the vanishing noise regime
Khai Xiang Au, Matthew M. Graham and Alexandre H. Thiery
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Standard Markov chain Monte Carlo methods struggle to explore distributions that concentrate in the neighbourhood of low-dimensional submanifolds. This pathology naturally occurs in Bayesian inference settings when there is a high signal-to-noise ratio in the observational data but the model is inherently over-parametrised or non-identifiable. In this paper, we propose a strategy that transforms the original sampling problem into the task of exploring a distribution supported on a manifold embedded in a higher-dimensional space; in contrast to the original posterior this lifted distribution remains diffuse in the limit of vanishing observation noise. We employ a constrained Hamiltonian Monte Carlo method, which exploits the geometry of this lifted distribution, to perform efficient approximate inference. We demonstrate in numerical experiments that, contrary to competing approaches, the sampling efficiency of our proposed methodology does not degenerate as the target distribution to be explored concentrates near low-dimensional submanifolds.

2022/08
Testing whether a learning procedure is calibrated
Jon Cockayne, Matthew M. Graham, Chris Oates, Tim J. Sullivan and Onyur Teymur
Journal of Machine Learning Research
A learning procedure takes as input a dataset and performs inference for the parameters θ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about θ after seeing the dataset. Bayesian inference is a prime example of such a procedure, but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure whose inferences and predictions are systematically over- or under-confident will fail to be calibrated. On the other hand, a learning procedure that is calibrated need not be statistically efficient. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Several vignettes are presented to illustrate different aspects of the framework.
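The simulation idea behind checking calibration can be illustrated with a simple rank-based check, in the spirit of simulation-based calibration rather than the specific hypothesis test developed in the paper. Assumed for illustration: a conjugate model θ ~ N(0, 1), y_k | θ ~ N(θ, 1), and a hypothetical `var_scale` knob, where `var_scale=1` returns the exact posterior and smaller values give an over-confident procedure. If the procedure is calibrated, the rank of the true θ among draws from its output is uniformly distributed.

```python
import numpy as np


def rank_statistics(var_scale, n_reps=2000, n_obs=5, n_post=9, seed=0):
    """Ranks of the true parameter among draws from a (possibly miscalibrated) posterior."""
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        theta = rng.normal()                            # draw the parameter from the prior
        y = rng.normal(theta, 1.0, size=n_obs)          # simulate a dataset given theta
        post_mean = y.sum() / (n_obs + 1)               # exact conjugate posterior mean
        post_var = var_scale / (n_obs + 1)              # var_scale=1 gives the exact posterior
        draws = rng.normal(post_mean, np.sqrt(post_var), size=n_post)
        ranks[r] = np.sum(draws < theta)                # rank of the truth, in 0..n_post
    return ranks


def chi_square_uniform(ranks, n_post=9):
    """Pearson chi-square statistic of the rank histogram against a uniform distribution."""
    counts = np.bincount(ranks, minlength=n_post + 1)
    expected = ranks.size / (n_post + 1)
    return np.sum((counts - expected) ** 2 / expected)
```

An over-confident procedure pushes the true θ into the tails of its output, so the extreme ranks are over-represented and the chi-square statistic becomes large; note that a procedure returning the prior would also pass this check, consistent with calibration not implying efficiency.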

2022/04
Manifold Markov chain Monte Carlo methods for Bayesian inference in diffusion models
Matthew M. Graham, Alexandre H. Thiery and Alexandros Beskos
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Bayesian inference for nonlinear diffusions, observed at discrete times, is a challenging task that has prompted the development of a number of algorithms, mainly within the computational statistics community. We propose a new direction, and accompanying methodology—borrowing ideas from statistical physics and computational chemistry—for inferring the posterior distribution of latent diffusion paths and model parameters, given observations of the process. Joint configurations of the underlying process noise and of parameters, mapping onto diffusion paths consistent with observations, form an implicitly defined manifold. Then, by making use of a constrained Hamiltonian Monte Carlo algorithm on the embedded manifold, we are able to perform computationally efficient inference for a class of discretely observed diffusion models. Critically, in contrast with other approaches proposed in the literature, our methodology is highly automated, requiring minimal user intervention and applying alike in a range of settings, including: elliptic or hypoelliptic systems; observations with or without noise; linear or nonlinear observation operators. Exploiting Markovianity, we propose a variant of the method with complexity that scales linearly in the resolution of path discretisation and the number of observation times. Python code reproducing the results is available at http://doi.org/10.5281/zenodo.5796148.

2017/12
Asymptotically exact inference in differentiable generative models
Matthew M. Graham and Amos J. Storkey
Electronic Journal of Statistics
Many generative models can be expressed as a differentiable function applied to input variables sampled from a known probability distribution. This framework includes both the generative component of learned parametric models such as variational autoencoders and generative adversarial networks, and also procedurally defined simulator models which involve only differentiable operations. Though the distribution on the input variables to such models is known, often the distribution on the output variables is only implicitly defined. We present a method for performing efficient Markov chain Monte Carlo inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where approximate Bayesian computation might otherwise be employed. We use the intuition that computing conditional expectations is equivalent to integrating over a density defined on the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to move between inputs exactly consistent with observations. We validate the method by performing inference experiments in a diverse set of models.
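One building block of constrained Hamiltonian Monte Carlo is projecting an unconstrained position update back onto the manifold of inputs consistent with the observations. The sketch below shows only that step, under stated assumptions: the constraint is c(u) = g(u) − y = 0, and the correction moves along the rows of the constraint Jacobian at the previous point, solving c(x + J_prevᵀ λ) = 0 for λ with Newton iterations. The function name and arguments are hypothetical; a full integrator would also project the momentum and include a reversibility check.

```python
import numpy as np


def project_onto_manifold(x, jac_prev, constr, jac_constr, tol=1e-10, max_iters=50):
    """Find x_new = x + jac_prev.T @ lam with constr(x_new) = 0 (Newton on lam).

    x          : position after an unconstrained step, shape (n,)
    jac_prev   : constraint Jacobian at the previous on-manifold point, shape (m, n)
    constr     : function returning the constraint residual, shape (m,)
    jac_constr : function returning the constraint Jacobian, shape (m, n)
    """
    lam = np.zeros(jac_prev.shape[0])
    for _ in range(max_iters):
        x_new = x + jac_prev.T @ lam
        c = constr(x_new)
        if np.max(np.abs(c)) < tol:
            return x_new
        # Newton step: d/dlam constr(x + J_prev^T lam) = J(x_new) @ J_prev^T
        lam = lam - np.linalg.solve(jac_constr(x_new) @ jac_prev.T, c)
    raise RuntimeError("projection onto the constraint manifold did not converge")
```

For example, with the toy generator g(u) = u₁² + u₂² and observed output y = 1, the manifold is the unit circle; starting from `x0 = [1, 0]` and taking the step `[0.1, 0.3]`, the projection moves only along the first coordinate and returns approximately `[sqrt(0.91), 0.3]`.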

2021/04
Measure transport with kernel Stein discrepancy
Matthew A. Fisher, Tui Nolan, Matthew M. Graham, Dennis Prangle and Chris Oates
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics
Measure transport underpins several recent algorithms for posterior approximation in the Bayesian context, wherein a transport map is sought to minimise the Kullback–Leibler divergence (KLD) from the posterior to the approximation. The KLD is a strong mode of convergence, requiring absolute continuity of measures and placing restrictions on which transport maps can be permitted. Here we propose to minimise a kernel Stein discrepancy (KSD) instead, requiring only that the set of transport maps is dense in an L² sense and demonstrating how this condition can be validated. The consistency of the associated posterior approximation is established and empirical results suggest that KSD is a competitive and more flexible alternative to KLD for measure transport.
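The quantity being minimised can be computed directly from samples and the score ∇ log p of the target, without the target's normalising constant. Below is a minimal sketch of a V-statistic estimate of the squared KSD with a Gaussian (RBF) kernel of bandwidth `h`; the kernel choice and the function name are assumptions for illustration, and the paper's transport-map optimisation is not shown. The Stein kernel used is u_p(x, y) = s(x)ᵀs(y) k + s(x)ᵀ∇_y k + s(y)ᵀ∇_x k + tr(∇_x∇_y k), where s = ∇ log p.

```python
import numpy as np


def ksd_squared(x, score, h=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy (RBF kernel).

    x     : samples of shape (n, d), or (n,) for scalar samples
    score : function mapping an (n, d) array to the target score grad log p, shape (n, d)
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    s = score(x)
    diff = x[:, None, :] - x[None, :, :]           # (n, n, d): x_i - x_j
    sqdist = np.sum(diff**2, axis=-1)
    k = np.exp(-sqdist / (2.0 * h**2))
    term1 = (s @ s.T) * k                          # s(x_i) . s(x_j) k
    term2 = np.einsum("id,ijd->ij", s, diff) / h**2 * k    # s(x_i) . grad_{x_j} k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h**2 * k   # s(x_j) . grad_{x_i} k
    term4 = (d / h**2 - sqdist / h**4) * k         # trace of grad_{x_i} grad_{x_j} k
    return np.mean(term1 + term2 + term3 + term4)
```

With the standard normal score `lambda x: -x`, samples drawn from N(0, 1) give a value close to zero, while samples shifted to N(1, 1) give a noticeably larger value; a transport-map method would minimise this quantity over the map's parameters.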

2017/08
Continuously tempered Hamiltonian Monte Carlo
Matthew M. Graham and Amos J. Storkey
Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence
Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) method for performing approximate inference in complex probabilistic models of continuous variables. In common with many MCMC methods, however, the standard HMC approach performs poorly in distributions with multiple isolated modes. We present a method for augmenting the Hamiltonian system with an extra continuous temperature control variable which allows the dynamic to bridge between sampling a complex target distribution and a simpler unimodal base distribution. This augmentation both helps improve mixing in multimodal targets and allows the normalisation constant of the target distribution to be estimated. The method is simple to implement within existing HMC code, requiring only a standard leapfrog integrator. We demonstrate experimentally that the method is competitive with annealed importance sampling and simulated tempering methods at sampling from challenging multimodal distributions and estimating their normalising constants.
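A minimal sketch of the augmentation idea follows: an extra continuous variable u sets an inverse temperature β = sigmoid(u), the extended energy interpolates between a bimodal target exp(−φ) and a Gaussian base exp(−ψ), and a standard leapfrog HMC update is applied jointly to (x, u). The sigmoid map, the standard normal prior on u, the double-well target and all names are assumptions for illustration; the exact parametrisation used in the paper may differ. Because the marginal density of u is proportional to its prior times the normalising constant of the bridging density at β(u), samples of u carry information about the ratio of target and base normalising constants.

```python
import numpy as np

# Hypothetical 1D example: bimodal target exp(-phi) and Gaussian base exp(-psi).
phi = lambda x: (x**2 - 4.0) ** 2            # double well with modes at x = +/-2
dphi = lambda x: 4.0 * x * (x**2 - 4.0)
psi = lambda x: x**2 / 8.0                    # N(0, 4) base distribution
dpsi = lambda x: x / 4.0


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


def energy(z):
    """Negative log-density of the extended target on (x, u), with beta = sigmoid(u)."""
    x, u = z
    beta = sigmoid(u)
    return beta * phi(x) + (1.0 - beta) * psi(x) + 0.5 * u**2


def grad_energy(z):
    x, u = z
    beta = sigmoid(u)
    gx = beta * dphi(x) + (1.0 - beta) * dpsi(x)
    gu = beta * (1.0 - beta) * (phi(x) - psi(x)) + u   # d(beta)/du = beta * (1 - beta)
    return np.array([gx, gu])


def ct_hmc(n_samples, step=0.05, n_leapfrog=20, seed=0):
    """Standard HMC (leapfrog + Metropolis accept/reject) on the joint (x, u) space."""
    rng = np.random.default_rng(seed)
    z = np.array([2.0, 0.0])
    samples = np.empty((n_samples, 2))
    n_accept = 0
    for t in range(n_samples):
        p = rng.normal(size=2)
        z_new = z.copy()
        p_new = p - 0.5 * step * grad_energy(z_new)        # initial half momentum step
        for _ in range(n_leapfrog):
            z_new = z_new + step * p_new
            p_new = p_new - step * grad_energy(z_new)
        p_new = p_new + 0.5 * step * grad_energy(z_new)    # turn the last full step into a half step
        h_old = energy(z) + 0.5 * p @ p
        h_new = energy(z_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            z = z_new
            n_accept += 1
        samples[t] = z
    return samples, n_accept / n_samples
```

Only states with β(u) close to 1 correspond to the target; in practice these would be reweighted or selected, and the time spent near β = 0, where the base distribution lets x cross between modes, is what improves mixing.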

2017/04
Asymptotically exact inference in differentiable generative models
Matthew M. Graham and Amos J. Storkey
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to coherently move between inputs exactly consistent with observations. We validate the method by performing inference tasks in a diverse set of models.

2016/05
Pseudo-Marginal Slice Sampling
Iain Murray and Matthew M. Graham
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
Markov chain Monte Carlo (MCMC) methods asymptotically sample from complex probability distributions. The pseudo-marginal MCMC framework only requires an unbiased estimator of the unnormalized probability distribution function to construct a Markov chain. However, the resulting chains are harder to tune to a target distribution than conventional MCMC, and the types of updates available are limited. We describe a general way to clamp and update the random numbers used in a pseudo-marginal method's unbiased estimator. In this framework we can use slice sampling and other adaptive methods. We obtain more robust Markov chains, which often mix more quickly.
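A minimal sketch of clamping and updating the estimator's random numbers: the chain state is (θ, u), θ is updated by random-walk Metropolis with u held fixed, and u is updated by elliptical slice sampling given θ, using the N(0, I) distribution of u as the prior. The toy model is an assumption for illustration: θ ~ N(0, 1), a latent z ~ N(θ, 1) and y | z ~ N(z, 1), with the likelihood p(y | θ) = N(y; θ, 2) estimated unbiasedly by averaging N(y; θ + u_m, 1) over m. The function names are hypothetical and the step sizes are untuned.

```python
import numpy as np


def log_lik_hat(theta, u, y):
    """Log of an unbiased importance estimate of p(y | theta) using random numbers u."""
    a = -0.5 * (y - theta - u) ** 2               # log N(y; theta + u_m, 1) up to a constant
    m = a.max()
    return m + np.log(np.mean(np.exp(a - m))) - 0.5 * np.log(2.0 * np.pi)


def ess_update(u, log_lik, rng):
    """One elliptical slice sampling update of u under a N(0, I) prior."""
    nu = rng.normal(size=u.shape)
    log_thresh = log_lik(u) + np.log(rng.uniform())
    angle = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = angle - 2.0 * np.pi, angle
    while True:
        u_new = u * np.cos(angle) + nu * np.sin(angle)
        if log_lik(u_new) > log_thresh:
            return u_new
        if angle < 0.0:                           # shrink the bracket towards angle 0
            lo = angle
        else:
            hi = angle
        angle = rng.uniform(lo, hi)


def apm_slice_sampler(y, n_iters, n_aux=10, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta, u = 0.0, rng.normal(size=n_aux)
    samples = np.empty(n_iters)
    for t in range(n_iters):
        # (1) theta | u: Metropolis step with a N(0, 1) prior and the random numbers clamped
        prop = theta + step * rng.normal()
        log_ratio = (-0.5 * prop**2 + log_lik_hat(prop, u, y)) \
            - (-0.5 * theta**2 + log_lik_hat(theta, u, y))
        if np.log(rng.uniform()) < log_ratio:
            theta = prop
        # (2) u | theta: elliptical slice sampling on the estimator's random numbers
        u = ess_update(u, lambda v: log_lik_hat(theta, v, y), rng)
        samples[t] = theta
    return samples
```

For this model the exact posterior is N(y/3, 2/3), so with y = 1.5 the θ samples should average about 0.5; the slice-sampling update of u never rejects, which is one way such chains avoid long runs of rejected proposals.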

2018/07
Auxiliary variable Markov chain Monte Carlo methods
Matthew M. Graham
PhD thesis, University of Edinburgh
Markov chain Monte Carlo (MCMC) methods are a widely applicable class of algorithms for estimating integrals in statistical inference problems. A common approach in MCMC methods is to introduce additional auxiliary variables into the Markov chain state and perform transitions in the joint space of target and auxiliary variables. In this thesis we consider novel methods for using auxiliary variables within MCMC methods to allow approximate inference in otherwise intractable models and to improve sampling performance in models exhibiting challenging properties such as multimodality. We first consider the pseudo-marginal framework. This extends the Metropolis–Hastings algorithm to cases where we only have access to an unbiased estimator of the density of the target distribution. The resulting chains can sometimes show ‘sticking’ behaviour where long series of proposed updates are rejected. Further, the algorithms can be difficult to tune and it is not immediately clear how to generalise the approach to alternative transition operators. We show that if the auxiliary variables used in the density estimator are included in the chain state it is possible to use new transition operators such as those based on slice-sampling algorithms within a pseudo-marginal setting. This auxiliary pseudo-marginal approach leads to easier-to-tune methods and is often able to improve sampling efficiency over existing approaches. As a second contribution we consider inference in probabilistic models defined via a generative process with the probability density of the outputs of this process only implicitly defined. The approximate Bayesian computation (ABC) framework allows inference in such models when conditioning on the values of observed model variables by making the approximation that generated observed variables are ‘close’ rather than exactly equal to observed data.
Although making the inference problem more tractable, the approximation error introduced in ABC methods can be difficult to quantify and standard algorithms tend to perform poorly when conditioning on high-dimensional observations. This often requires further approximation by reducing the observations to lower-dimensional summary statistics. We show how including all of the random variables used in generating model outputs as auxiliary variables in a Markov chain state can allow the use of more efficient and robust MCMC methods such as slice sampling and Hamiltonian Monte Carlo (HMC) within an ABC framework. In some cases this can allow inference when conditioning on the full set of observed values when standard ABC methods require reduction to lower-dimensional summaries for tractability. Further, we introduce a novel constrained HMC method for performing inference in a restricted class of differentiable generative models which allows conditioning the generated observed variables to be arbitrarily close to observed data while maintaining computational tractability. As a final topic we consider the use of an auxiliary temperature variable in MCMC methods to improve exploration of multimodal target densities and allow estimation of normalising constants. Existing approaches such as simulated tempering and annealed importance sampling use temperature variables which take on only a discrete set of values. The performance of these methods can be sensitive to the number and spacing of the temperature values used, and the discrete nature of the temperature variable prevents the use of gradient-based methods such as HMC to update the temperature alongside the target variables. We introduce new MCMC methods which instead use a continuous temperature variable. This both removes the need to tune the choice of discrete temperature values and allows the temperature variable to be updated jointly with the target variables within an HMC method.
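For reference, the basic ABC approximation described above, accepting parameters whose simulated outputs are ‘close’ to the observed data after reduction to a summary statistic, can be sketched as a standard rejection sampler. This is the baseline the thesis improves upon, not its auxiliary-variable MCMC method; the toy model (θ ~ N(0, 1), y_k | θ ~ N(θ, 1)), the sample-mean summary and the function name are assumptions for illustration.

```python
import numpy as np


def abc_rejection(y_obs, n_props, eps, seed=0):
    """Rejection ABC for a toy model: theta ~ N(0, 1), y_k | theta ~ N(theta, 1)."""
    rng = np.random.default_rng(seed)
    s_obs = np.mean(y_obs)                            # summary statistic: sample mean
    accepted = []
    for _ in range(n_props):
        theta = rng.normal()                          # propose from the prior
        y_sim = rng.normal(theta, 1.0, size=len(y_obs))
        if abs(np.mean(y_sim) - s_obs) < eps:         # 'close', not exactly equal
            accepted.append(theta)
    return np.array(accepted)
```

Shrinking `eps` reduces the approximation error but also the acceptance rate, and with high-dimensional outputs almost no proposals land within `eps` unless the data are first reduced to low-dimensional summaries, which is the trade-off motivating the methods above.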

2013/07
Insect olfactory landmark navigation
Matthew M. Graham
MSc by Research dissertation, University of Edinburgh
The natural world is full of chemical signals: organisms of all scales and taxonomic classifications transmit and receive chemical signals to guide the full gamut of life’s processes, from helping form mother–infant bonds, to identifying potential mates and even signalling their own deaths. Insects are particularly reliant on chemical cues to guide their behaviour, and understanding how insects respond to and use chemical cues in their environment is a highly active research area. In a series of recent studies Steck et al. produced evidence that foragers of the Saharan desert ant species Cataglyphis fortis are able to learn an association between an array of odour sources arranged around the entrance to their nest and the relative location of the nest entrance, and later use the information they receive from the odour sources to help them navigate to the visually inconspicuous nest entrance. This ability to use odour sources as olfactory landmarks had not previously been seen experimentally in insects, and is a remarkable behaviour given the extremely complex and highly dynamic nature of the olfactory signals received by the ants from the turbulent odour plumes in which the chemicals travel from the sources. After an introductory chapter covering some relevant background theory to the work in this project, the second chapter of this dissertation will detail a field study conducted with the European desert ant species Cataglyphis velox. Whereas in the studies of Steck et al. the ants were constrained to moving along a linear channel, so that the navigation task was limited to being one-dimensional, the aim of this study was to see if there was any evidence supporting the hypothesis that Cataglyphis velox ants are able to use olfactory landmarks to navigate in a more realistic open environment.
The results of the study were inconclusive, due to the low sample sizes that were collected and the small effect size in the study design used; however, it is proposed that the study could usefully be considered as a pilot for a full study at a later date, and an adjusted study design is proposed that might overcome many of the issues encountered in the current study. In the third and final chapter of this dissertation, a modelling study of what information is available in the olfactory signal received from a turbulent odour plume about the location of the source of that plume is presented, with this work aiming to explore the information which may be used by Cataglyphis desert ants when using olfactory landmarks to navigate. The details of the plume and olfactory sensor models used are described, and the results of an analysis of the estimated mutual information between the modelled olfactory signals and the location of the odour source are presented. It is found that the locational information content of individual signal segment statistics seems to be low, though combining multiple statistics does potentially allow more useful reductions in uncertainty.

2012/06
Measuring tissue stiffness with ultrasound
Matthew M. Graham
MEng project report, University of Cambridge