Introduction

I am a research software engineer in the research software development group at UCL. Prior to my current role, I did postdocs with Chris Oates at the School of Mathematics, Statistics and Physics at Newcastle University and Alex Thiery at the Department of Statistics and Applied Probability in the National University of Singapore. I completed my PhD at the School of Informatics in the University of Edinburgh, where I was supervised by Amos Storkey.

Research interests: Markov chain Monte Carlo methods, Hamiltonian Monte Carlo, approximate Bayesian computation, data assimilation, inverse problems.

Contact

Address
UCL Research IT Services, 1–19 Torrington Place, London, WC1E 7HB

Publications

Pre-prints

  • 2020/12 Testing whether a learning procedure is calibrated

    Jon Cockayne, Matthew M. Graham, Chris Oates and Tim J. Sullivan

    arXiv
    A learning procedure takes as input a dataset and performs inference for the parameters θ of a model that is assumed to have given rise to the dataset. Here we consider learning procedures whose output is a probability distribution, representing uncertainty about θ after seeing the dataset. Bayesian inference is a prime example of such a procedure but one can also construct other learning procedures that return distributional output. This paper studies conditions for a learning procedure to be considered calibrated, in the sense that the true data-generating parameters are plausible as samples from its distributional output. A learning procedure that is calibrated need not be statistically efficient and vice versa. A hypothesis-testing framework is developed in order to assess, using simulation, whether a learning procedure is calibrated. Finally, we exploit our framework to test the calibration of some learning procedures that are motivated as being approximations to Bayesian inference but are nevertheless widely used.
  • 2020/03 Manifold lifting: scaling MCMC to the vanishing noise regime

    Khai Xiang Au, Matthew M. Graham and Alexandre H. Thiery

    Standard Markov chain Monte Carlo methods struggle to explore distributions that are concentrated in the neighbourhood of low-dimensional structures. These pathologies naturally occur in a number of situations. For example, they are common to Bayesian inverse problem modelling and Bayesian neural networks, when observational data are highly informative, or when a subset of the statistical parameters of interest are non-identifiable. In this paper, we propose a strategy that transforms the original sampling problem into the task of exploring a distribution supported on a manifold embedded in a higher dimensional space; in contrast to the original posterior this lifted distribution remains diffuse in the vanishing noise limit. We employ a constrained Hamiltonian Monte Carlo method which exploits the manifold geometry of this lifted distribution, to perform efficient approximate inference. We demonstrate in several numerical experiments that, in contrast to competing approaches, the sampling efficiency of our proposed methodology does not degenerate as the target distribution to be explored concentrates near low-dimensional structures.
  • 2019/12 Manifold Markov chain Monte Carlo methods for Bayesian inference in a wide class of diffusion models

    Matthew M. Graham, Alexandre H. Thiery and Alexandros Beskos

    Bayesian inference for partially observed, nonlinear diffusion models is a challenging task that has led to the development of several important methodological advances. We propose a novel framework for inferring the posterior distribution on both a time discretisation of the diffusion process and any unknown model parameters, given partial observations of the process. The set of joint configurations of the noise increments and parameters which map to diffusion paths consistent with the observations forms an implicitly defined manifold. By using a constrained Hamiltonian Monte Carlo algorithm for constructing Markov kernels on embedded manifolds, we are able to perform computationally efficient inference in a wide class of partially observed diffusions. Unlike other approaches in the literature, which are often limited to specific model classes, our approach allows full generality in the choice of observation and diffusion models, including complex cases such as hypoelliptic systems with degenerate diffusion coefficients. By exploiting the Markovian structure of diffusions, we propose a variant of the approach with a complexity that scales linearly in the time resolution of the discretisation and quasi-linearly in the number of observation times.
  • 2019/06 A scalable optimal-transport based local particle filter

    Matthew M. Graham and Alexandre H. Thiery

    Filtering in spatially-extended dynamical systems is a challenging problem with significant practical applications such as numerical weather prediction. Particle filters allow asymptotically consistent inference but require infeasibly large ensemble sizes for accurate estimates in complex spatial models. Localisation approaches, which perform local state updates by exploiting low dependence between variables at distant points, have been suggested as a potential resolution to this issue. Naively applying the resampling step of the particle filter locally, however, produces implausible spatially discontinuous states. The ensemble transform particle filter replaces resampling with an optimal-transport map and can be localised by computing maps for every spatial mesh node. The resulting local ensemble transport particle filter is however computationally intensive for dense meshes. We propose a new optimal-transport based local particle filter which computes a fixed number of maps independent of the mesh resolution and interpolates these maps across space, reducing the computation required and making it possible to ensure that particles remain spatially smooth. We numerically illustrate that, at a reduced computational cost, we are able to achieve the same accuracy as the local ensemble transport particle filter, and retain its improved robustness to non-Gaussianity and ability to quantify uncertainty when compared to local ensemble Kalman filters.

Journal articles

  • 2017/12 Asymptotically exact inference in differentiable generative models

    Matthew M. Graham and Amos J. Storkey

    Electronic Journal of Statistics

    Many generative models can be expressed as a differentiable function applied to input variables sampled from a known probability distribution. This framework includes both the generative component of learned parametric models such as variational autoencoders and generative adversarial networks, and also procedurally defined simulator models which involve only differentiable operations. Though the distribution on the input variables to such models is known, often the distribution on the output variables is only implicitly defined. We present a method for performing efficient Markov chain Monte Carlo inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where approximate Bayesian computation might otherwise be employed. We use the intuition that computing conditional expectations is equivalent to integrating over a density defined on the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to move between inputs exactly consistent with observations. We validate the method by performing inference experiments in a diverse set of models.

Conference proceedings

  • 2021/04 Measure transport with kernel Stein discrepancy

    Matthew A. Fisher, Tui Nolan, Matthew M. Graham, Dennis Prangle and Chris Oates

    Proceedings of the 24th International Conference on Artificial Intelligence and Statistics

    Measure transport underpins several recent algorithms for posterior approximation in the Bayesian context, wherein a transport map is sought to minimise the Kullback-Leibler divergence (KLD) from the posterior to the approximation. The KLD is a strong mode of convergence, requiring absolute continuity of measures and placing restrictions on which transport maps can be permitted. Here we propose to minimise a kernel Stein discrepancy (KSD) instead, requiring only that the set of transport maps is dense in an L2 sense and demonstrating how this condition can be validated. The consistency of the associated posterior approximation is established and empirical results suggest that KSD is a competitive and more flexible alternative to KLD for measure transport.
  • 2017/08 Continuously tempered Hamiltonian Monte Carlo

    Matthew M. Graham and Amos J. Storkey

    Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence

    Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) method for performing approximate inference in complex probabilistic models of continuous variables. In common with many MCMC methods, however, the standard HMC approach performs poorly in distributions with multiple isolated modes. We present a method for augmenting the Hamiltonian system with an extra continuous temperature control variable which allows the dynamic to bridge between sampling a complex target distribution and a simpler unimodal base distribution. This augmentation both helps improve mixing in multimodal targets and allows the normalisation constant of the target distribution to be estimated. The method is simple to implement within existing HMC code, requiring only a standard leapfrog integrator. We demonstrate experimentally that the method is competitive with annealed importance sampling and simulated tempering methods at sampling from challenging multimodal distributions and estimating their normalising constants.
  • 2017/04 Asymptotically exact inference in differentiable generative models

    Matthew M. Graham and Amos J. Storkey

    Proceedings of the 20th International Conference on Artificial Intelligence and Statistics

    Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to coherently move between inputs exactly consistent with observations. We validate the method by performing inference tasks in a diverse set of models.
  • 2016/05 Pseudo-Marginal Slice Sampling

    Iain Murray and Matthew M. Graham

    Proceedings of the 19th International Conference on Artificial Intelligence and Statistics

    Markov chain Monte Carlo (MCMC) methods asymptotically sample from complex probability distributions. The pseudo-marginal MCMC framework only requires an unbiased estimator of the unnormalized probability distribution function to construct a Markov chain. However, the resulting chains are harder to tune to a target distribution than conventional MCMC, and the types of updates available are limited. We describe a general way to clamp and update the random numbers used in a pseudo-marginal method's unbiased estimator. In this framework we can use slice sampling and other adaptive methods. We obtain more robust Markov chains, which often mix more quickly.

Workshop papers

  • 2017/08 Inference in differentiable generative models

    Matthew M. Graham and Amos J. Storkey

    ICML 2017 workshop: Implicit generative models

    Many generative models can be expressed as a differentiable function of random inputs drawn from a known probability distribution. This framework includes both learnt parametric generative models and a large class of procedurally defined simulator models. We present a method for performing efficient Markov chain Monte Carlo (MCMC) inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to move between inputs exactly consistent with observations.
  • 2016/12 Continuously tempered Hamiltonian Monte Carlo

    Matthew M. Graham and Amos J. Storkey

    NIPS 2016 workshop: Advances in Approximate Bayesian Inference

    Hamiltonian Monte Carlo (HMC) is a powerful Markov chain Monte Carlo (MCMC) method for performing approximate inference in complex probabilistic models of continuous variables. In common with many MCMC methods, however, the standard HMC approach performs poorly in distributions with multiple isolated modes. Based on an approach proposed in the statistical physics literature, we present a method for augmenting the Hamiltonian system with an extra continuous temperature control variable which allows the dynamic to bridge between sampling a complex target distribution and a simpler uni-modal base distribution. This augmentation both helps increase mode-hopping in multi-modal targets and allows the normalisation constant of the target distribution to be estimated. The method is simple to implement within existing HMC code, requiring only a standard leapfrog integrator. It produces MCMC samples from the target distribution which can be used to directly estimate expectations without any importance re-weighting.

Theses and dissertations

  • 2018/07 Auxiliary variable Markov chain Monte Carlo methods

    Matthew M. Graham

    PhD thesis, University of Edinburgh

    Markov chain Monte Carlo (MCMC) methods are a widely applicable class of algorithms for estimating integrals in statistical inference problems. A common approach in MCMC methods is to introduce additional auxiliary variables into the Markov chain state and perform transitions in the joint space of target and auxiliary variables. In this thesis we consider novel methods for using auxiliary variables within MCMC methods to allow approximate inference in otherwise intractable models and to improve sampling performance in models exhibiting challenging properties such as multimodality. We first consider the pseudo-marginal framework. This extends the Metropolis–Hastings algorithm to cases where we only have access to an unbiased estimator of the density of target distribution. The resulting chains can sometimes show ‘sticking’ behaviour where long series of proposed updates are rejected. Further the algorithms can be difficult to tune and it is not immediately clear how to generalise the approach to alternative transition operators. We show that if the auxiliary variables used in the density estimator are included in the chain state it is possible to use new transition operators such as those based on slice-sampling algorithms within a pseudo-marginal setting. This auxiliary pseudo-marginal approach leads to easier to tune methods and is often able to improve sampling efficiency over existing approaches. As a second contribution we consider inference in probabilistic models defined via a generative process with the probability density of the outputs of this process only implicitly defined. The approximate Bayesian computation (ABC) framework allows inference in such models when conditioning on the values of observed model variables by making the approximation that generated observed variables are ‘close’ rather than exactly equal to observed data. 
Although making the inference problem more tractable, the approximation error introduced in ABC methods can be difficult to quantify and standard algorithms tend to perform poorly when conditioning on high dimensional observations. This often requires further approximation by reducing the observations to lower dimensional summary statistics. We show how including all of the random variables used in generating model outputs as auxiliary variables in a Markov chain state can allow the use of more efficient and robust MCMC methods such as slice sampling and Hamiltonian Monte Carlo (HMC) within an ABC framework. In some cases this can allow inference when conditioning on the full set of observed values when standard ABC methods require reduction to lower dimensional summaries for tractability. Further we introduce a novel constrained HMC method for performing inference in a restricted class of differentiable generative models which allows conditioning the generated observed variables to be arbitrarily close to observed data while maintaining computational tractability. As a final topic we consider the use of an auxiliary temperature variable in MCMC methods to improve exploration of multimodal target densities and allow estimation of normalising constants. Existing approaches such as simulated tempering and annealed importance sampling use temperature variables which take on only a discrete set of values. The performance of these methods can be sensitive to the number and spacing of the temperature values used, and the discrete nature of the temperature variable prevents the use of gradient-based methods such as HMC to update the temperature alongside the target variables. We introduce new MCMC methods which instead use a continuous temperature variable. This both removes the need to tune the choice of discrete temperature values and allows the temperature variable to be updated jointly with the target variables within a HMC method.
  • 2013/07 Insect olfactory landmark navigation

    Matthew M. Graham

    MSc by Research dissertation, University of Edinburgh

    The natural world is full of chemical signals - organisms of all scales and taxonomic classifications transmit and receive chemical signals to guide the full gamut of life’s processes: from helping form mother-infant bonds, to identifying potential mates and even signalling their own deaths. Insects are particularly reliant on chemical cues to guide their behaviour and understanding how insects respond to and use chemical cues in their environment is a highly active research area. In a series of recent studies Steck et al. produced evidence that foragers of the Saharan desert ant species Cataglyphis fortis are able to learn an association between an array of odour sources arranged around the entrance to their nest and the relative location of the nest entrance and later use the information they receive from the odour sources to help them navigate to the visually inconspicuous nest entrance. This ability to use odour sources as olfactory landmarks had not been previously seen experimentally in insects, and is a remarkable behaviour given the extremely complex and highly dynamic nature of the olfactory signals received by the ants from the turbulent odour plumes the chemicals travel in from the sources. After an introductory chapter covering some relevant background theory to the work in this project, the second chapter of this dissertation will detail a field study conducted with the European desert ant species Cataglyphis velox. As in the studies of Steck et al. the ants were constrained to moving in a linear channel and so the navigation task was limited to being one-dimensional, the aim of this study was to see if there was any evidence supporting the hypothesis that Cataglyphis velox ants are able to use olfactory landmarks to navigate in a more realistic open environment. 
The results of the study were inconclusive, due to the low sample sizes that were collected and small effect size in the study design used; however, it is proposed that the study could be considered usefully as a pilot for a full study at a later date, and an adjusted study design is proposed that might overcome a lot of the issues encountered in the current study. In the third and final chapter of this dissertation, a modelling study of what information is available in the olfactory signal received from a turbulent odour plume about the location of the source of that plume is presented, with this work aiming to explore the information which may be used by Cataglyphis desert ants when using olfactory landmarks to navigate. The details of the plume and olfactory sensor models used are described and the results of an analysis of the estimated mutual information between the modelled olfactory signals and the location of the odour source are presented. It is found that the locational informational content of individual signal segment statistics seems to be low, though combining multiple statistics does potentially allow more useful reductions in uncertainty.
  • 2012/06 Measuring tissue stiffness with ultrasound

    Matthew M. Graham

    MEng project report, University of Cambridge

Talks