The LMS Computer Science Colloquium includes themed talks on a topical issue at the interface of mathematics and computer science. It is organised by the LMS Computer Science Committee. The event is aimed at PhD students and post-docs, although others are welcome to attend.
This year the event took place on 17th November 2021 via Zoom, with the theme Mathematical Foundations for Machine Learning. The talks are summarised below.
Computer Science Colloquium 2021: talks
On generalisation and learning: a (condensed) primer on PAC-Bayes, followed by News from the PAC-Bayes frontline
View this talk: https://youtu.be/WnOFG-JkKeU.
PAC-Bayes is a generic and flexible framework for analysing the generalisation abilities of machine learning algorithms. It leverages the power of Bayesian inference and allows one to derive new learning strategies. I will briefly present the key concepts of PAC-Bayes and highlight a few recent contributions from my group.
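As a brief illustration of the framework's flavour, here is a classical PAC-Bayes bound (in McAllester's style with Maurer's constants; a standard result, not one of the new contributions mentioned in the abstract). For any prior distribution \(\pi\) over hypotheses fixed before seeing the data, and any \(\delta \in (0,1)\), with probability at least \(1-\delta\) over an i.i.d. sample of size \(n\), simultaneously for all posterior distributions \(\rho\),

\[
\mathbb{E}_{h \sim \rho}\!\left[R(h)\right] \;\le\; \mathbb{E}_{h \sim \rho}\!\left[\hat{R}_n(h)\right] \;+\; \sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]

where \(R\) and \(\hat{R}_n\) denote the true and empirical risks and \(\mathrm{KL}\) is the Kullback-Leibler divergence. Minimising the right-hand side over \(\rho\) is the kind of principled learning strategy the framework suggests.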
The Complexity of Gradient Descent: CLS = PPAD ∩ PLS
View this talk: https://youtu.be/DeODkEz0tM8.
We consider the problem of computing a Gradient Descent solution of a continuously differentiable function f on a bounded convex domain, i.e., finding a "point where Gradient Descent terminates". Equivalently, this corresponds to computing a so-called KKT (Karush-Kuhn-Tucker) stationary point of f.
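To make the object being computed concrete, here is a small Python sketch (our own illustration, not from the talk) of projected gradient descent on a box domain; the objective, step size and tolerance are arbitrary choices. The stopping condition, namely that the iterate no longer moves, is what "a point where Gradient Descent terminates" means, and such a point is an approximate KKT point.

```python
import numpy as np

def f(x):
    # Example smooth objective on the box [0, 1]^2 (illustrative choice).
    return (x[0] - 0.3) ** 2 + np.sin(3 * x[1])

def grad(x, h=1e-6):
    # Central finite-difference approximation of the gradient of f.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def gradient_descent(x, lo=0.0, hi=1.0, step=0.1, eps=1e-5, max_iter=10_000):
    """Projected gradient descent on the box [lo, hi]^d.

    Stops when the iterate moves by less than eps, i.e., at an
    approximate KKT (stationary) point of f on the box.
    """
    for _ in range(max_iter):
        x_new = np.clip(x - step * grad(x), lo, hi)  # project back onto the box
        if np.linalg.norm(x_new - x) < eps:          # "Gradient Descent terminates"
            return x_new
        x = x_new
    return x

print(gradient_descent(np.array([0.9, 0.9])))  # converges to (0.3, 1.0)
```

Note that the returned point (0.3, 1.0) lies on the boundary of the domain, where the KKT conditions hold with an active constraint rather than a vanishing gradient.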
We study this problem in the "white box" model, where the function f is given in the input, e.g., as an arithmetic circuit. By some standard arguments, it follows that the problem lies in the intersection of the complexity classes PPAD and PLS. PPAD is mostly known for capturing the complexity of computing a Nash equilibrium, while PLS has been successful in characterizing the complexity of various local optimization problems, such as Local-Max-Cut.
We prove that the Gradient Descent problem is complete for PPAD ∩ PLS. In particular, this implies that the class CLS ("Continuous Local Search"), which was previously thought to be a strict subset of PPAD ∩ PLS, is in fact equal to it.
Numerical analysis of Gaussian process regression
View this talk: https://youtu.be/dCAyJpU7SG8.
We are interested in the task of estimating an unknown function from observed data, given as a set of input/output pairs. In this context, Gaussian process regression is often used as a Bayesian inference procedure. This results in a probability distribution on the unknown function conditioned on the observed data, the so-called posterior distribution.
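As a minimal sketch of the procedure (our own illustration; the squared-exponential kernel, noise level and toy data are arbitrary assumptions, not choices made in the talk), the posterior distribution is Gaussian with closed-form mean and covariance, computed below with NumPy:

```python
import numpy as np

def rbf(A, B, ell=0.5, sf=1.0):
    # Squared-exponential kernel k(a, b) = sf^2 * exp(-(a - b)^2 / (2 ell^2)).
    d2 = (A[:, None] - B[None, :]) ** 2
    return sf ** 2 * np.exp(-d2 / (2 * ell ** 2))

# Observed input/output pairs and test inputs (toy data).
X = np.array([0.1, 0.4, 0.7, 0.9])
y = np.sin(2 * np.pi * X)
Xs = np.linspace(0, 1, 5)
noise = 1e-2  # observation-noise variance sigma^2

# Posterior mean and covariance of the GP conditioned on (X, y):
#   m(x*) = k(x*, X) (K + sigma^2 I)^{-1} y
#   C(x*) = k(x*, x*) - k(x*, X) (K + sigma^2 I)^{-1} k(X, x*)
K = rbf(X, X) + noise * np.eye(len(X))
Ks = rbf(Xs, X)
post_mean = Ks @ np.linalg.solve(K, y)
post_cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)

print(post_mean)
print(np.sqrt(np.diag(post_cov)))  # pointwise posterior standard deviations
```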
In this talk, we will survey some recent results on convergence properties of Gaussian process regression, and show how tools from applied mathematics can provide insight into the behaviour of this methodology. A particular focus will be on analysing an empirical Bayes approach for estimating unknown hyper-parameters, and on studying the intrinsic depth of deep Gaussian processes.
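For context, the empirical Bayes approach referred to here selects kernel hyper-parameters \(\theta\) by maximising the log marginal likelihood of the observations \(y \in \mathbb{R}^n\) (a standard formulation; the notation is ours):

\[
\log p(y \mid \theta) \;=\; -\tfrac{1}{2}\, y^{\top} \left(K_\theta + \sigma^2 I\right)^{-1} y \;-\; \tfrac{1}{2} \log\det\left(K_\theta + \sigma^2 I\right) \;-\; \tfrac{n}{2} \log 2\pi,
\]

where \((K_\theta)_{ij} = k_\theta(x_i, x_j)\) is the kernel matrix at the observed inputs and \(\sigma^2\) the noise variance. The estimated hyper-parameters are thus themselves data-dependent, which a convergence analysis must take into account.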
From dynamical systems to kernel based feature spaces and back
View this talk: https://youtu.be/Tb8zdzddsDk.
Parametrized state space models in the form of recurrent neural networks are often used in machine learning to learn from data streams exhibiting temporal dependencies. Such models are usually composed of two parts: an input-driven dynamical system and a (static) readout mapping from the state space of the dynamical system to the outputs.
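In symbols (a generic formulation; the notation is ours): given an input stream \(u_t\), the model maintains a state \(x_t\) and emits outputs \(y_t\) via

\[
x_t = f(x_{t-1}, u_t), \qquad y_t = h(x_t),
\]

where \(f\) is the input-driven dynamical system and \(h\) the static readout.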
I will talk about a framework for rigorous analysis of such dynamical systems in a class of recurrent neural networks known as Echo State Networks (ESNs). In particular, I will show that there is a deep connection between the state space and the notion of feature space in kernel machines (a well-known class of machine learning methods). This viewpoint will lead to several rather surprising results linking the structure of the dynamical coupling in ESNs with properties of the associated temporal feature space.
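As a concrete instance of the setting, here is a minimal Echo State Network in Python (our own sketch; the reservoir size, spectral-radius scaling, washout length and ridge parameter are arbitrary choices): the recurrent coupling is fixed and random, and only the linear readout is trained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, washout, ridge = 100, 50, 1e-6

# Fixed random reservoir; rescale so the spectral radius is below 1
# (a common sufficient condition for the echo state property).
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal(n_res)

# Toy task: one-step-ahead prediction of a sine input stream.
u = np.sin(0.2 * np.arange(1000))
x = np.zeros(n_res)
states = []
for t in range(len(u) - 1):
    x = np.tanh(W @ x + W_in * u[t])  # input-driven state update
    states.append(x)
S = np.array(states[washout:])        # discard transient states
target = u[washout + 1:]

# Train the static readout by ridge regression.
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ target)
pred = S @ W_out
print("train MSE:", np.mean((pred - target) ** 2))
```

Seen this way, the reservoir states act as a fixed temporal feature map on which a linear method operates, which is the viewpoint connecting ESNs to kernel machines.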