Gradient Descent on Infinitely Wide Neural Networks - Global Convergence and Generalization
International Mathematical Union via YouTube
Overview
A lecture on the optimization and statistical properties of two-layer (single-hidden-layer) neural networks in the infinite-width limit: global convergence of gradient descent via Wasserstein gradient flows and the many-particle limit, followed by a comparison of the kernel and feature-learning regimes. An illustrative code sketch (not from the lecture) follows the syllabus below.
Syllabus
Intro
Machine learning: scientific context
Parametric supervised machine learning
Convex optimization problems
Theoretical analysis of deep learning
Optimization for multi-layer neural networks
Gradient descent for a single hidden layer
Wasserstein gradient flow
Many-particle limit and global convergence (Chizat and Bach, 2018)
From optimization to statistics
Interpolation regime
Logistic regression for two-layer neural networks
From RKHS norm to variation norm
Kernel regime
Optimizing over two layers
Comparison of kernel and feature learning regimes
Discussion
Conclusion
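
As a rough, unofficial illustration of the setup the syllabus refers to (not code from the lecture), the sketch below trains a single-hidden-layer ReLU network by full-batch gradient descent on a toy regression problem, treating the hidden units as particles with the 1/m mean-field output scaling that underlies the many-particle limit of Chizat and Bach (2018). The data, width, step size, and loss (squared rather than logistic) are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem.
n, d, m = 64, 1, 512                      # samples, input dimension, hidden width
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0])                 # target values

# Each hidden unit is a "particle" (w_j, a_j); the output uses the mean-field 1/m scaling:
#   f(x) = (1/m) * sum_j a_j * relu(w_j . x)
W = rng.normal(size=(m, d))               # input weights of the hidden units
a = rng.normal(size=m)                    # output weights

lr = 0.5
for step in range(3000):
    h = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    resid = h @ a / m - y                 # prediction error on the training set
    # Gradients of the mean squared error (1/2n) * sum_i resid_i^2.
    grad_a = h.T @ resid / (n * m)
    grad_W = ((resid[:, None] * (h > 0.0)) * a).T @ X / (n * m)
    # The step size is rescaled by m so that each particle moves at a width-independent
    # rate, which is the scaling behind the mean-field / Wasserstein gradient-flow view.
    a -= lr * m * grad_a
    W -= lr * m * grad_W

print("final training mse:", np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2))
```

Roughly speaking, replacing the 1/m output scaling by 1/sqrt(m) while keeping an O(1) step size puts the same model in the kernel ("lazy") regime compared in the later syllabus items, where individual hidden weights move less and less as the width grows.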
Taught by
International Mathematical Union