Gradient Descent on Infinitely Wide Neural Networks - Global Convergence and Generalization
International Mathematical Union via YouTube
Overview
A lecture on the optimization and statistical properties of two-layer (single-hidden-layer) neural networks in the infinite-width limit: global convergence of gradient descent via Wasserstein gradient flows and the many-particle limit, followed by a comparison of the kernel and feature-learning regimes. An illustrative code sketch (not from the lecture) follows the syllabus below.
Syllabus
Intro
Machine learning: scientific context
Parametric supervised machine learning
Convex optimization problems
Theoretical analysis of deep learning
Optimization for multi-layer neural networks
Gradient descent for a single hidden layer
Wasserstein gradient flow
Many-particle limit and global convergence (Chizat and Bach, 2018)
From optimization to statistics
Interpolation regime
Logistic regression for two-layer neural networks
From RKHS norm to variation norm
Kernel regime
Optimizing over two layers
Comparison of kernel and feature learning regimes
Discussion
Conclusion
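
As a rough, unofficial illustration of the setup the syllabus refers to (not code from the lecture), the sketch below trains a single-hidden-layer ReLU network by full-batch gradient descent on a toy regression problem, treating the hidden units as particles with the 1/m mean-field output scaling that underlies the many-particle limit of Chizat and Bach (2018). The data, width, step size, and loss (squared rather than logistic) are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem.
n, d, m = 64, 1, 512                      # samples, input dimension, hidden width
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0])                 # target values

# Each hidden unit is a "particle" (w_j, a_j); the output uses the mean-field 1/m scaling:
#   f(x) = (1/m) * sum_j a_j * relu(w_j . x)
W = rng.normal(size=(m, d))               # input weights of the hidden units
a = rng.normal(size=m)                    # output weights

lr = 0.5
for step in range(3000):
    h = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    resid = h @ a / m - y                 # prediction error on the training set
    # Gradients of the mean squared error (1/2n) * sum_i resid_i^2.
    grad_a = h.T @ resid / (n * m)
    grad_W = ((resid[:, None] * (h > 0.0)) * a).T @ X / (n * m)
    # The step size is rescaled by m so that each particle moves at a width-independent
    # rate, which is the scaling behind the mean-field / Wasserstein gradient-flow view.
    a -= lr * m * grad_a
    W -= lr * m * grad_W

print("final training mse:", np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2))
```

Roughly speaking, replacing the 1/m output scaling by 1/sqrt(m) while keeping an O(1) step size puts the same model in the kernel ("lazy") regime compared in the later syllabus items, where individual hidden weights move less and less as the width grows.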
Taught by
International Mathematical Union