Overview
Syllabus
Intro
The ERM/SRM theory of learning
Uniform laws of large numbers
Capacity control
U-shaped generalization curve
Does interpolation overfit?
Interpolation does not overfit even for very noisy data
Why bounds fail
Interpolation is best practice for deep learning
Historical recognition
The key lesson
Generalization theory for interpolation?
A way forward?
Interpolated k-NN schemes
Interpolation and adversarial examples
Double descent risk curve
More parameters are better: an example
Random Fourier networks
What is the mechanism?
Double descent in random feature settings
Smoothness by averaging
Framework for modern ML
The landscape of generalization
Optimization: classical
Modern optimization
From classical statistics to modern ML
The nature of inductive bias
Memorization and interpolation
Interpolation in deep auto-encoders
Neural networks as models for associative memory
Why are attractors surprising?
Memorizing sequences
Taught by: MITCBMM