Deep Learning Meets Nonparametric Regression: Are Weight-decayed DNNs Locally Adaptive?
USC Information Sciences Institute via YouTube
Overview

A research talk on the statistical theory of deep learning. The central question: are weight-decayed deep neural networks locally adaptive, i.e., can they attain minimax-optimal rates over total-variation and Besov classes, where kernel methods such as the NTK are provably suboptimal? The talk connects weight decay on parallel ReLU networks to total-variation regularization and splines, and presents a main theorem showing that parallel ReLU DNNs approach the minimax rates as depth grows.
Syllabus
Intro
From the statistical point of view, the success of DNNs is a mystery.
Why do Neural Networks work better?
The "adaptivity" conjecture
NTKs are strictly suboptimal for locally adaptive nonparametric regression
Are DNNs locally adaptive? Can they achieve optimal rates for TV-classes/Besov classes?
Background: Splines are piecewise polynomials
Background: Truncated power basis for splines (see the basis formula after this syllabus)
Weight decay = Total Variation Regularization (see the identity sketched after this syllabus)
Weight-decayed L-layer PNN is equivalent to Sparse Linear Regression with learned basis functions
Main theorem: Parallel ReLU DNN approaches the minimax rates as it gets deeper (see the rates and the PyTorch sketch after this syllabus).
Comparing to classical nonparametric regression methods
Examples of Functions with Heterogeneous Smoothness
Step 2: Approximation Error Bound
Summary of take-home messages
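
As a quick reference for the "truncated power basis" item above, the standard textbook definition (background material, not quoted from the talk):

```latex
% Degree-k spline with knots t_1 < ... < t_m in the truncated power basis:
f(x) = \sum_{j=0}^{k} \beta_j x^j \;+\; \sum_{i=1}^{m} c_i \, (x - t_i)_+^{k},
\qquad (u)_+ := \max\{u, 0\}.
```

Note that (x - t_i)_+ is exactly a ReLU with bias -t_i, which is the bridge between splines and ReLU networks that the talk builds on.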
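The "weight decay = total variation regularization" identity can be sketched for a two-layer parallel ReLU network; the following is a standard derivation from the function-space literature (e.g., Savarese et al. 2019; Parhi and Nowak 2021), not the talk's exact statement:

```latex
% Two-layer parallel ReLU network on R:  f(x) = \sum_j a_j ( w_j x - b_j )_+ .
% The rescaling (a_j, w_j) -> (a_j / s_j, s_j w_j), s_j > 0, leaves f unchanged,
% and by AM-GM the weight-decay penalty minimized over rescalings is
\min_{s_j > 0} \; \frac{1}{2} \sum_j \left( \frac{a_j^2}{s_j^2} + s_j^2 w_j^2 \right)
  \;=\; \sum_j |a_j| \, |w_j| \;=\; \mathrm{TV}(f')
% (the last equality holds when the knots b_j / w_j are distinct, since f'
% jumps by a_j w_j at each knot). So L2 weight decay behaves like an
% L1 / total-variation penalty in function space.
```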
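For context on the "NTKs are strictly suboptimal" and "minimax rates" items, the classical rates over total-variation classes, stated from the trend-filtering literature as background (the talk's exact classes and constants may differ):

```latex
% Over the k-th order TV class  \{ f : \mathrm{TV}(f^{(k)}) \le C \}:
%   minimax MSE rate, achieved by locally adaptive methods such as
%   trend filtering and locally adaptive splines:
n^{-(2k+2)/(2k+3)}
%   best possible rate for any linear smoother, a class that includes
%   kernel ridge regression and hence NTK regression:
n^{-(2k+1)/(2k+2)}
% For k = 0 (bounded variation): n^{-2/3} versus n^{-1/2}.
```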
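Finally, a minimal PyTorch sketch of the setup the talk studies: a weight-decayed parallel (one-hidden-layer) ReLU network fit to a function with heterogeneous smoothness. The target function and all hyperparameters are illustrative choices, not the speaker's experiment:

```python
# Minimal sketch (illustrative, not the talk's code): fit a parallel ReLU
# network with and without weight decay on a target whose smoothness varies
# across the domain, and compare the fits against the noiseless target.
import torch
import torch.nn as nn

torch.manual_seed(0)

def target(x):
    # Heterogeneous smoothness: perfectly flat on the left, wiggly on the right.
    return torch.where(x < 0, torch.zeros_like(x), x * torch.sin(8 * x))

n = 256
x = torch.linspace(-1, 1, n).unsqueeze(1)
y = target(x) + 0.1 * torch.randn_like(x)

def fit(weight_decay, width=512, steps=3000):
    # One hidden ReLU layer = a "parallel" network of `width` ReLU units.
    net = nn.Sequential(nn.Linear(1, width), nn.ReLU(), nn.Linear(width, 1))
    # weight_decay is the usual L2 penalty on all parameters; the talk's
    # point is that for ReLU networks it acts like TV regularization.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return net

for wd in (0.0, 1e-3):
    err = ((fit(wd)(x) - target(x)) ** 2).mean().item()
    print(f"weight_decay={wd}: MSE against noiseless target = {err:.4f}")
```

The weight-decayed fit should track the flat region without inheriting spurious wiggles from the noisy oscillatory region, which is the local adaptivity the talk is about.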
Taught by
USC Information Sciences Institute