Foundations for Feature Learning via Gradient Descent

Overview

Explore the foundations of feature learning through gradient descent in this 49-minute lecture from the USC Probability and Statistics Seminar. Delve into the mystery of how models such as deep neural networks can extract useful features and learn high-quality representations directly from data while simultaneously fitting labels. Examine the recent success of transformer architectures, self-supervised learning, and transfer learning. Discover why existing theoretical results often fall short of explaining feature learning in practical scenarios. Investigate the spectral bias phenomenon in gradient descent that leads to globally optimal, well-generalizing solutions. Learn how the lecture's analysis combines high-dimensional probability and statistics, optimization, and nonlinear control to study model generalization. Gain insights into the implications for transfer learning, self-attention, prompt-tuning via transformers, and simple self-supervised learning settings.