On the Foundations of Deep Learning - SGD, Overparametrization, and Generalization

Simons Institute via YouTube


Classroom Contents


  1. Intro
  2. Fundamental Questions
  3. Challenges
  4. What if the Landscape is Bad?
  5. Gradient Descent Finds Global Minima
  6. Idea: Study Dynamics of the Prediction
  7. Local Geometry
  8. Local vs Global Geometry
  9. What about Generalization Error?
  10. Does Overparametrization Hurt Generalization?
  11. Background on Margin Theory
  12. Max Margin via Logistic Loss
  13. Intuition
  14. Overparametrization Improves the Margin
  15. Optimization with Regularizer
  16. Comparison to NTK
  17. Is Regularization Needed?
  18. Warmup: Logistic Regression
  19. What's Special About Gradient Descent?
  20. Changing the Geometry: Steepest Descent
  21. Steepest Descent: Examples
  22. Beyond Linear Models: Deep Networks
  23. Implicit Regularization: NTK vs Asymptotic
  24. Does Architecture Matter?
  25. Example: Changing the Depth in Linear Network
  26. Example: Depth in Linear Convolutional Network
  27. Random Thoughts
