
YouTube

SGD and Weight Decay Secretly Compress Your Neural Network

MITCBMM via YouTube

Overview

Explore how stochastic gradient descent (SGD) and weight decay implicitly compress neural networks in this 55-minute conference talk by Tomer Galanti of MIT. The talk examines the mechanisms behind this hidden compression effect, offering a deeper understanding of how these widely used optimization methods shape the efficiency and performance of deep learning models.
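The talk itself does not include code; as a rough illustration of the claim, the sketch below (PyTorch, with all hyperparameters and the toy task chosen for illustration only) trains a small network with SGD plus weight decay and then counts how many singular values of a learned weight matrix remain non-negligible, a crude proxy for the effective rank that the talk argues these methods drive down.

```python
# Minimal sketch, not from the talk: SGD + weight decay on a toy task,
# followed by a look at the singular-value spectrum of a weight matrix.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: random inputs with random binary labels.
X = torch.randn(512, 64)
y = torch.randint(0, 2, (512,))

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2))

# weight_decay adds an L2 penalty on the weights; combined with SGD,
# the talk's claim is that this biases weight matrices toward low rank.
opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=5e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    idx = torch.randint(0, 512, (64,))  # mini-batch sampling makes this SGD
    opt.zero_grad()
    loss = loss_fn(model(X[idx]), y[idx])
    loss.backward()
    opt.step()

# Effective-rank proxy: singular values above 1% of the largest one.
W = model[0].weight.detach()
s = torch.linalg.svdvals(W)  # returned in descending order
eff_rank = (s > 0.01 * s[0]).sum().item()
print(f"singular values kept: {eff_rank} / {min(W.shape)}")
```

With a large enough weight decay, the kept count typically falls well below the full dimension, which is the "secret compression" the title refers to; the threshold of 1% is an arbitrary cutoff used here only to summarize the spectrum.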

Syllabus

SGD and Weight Decay Secretly Compress Your Neural Network

Taught by

MITCBMM

