Overview
This course explores the phenomenon of grokking in neural networks trained on small algorithmic datasets, where a network's validation performance jumps suddenly from chance level to perfect generalization, long after the training set has been fit. The course delves into how representations of the underlying binary operations emerge in the learned latent spaces. The syllabus covers the grokking phenomenon, its relation to double descent, the factors that influence grokking, smoothness, simplicity of explanations, and the role of weight decay. The intended audience for this course includes individuals interested in deep learning, neural networks, and generalization in overparametrized models.
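To make the setting concrete, the datasets in these grokking experiments enumerate every input pair for a binary operation over a small finite set, then split the pairs into train and test sets. A minimal sketch, assuming modular addition with modulus p = 97 and a 50% train split (both illustrative choices, not specifics from the course):

```python
import random

def make_mod_add_dataset(p=97, train_frac=0.5, seed=0):
    """Build a toy binary-operation dataset: all pairs (a, b) labeled (a + b) mod p."""
    # Enumerate every equation a o b = c for the operation (a + b) mod p.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle deterministically, then split into train and test portions.
    random.Random(seed).shuffle(examples)
    cut = int(train_frac * len(examples))
    return examples[:cut], examples[cut:]

train, test = make_mod_add_dataset()
print(len(train) + len(test))  # all 97 * 97 = 9409 pairs are covered
```

Because the operation table is fully enumerable, "perfect generalization" here means predicting the held-out half of the table exactly, which is what makes the sudden jump in test accuracy so striking.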
Syllabus
- Intro & Overview
- The Grokking Phenomenon
- Related: Double Descent
- Binary Operations Datasets
- What quantities influence grokking?
- Learned Emerging Structure
- The role of smoothness
- Simple explanations win
- Why does weight decay encourage simplicity?
- Appendix
- Conclusion & Comments
Taught by
Yannic Kilcher