Classroom Contents
Linear Transformers Are Secretly Fast Weight Memory Systems - Machine Learning Paper Explained
- 1 - Intro & Overview
- 2 - Fast Weight Systems
- 3 - Distributed Storage of Symbolic Values
- 4 - Autoregressive Attention Mechanisms
- 5 - Connecting Fast Weights to Attention Mechanism
- 6 - Softmax as a Kernel Method (Performer)
- 7 - Linear Attention as Fast Weights
- 8 - Capacity Limitations of Linear Attention
- 9 - Synthetic Data Experimental Setup
- 10 - Improving the Update Rule
- 11 - Deterministic Parameter-Free Projection (DPFP) Kernel
- 12 - Experimental Results
- 13 - Conclusion & Comments