Linear Transformers Are Secretly Fast Weight Memory Systems - Machine Learning Paper Explained

Yannic Kilcher via YouTube

Class Central Classrooms

YouTube videos curated by Class Central.

Classroom Contents

  1. Intro & Overview
  2. Fast Weight Systems
  3. Distributed Storage of Symbolic Values
  4. Autoregressive Attention Mechanisms
  5. Connecting Fast Weights to Attention Mechanism
  6. Softmax as a Kernel Method (Performer)
  7. Linear Attention as Fast Weights (see the sketch after this list)
  8. Capacity Limitations of Linear Attention
  9. Synthetic Data Experimental Setup
  10. Improving the Update Rule
  11. Deterministic Parameter-Free Projection (DPFP) Kernel
  12. Experimental Results
  13. Conclusion & Comments
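The core equivalence covered in chapters 5–7 is that causal linear attention can be computed as a "fast weight" memory: each step writes an outer product of a value and a (feature-mapped) key into a weight matrix, and reading is a matrix–vector product with the feature-mapped query. The sketch below is not the authors' code; the feature map `phi` (ReLU plus a small constant) is an illustrative assumption, whereas the paper also studies choices such as the DPFP projection from chapter 11.

```python
# Minimal sketch: autoregressive linear attention written as an
# outer-product fast weight update (illustration only).
import numpy as np

def phi(x):
    # Positive feature map; an assumption for illustration, not the paper's choice.
    return np.maximum(x, 0.0) + 1e-6

def linear_attention_as_fast_weights(Q, K, V):
    """Causal linear attention via a fast weight matrix.

    Q, K: (T, d_k) queries/keys; V: (T, d_v) values.
    Returns outputs of shape (T, d_v).
    """
    T, d_v = V.shape
    d_phi = phi(K[0]).shape[0]
    W = np.zeros((d_v, d_phi))   # fast weight memory, written by outer products
    z = np.zeros(d_phi)          # running normalizer
    Y = np.zeros((T, d_v))
    for t in range(T):
        k, q, v = phi(K[t]), phi(Q[t]), V[t]
        W += np.outer(v, k)              # write: store value v under key k
        z += k
        Y[t] = W @ q / (z @ q + 1e-9)    # read: query the memory
    return Y
```

This loop gives the same outputs as masked linear attention, but the per-step cost no longer grows with sequence length; the improved update rule discussed in chapter 10 replaces the purely additive write so that stored associations can be overwritten rather than only accumulated.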
