YouTube videos curated by Class Central.
Classroom Contents
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
- 1 - Intro & Overview
- 2 - The Quadratic Memory Bottleneck in Self-Attention
- 3 - The Softmax Operation in Attention
- 4 - Nyström Approximation
- 5 - Getting Around the Softmax Problem
- 6 - Intuition for Landmark Method
- 7 - Full Algorithm
- 8 - Theoretical Guarantees
- 9 - Avoiding the Large Attention Matrix
- 10 - Subsampling Keys vs Negative Sampling
- 11 - Experimental Results
- 12 - Conclusion & Comments