Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

Yannic Kilcher via YouTube

Class Central Classrooms beta

YouTube videos curated by Class Central.

Classroom Contents

  1. Intro & Overview
  2. The Quadratic Memory Bottleneck in Self-Attention
  3. The Softmax Operation in Attention
  4. Nyström-Approximation
  5. Getting Around the Softmax Problem
  6. Intuition for Landmark Method
  7. Full Algorithm
  8. Theoretical Guarantees
  9. Avoiding the Large Attention Matrix
  10. Subsampling Keys vs Negative Sampling
  11. Experimental Results
  12. Conclusion & Comments
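The chapters above cover the core idea of the Nyströmformer: approximate the $n \times n$ softmax attention matrix from a small set of $m$ landmark queries and keys, so the full matrix is never materialized. As a rough illustration only (a minimal NumPy sketch, not the paper's implementation — the paper selects landmarks via segment means, as done here, but computes the Moore–Penrose pseudo-inverse with an iterative scheme rather than `np.linalg.pinv`):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nystrom_attention(Q, K, V, m):
    """Nyström approximation of softmax attention with m landmarks.

    Q, K, V: (n, d) arrays; n must be divisible by m in this sketch.
    """
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    # Landmarks: segment means of queries and keys (m landmarks each).
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)
    K_l = K.reshape(m, n // m, d).mean(axis=1)
    F = softmax(Q @ K_l.T * scale)                     # (n, m)
    A = np.linalg.pinv(softmax(Q_l @ K_l.T * scale))   # (m, m) pseudo-inverse
    B = softmax(Q_l @ K.T * scale)                     # (m, n)
    # F @ A @ B ≈ softmax(Q K^T / sqrt(d)); associate right so the
    # n x n attention matrix is never formed.
    return F @ (A @ (B @ V))
```

With $m = n$ (one landmark per row), the segment means recover Q and K exactly and the product collapses to standard softmax attention, which is a convenient sanity check.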
