Rethinking Attention with Performers

Yannic Kilcher via YouTube

- Orthogonal Features are Even Better

9 of 14
YouTube videos curated by Class Central.

Classroom Contents

  1. Intro & Outline
  2. Quadratic Bottleneck in Attention Mechanisms
  3. Decomposing the Attention Matrix
  4. Approximating the Softmax Kernel
  5. Different Choices, Different Kernels
  6. Why the Naive Approach Does Not Work!
  7. Better Approximation via Positive Features
  8. Positive Features are Infinitely Better
  9. Orthogonal Features are Even Better
  10. Experiments
  11. Broader Impact Statement
  12. Causal Attention via Prefix Sums
  13. Code
  14. Final Remarks & Conclusion