Flash Attention Explained - Algorithm, Applications, and Performance

Unify via YouTube

Memory Footprint Reduction

10 of 28

Classroom Contents

  1. Introduction
  2. Flash Attention
  3. Motivation for Flash Attention
  4. Downstream Applications
  5. Histopathology
  6. Outline
  7. Attention
  8. Memory Footprint
  9. GPU Memory
  10. Memory Footprint Reduction
  11. Approximate Attention
  12. FlashAttention
  13. Sparsity Fraction
  14. Empirical Validation
  15. Benchmarks
  16. Other Applications
  17. Long Document Classification
  18. Path X Benchmark
  19. Hungry Hungry Hippos
  20. Simple Hardware Efficient Long Convolutions
  21. Summary
  22. Question
  23. State Space Representation
  24. Loop Order
  25. Speed vs Sequence Length
  26. Hardware vs Algorithms
  27. Hardware Software Codesign
  28. Tensor Cores
