Classroom Contents
Flash Attention 2.0 with Tri Dao - Discord Server Talks
- 1 Main talk starts - intro & motivation
- 2 Behind the scenes: how Tri got started with Flash Attention
- 3 Motivation: modelling long sequences
- 4 Brief recap of attention
- 5 Memory bottleneck, IO awareness
- 6 Flash Attention 2.0 improvements
- 7 Behind the scenes of Flash Attention 2.0 refactor of CUTLASS 3
- 8 Future directions
- 9 Q&A