

Flash Attention Explained - Algorithm, Applications, and Performance

Unify via YouTube

Overview

Explore the Flash Attention algorithm with guest speaker Dan Fu, Stanford University researcher and co-author of the FlashAttention paper. Delve into this attention mechanism that computes exact self-attention while significantly reducing its memory footprint and runtime in transformer-based models for natural language processing. Learn about the motivation behind Flash Attention, its downstream applications in histopathology, and its impact on memory footprint reduction. Examine empirical validations, benchmarks, and other applications such as long document classification and the Path X benchmark. Gain insights into hardware-efficient long convolutions, state space representation, and the interplay between hardware and algorithms in this comprehensive 57-minute video from Unify.
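The memory saving discussed in the talk comes from never materializing the full n-by-n attention matrix: keys and values are processed in blocks, with running softmax statistics updated as each block arrives. The sketch below is an illustrative NumPy reimplementation of that tiling idea with an online softmax, not the actual CUDA kernel from the paper; the block size and array shapes are arbitrary choices for demonstration.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full (n, n) score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Illustrative tiling in the spirit of FlashAttention: walk over
    # K/V in blocks and maintain a running row-wise max (m) and
    # softmax normalizer (l), so only (n, block) scores exist at once.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, n, block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                  # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)             # rescales previously
        P = np.exp(S - m_new[:, None])        # accumulated statistics
        l = l * alpha + P.sum(axis=-1)
        O = O * alpha[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Because the running max and normalizer are corrected at every step, the tiled result matches the naive computation exactly (up to floating-point error) while peak intermediate storage drops from O(n^2) to O(n * block).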

Syllabus

Introduction
Flash Attention
Motivation for Flash Attention
Downstream Applications
Histopathology
Outline
Attention
Memory Footprint
GPU Memory
Memory Footprint Reduction
Approximate Attention
FlashAttention
Sparsity Fraction
Empirical Validation
Benchmarks
Other Applications
Long Document Classification
Path X Benchmark
Hungry Hungry Hippos
Simple Hardware Efficient Long Convolutions
Summary
Question
State Space Representation
Loop Order
Speed vs Sequence Length
Hardware vs Algorithms
Hardware Software Codesign
Tensor Cores

Taught by

Unify

