Overview
Explore the FlashAttention algorithm with guest speaker Dan Fu, Stanford University researcher and co-author of the original FlashAttention paper. Delve into this IO-aware attention algorithm, which computes exact self-attention while substantially reducing the memory usage and runtime of transformer-based models for natural language processing. Learn about the motivation behind FlashAttention, its downstream applications in histopathology, and its impact on memory footprint reduction. Examine empirical validations, benchmarks, and other applications such as long document classification and the Path-X benchmark. Gain insights into hardware-efficient long convolutions, state space representations, and the interplay between hardware and algorithms in this comprehensive 57-minute video from Unify.
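As a rough illustration of the memory-footprint point mentioned above, the sketch below computes exact softmax attention over key/value blocks with an online softmax, so the full N×N score matrix is never materialized. This is only a minimal NumPy sketch of the underlying idea, not the paper's fused CUDA kernels; the function and parameter names (blocked_attention, block_size) are illustrative assumptions.

```python
import numpy as np

def blocked_attention(Q, K, V, block_size=128):
    """Exact softmax attention, processing K/V in blocks of `block_size`.

    Only an (N, block_size) slice of the score matrix exists at any time,
    instead of the full (N, N) matrix of the naive implementation.
    """
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)            # running weighted sum of V rows
    row_max = np.full(N, -np.inf)     # running max of scores per query
    row_sum = np.zeros(N)             # running softmax denominator

    for start in range(0, N, block_size):
        Kb = K[start:start + block_size]          # (B, d) block of keys
        Vb = V[start:start + block_size]          # (B, d) block of values
        scores = (Q @ Kb.T) * scale               # (N, B) scores for this block only

        new_max = np.maximum(row_max, scores.max(axis=1))
        correction = np.exp(row_max - new_max)    # rescale old stats to new max
        p = np.exp(scores - new_max[:, None])     # (N, B) unnormalized weights

        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ Vb
        row_max = new_max

    return out / row_sum[:, None]

# Quick check against the naive implementation that materializes all scores.
rng = np.random.default_rng(0)
N, d = 512, 64
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
scores = (Q @ K.T) / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
reference = (weights / weights.sum(axis=1, keepdims=True)) @ V
assert np.allclose(blocked_attention(Q, K, V), reference, atol=1e-6)
```

The same rescaling trick, applied while also tiling over queries and keeping the working blocks in fast on-chip memory, is what lets the method discussed in the video avoid reading and writing the full attention matrix to GPU high-bandwidth memory.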
Syllabus
Introduction
Flash Attention
Motivation for Flash Attention
Downstream Applications
Histopathology
Outline
Attention
Memory Footprint
GPU Memory
Memory Footprint Reduction
Approximate Attention
FlashAttention
Sparsity Fraction
Empirical Validation
Benchmarks
Other Applications
Long Document Classification
Path-X Benchmark
Hungry Hungry Hippos
Simple Hardware-Efficient Long Convolutions
Summary
Question
State Space Representation
Loop Order
Speed vs Sequence Length
Hardware vs Algorithms
Hardware-Software Co-Design
Tensor Cores
Taught by
Unify