Overview
Explore an approach to automatically generating high-performance tensor programs for deep learning in this 20-minute conference talk from OSDI '20. Dive into Ansor, a framework for optimizing tensor programs across diverse hardware platforms. Learn how Ansor's hierarchical search space representation, evolutionary search, and learned cost model outperform existing search strategies, yielding significant performance improvements for deep neural networks on Intel CPUs, ARM CPUs, and NVIDIA GPUs. Discover the limitations of existing deep learning systems and how Ansor's task scheduler allocates tuning effort across multiple subgraphs of a network simultaneously. Follow the presentation's structure, covering the deep learning system stack, compiler approaches, program sampling techniques, and ablation studies, to gain a comprehensive understanding of this work on efficient deep learning execution.
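The search strategy described above can be illustrated with a toy sketch. This is not Ansor's actual API; the candidate representation, the mutation rule, and `learned_cost_model` (here a fixed quadratic standing in for a trained model) are all hypothetical, but the loop mirrors the idea of evolving a population of candidate programs and ranking them with a learned cost model instead of measuring every one on hardware.

```python
import random

def learned_cost_model(candidate):
    # Stand-in for a trained cost model: a fixed quadratic with a known
    # optimum, so the search has a surface to descend. In Ansor, this
    # would be a model trained on measured program throughputs.
    return sum((x - 3) ** 2 for x in candidate)

def mutate(candidate, rng):
    # Randomly perturb one "schedule parameter" of a candidate.
    i = rng.randrange(len(candidate))
    out = list(candidate)
    out[i] += rng.choice([-1, 1])
    return out

def evolutionary_search(pop_size=32, dims=4, generations=50, seed=0):
    rng = random.Random(seed)
    # Start from randomly sampled candidates (analogous to sampling
    # complete programs from the search space).
    population = [[rng.randrange(0, 8) for _ in range(dims)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates by predicted cost; keep the best half.
        population.sort(key=learned_cost_model)
        survivors = population[: pop_size // 2]
        # Refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=learned_cost_model)

best = evolutionary_search()
```

Because survivors carry over unchanged (elitism), the best predicted cost never increases from one generation to the next; in the real system, the top-ranked candidates are then measured on the target hardware and the measurements feed back into training the cost model.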
Syllabus
Intro
Deep Learning System Stack
Introducing Compiler
TVM's Approach
Halide's Auto-scheduler
Challenges and our approach
Program Sampling
Sketch Generation Examples 1/2
Random Annotation Examples
Evolutionary Search
Learned Cost Model
Task Scheduler
Single Operator
Subgraph
Network
Ablation Study
Summary
Taught by
USENIX