
ROLLER - Fast and Efficient Tensor Compilation for Deep Learning

USENIX via YouTube

Overview

Explore a groundbreaking approach to tensor compilation for deep learning in this 15-minute conference talk from OSDI '22. Delve into ROLLER, a novel system that dramatically reduces kernel generation time from hours to seconds while maintaining competitive performance. Learn about the innovative rTile abstraction and recursive construction algorithm that enable efficient execution across various accelerators, including GPUs and IPUs. Discover how ROLLER's white-box solution addresses the challenges of excessive compilation times and large search spaces in existing DNN compilers. Examine the system's performance on different hardware, its ability to handle small and irregular shapes, and its impact on improving pipeline throughput. Gain insights into the future of deep learning development cycles and custom kernel creation for new hardware vendors.
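The overview mentions ROLLER's rTile abstraction and its construction algorithm. As a loose illustration of the "white-box" idea — scoring candidate tile shapes directly against abstracted hardware parameters instead of profiling a huge search space — here is a minimal, hypothetical Python sketch. The device numbers, scoring heuristic, and function names are assumptions for illustration only, not ROLLER's actual implementation.

```python
# Hypothetical sketch of white-box, spec-driven tile selection for a matmul.
# Not ROLLER's rTile algorithm; just the general flavor of the approach.
from dataclasses import dataclass

@dataclass
class Device:
    # Simplified accelerator parameters (assumed values, roughly V100-like).
    memory_transaction_elems: int = 32   # elements per coalesced memory access
    compute_tile_elems: int = 64         # elements that keep the ALUs busy

def aligned(tile: int, unit: int) -> bool:
    """A tile dimension is 'aligned' if it is a multiple of a hardware unit."""
    return tile % unit == 0

def tile_score(m: int, n: int, k: int, dev: Device) -> float:
    """Score a candidate (m, n, k) tile from device specs alone (no profiling):
    reward alignment with the memory and compute units, penalize misalignment."""
    mem_ok = aligned(n, dev.memory_transaction_elems)
    compute_ok = aligned(m * n, dev.compute_tile_elems)
    # Larger tiles amortize data movement, but only if they stay aligned.
    return (m * n * k) * (1.0 if mem_ok else 0.25) * (1.0 if compute_ok else 0.5)

def pick_tile(M: int, N: int, K: int, dev: Device) -> tuple[int, int, int]:
    """Enumerate a small set of aligned candidates and keep the best score,
    rather than searching and benchmarking thousands of configurations."""
    best = (1, 1, 1)
    for m in (1, 2, 4, 8, 16, 32, 64):
        for n in (1, 2, 4, 8, 16, 32, 64, 128):
            for k in (1, 2, 4, 8, 16, 32):
                if m <= M and n <= N and k <= K:
                    if tile_score(m, n, k, dev) > tile_score(*best, dev):
                        best = (m, n, k)
    return best

if __name__ == "__main__":
    # For the 8k^3 matmul from the motivating example, this picks (64, 128, 32).
    print(pick_tile(8192, 8192, 8192, Device()))
```

Because the score is computed from the device description rather than measured, candidate evaluation is nearly instantaneous, which is the intuition behind reducing kernel generation from hours to seconds.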

Syllabus

Intro
Excessive Compilation Time
Black-Box Compiler
Motivating Example: 8k^3 matmul
Roller: A White-Box Solution
Improving Pipeline Throughput
Abstracted GPU (V100 Example)
Small & Irregular Shapes
Evaluations - V100 Performance
Evaluation - Compilation Time
Summary

Taught by

USENIX

