Overview
Explore an innovative approach to deep learning model sparsity in this 16-minute conference talk from OSDI '22. Learn about Tensor-with-Sparsity-Attribute (TeSA), a new abstraction that augments the default Tensor abstraction used by dense models. Discover how TeSA lets sparsity attributes and patterns be specified, propagated, and exploited across an entire deep learning model, yielding highly efficient, specialized operators. Understand how the SparTA framework accommodates various sparsity patterns and optimization techniques, delivering significant reductions in inference latency compared to state-of-the-art solutions. Gain insights into the evolution of sparsity patterns, the obstacles to sparsity optimization, and the importance of end-to-end approaches to model sparsity. Examine the framework's architecture, execution transformation, and code specialization techniques, as well as its performance across various sparsity patterns and models.
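To make the TeSA idea concrete, here is a minimal, hypothetical sketch in Python/NumPy. The `TeSA` class, its fields, and the matmul propagation rule below are illustrative assumptions for exposition, not SparTA's actual API: a tensor is paired with a per-element sparsity attribute, the attribute is propagated through an operator, and the operator is then specialized (here, simply masked) so pruned elements do no useful work.

```python
# Hypothetical sketch of the TeSA idea (illustrative only, not SparTA's API).
import numpy as np

class TeSA:
    """A dense tensor augmented with a boolean sparsity attribute.

    attr[i, j] == False marks element (i, j) as pruned (always zero).
    """
    def __init__(self, values: np.ndarray, attr: np.ndarray):
        assert values.shape == attr.shape
        self.values = np.where(attr, values, 0.0)  # zero out pruned entries
        self.attr = attr

def propagate_matmul_attr(a_attr: np.ndarray, b_attr: np.ndarray) -> np.ndarray:
    # Output element (i, j) can be nonzero only if some k has both
    # a_attr[i, k] and b_attr[k, j] set; a boolean matmul computes this.
    return (a_attr.astype(int) @ b_attr.astype(int)) > 0

def specialized_matmul(a: TeSA, b: TeSA) -> TeSA:
    # First propagate the sparsity attribute, then run an operator
    # specialized to honor it. Here we mask a dense matmul; a real
    # system would generate code that skips pruned blocks entirely.
    out_attr = propagate_matmul_attr(a.attr, b.attr)
    out = np.where(out_attr, a.values @ b.values, 0.0)
    return TeSA(out, out_attr)

# Example: an activation with a fully dense attribute times a
# weight matrix whose attribute prunes roughly half its elements.
rng = np.random.default_rng(0)
x = TeSA(rng.standard_normal((4, 4)), np.ones((4, 4), dtype=bool))
w = TeSA(rng.standard_normal((4, 4)), rng.random((4, 4)) > 0.5)
y = specialized_matmul(x, w)
print(y.attr)  # the propagated sparsity attribute of the output
```

In the talk's framing, such propagated attributes flow end to end through the model and drive code specialization, so pruned computation is eliminated rather than computed and masked as in this toy sketch.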
Syllabus
Intro
Computation Capacity vs DNN Model Size
Sparsity Commonly Exists
Evolution of Sparsity Patterns
Obstacles to Sparsity Optimization
The Myth of Proxy Metrics
Across-Stack Innovations in Silos
SparTA: An End-to-End Approach to Model Sparsity
Core Abstraction: TeSA
System Architecture
Execution Transformation
Code Specialization
What SparTA Achieves
Evaluation on Various Patterns & Models
End-to-end Opportunity
Mixed Sparsity Evaluation
Real Latency for Algorithms
Conclusion
Taught by
USENIX