Overview
Explore a conference talk on INT8 training for tiny on-device learning, presented at USENIX ATC '21. Learn how the Octo system applies 8-bit fixed-point quantization to both the forward and backward passes of deep models. Understand the challenges of on-device learning and how the proposed Loss-aware Compensation (LAC) and Parameterized Range Clipping (PRC) techniques reduce computation while preserving training quality. See how Octo improves training efficiency, speeds up processing, and reduces memory use compared with full-precision training and state-of-the-art quantization methods. Gain insight into the system's performance on commercial AI chips and its potential impact on edge intelligence.
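The core idea above is representing tensors with 8-bit fixed-point integers. As a rough illustration only, the Python sketch below shows generic per-tensor dynamic fixed-point quantization. It is not Octo's implementation and does not model LAC or PRC. It assumes each tensor gets a power-of-two scale 2^-f, values are rounded and saturated to the int8 range, and a dot product accumulates in integers before rescaling. All function names are hypothetical.

```python
import math
from typing import List

INT8_MIN, INT8_MAX = -128, 127


def choose_frac_bits(values: List[float]) -> int:
    """Pick the largest fractional-bit count f such that the largest
    magnitude still fits in int8 after scaling by 2**f (assumed per-tensor)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return 0
    # f may be negative for large-magnitude tensors (coarser steps).
    return math.floor(math.log2(INT8_MAX / max_abs))


def quantize(values: List[float], frac_bits: int) -> List[int]:
    """Scale by 2**f, round to nearest (Python's round breaks ties to even),
    then saturate to [-128, 127]."""
    scale = 2.0 ** frac_bits
    return [max(INT8_MIN, min(INT8_MAX, round(v * scale))) for v in values]


def dequantize(q: List[int], frac_bits: int) -> List[float]:
    """Map int8 codes back to real values."""
    return [x / 2.0 ** frac_bits for x in q]


def int8_dot(a: List[float], b: List[float]) -> float:
    """Quantize both operands, accumulate the products as integers
    (an int32 accumulator on typical hardware), then rescale once."""
    fa, fb = choose_frac_bits(a), choose_frac_bits(b)
    qa, qb = quantize(a, fa), quantize(b, fb)
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc / 2.0 ** (fa + fb)
```

For example, `int8_dot([0.5, -0.25, 1.0], [1.0, 2.0, -0.5])` returns -0.5, matching the full-precision result because every input is exactly representable. In general the result carries rounding and saturation error, and applying the same quantizer to gradients in the backward pass compounds it. The talk's LAC and PRC techniques target that loss of training quality.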
Syllabus
Intro
Rise of On-device Learning
Common Compression Methods
The Workflow of DNN Training
Bridge the Gap: Data Quantization
Why Do We Need Quantization?
Potential Gains
Co-design of Network and Training Engine
Our System: Octo
Loss-aware Compensation
Backward Quantization
Evaluation Setup
Convergence Results
Ablation Study: Impact of LAC and PRC
Image Processing Throughput
Deep Insight of Feature Distribution
Visualization of Intermediate Feature Distribution
System Overhead
Conclusion
Taught by
USENIX