Advances in Quantization for Efficient On-Device Neural Network Inference

EDGE AI FOUNDATION via YouTube

Overview

Learn about cutting-edge quantization techniques for efficient on-device AI inference in this 18-minute conference talk from Qualcomm AI Research Staff Engineer Mart van Baalen at tinyML EMEA. Explore the comparison between FP8 and INT8 formats, understand the challenge of oscillating weights in quantization-aware training, and discover solutions for handling outliers in transformers and large language models. Gain practical insights into mixed-precision methods and learn how to optimize deep neural networks for reduced memory usage, compute requirements, and energy consumption. Delve into detailed technical discussions covering the distribution differences between numerical formats, accuracy comparisons, and practical implementation challenges in quantization-aware training, with specific examples using the MobileNetV2 architecture. Master the techniques needed to make AI more efficient and deployable on edge devices within strict power and thermal constraints.
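The talk itself is video-only, but as a rough illustration of the kind of integer quantization it covers, here is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy. The function names and the per-tensor scaling choice are assumptions made for this example, not details taken from the talk:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats onto a [-127, 127] grid."""
    scale = np.max(np.abs(x)) / 127.0        # one scale shared by the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original tensor."""
    return q.astype(np.float32) * scale

# Storing INT8 weights takes 4x less memory than FP32, at the cost of a
# small, bounded rounding error per element.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
```

Quantization-aware training, one of the talk's main topics, keeps a rounding step like this in the forward pass so the network learns to compensate for the quantization error.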

Syllabus

Intro
Low-precision numerical formats
INT8 and FP8 have the same number of values but different distributions (illustrated in the sketch after this syllabus).
INT8 and FP8 accuracy
Challenges in using integer quantization
Introduction to Quantization-Aware Training (QAT)
Oscillating weights in QAT
MobileNetV2 - comparison to literature
Why do outliers occur?
Outliers in Transformers
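The syllabus point about INT8 and FP8 having the same number of values but different distributions can be made concrete with a small enumeration. The sketch below assumes a simplified FP8 E4M3 layout (4 exponent bits with bias 7, 3 mantissa bits, subnormals at exponent zero, NaN encodings ignored); this is one common FP8 variant and may not match the exact conventions used in the talk:

```python
import numpy as np

def e4m3_values() -> np.ndarray:
    """Enumerate the values of a simplified FP8 E4M3 format (NaN patterns ignored)."""
    vals = set()
    for sign in (1.0, -1.0):
        for e in range(16):
            for m in range(8):
                if e == 0:
                    v = sign * (m / 8.0) * 2.0 ** -6              # subnormals
                else:
                    v = sign * (1.0 + m / 8.0) * 2.0 ** (e - 7)   # normals
                vals.add(v)
    return np.array(sorted(vals))

fp8 = e4m3_values()
int8 = np.arange(-128, 128, dtype=np.float64)

# Both formats use 8 bits, so they encode roughly the same number of values...
print(len(fp8), len(int8))
# ...but INT8 spacing is uniform, while FP8 packs values densely near zero
# and spaces them far apart near its maximum.
print(np.unique(np.diff(int8)))        # always 1
print(np.diff(fp8[fp8 >= 0])[:3])      # tiny steps near zero
print(np.diff(fp8)[-3:])               # large steps near the format's maximum
```

The talk's accuracy comparison between the two formats builds on exactly this difference in how the available values are distributed.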

Taught by

EDGE AI FOUNDATION
