Dive into neural network quantization in this lecture from MIT's TinyML and Efficient Deep Learning Computing course (6.S965). Explore the numeric data types used in modern computing systems, then learn two quantization techniques: K-means-based quantization and linear quantization. The broader course teaches how to optimize deep learning models for resource-constrained devices, enabling AI applications on mobile and IoT platforms. It covers efficient inference techniques that include model compression, pruning, quantization, and neural architecture search. Students gain hands-on experience implementing deep learning applications on microcontrollers, mobile phones, and quantum machines through an open-ended design project focused on mobile AI.
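The two techniques named above can be illustrated with a minimal, dependency-free Python sketch. It is not code from the lecture. The function names (linear_quantize, linear_dequantize, kmeans_quantize), the per-tensor asymmetric scheme with signed n-bit codes, and the linearly spaced K-means initialization are assumptions chosen for illustration. Linear quantization maps each real value r to an integer q through a scale S and a zero point Z so that r ≈ S(q - Z). K-means-based quantization instead clusters the weights into a small codebook of shared values and stores only a short index per weight.

```python
def linear_quantize(values, n_bits=8):
    """Per-tensor asymmetric linear quantization, so that r ≈ S * (q - Z)."""
    q_min, q_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    r_min, r_max = min(values), max(values)
    # Scale S maps the real range onto the integer range (guard a constant tensor).
    scale = (r_max - r_min) / (q_max - q_min) or 1.0
    # Zero point Z is the integer code that represents real 0.0.
    zero_point = round(q_min - r_min / scale)
    # Round to the nearest code and clamp into the signed n-bit range.
    q = [max(q_min, min(q_max, round(r / scale) + zero_point)) for r in values]
    return q, scale, zero_point


def linear_dequantize(q, scale, zero_point):
    """Recover approximate real values: r_hat = S * (q - Z)."""
    return [scale * (qi - zero_point) for qi in q]


def kmeans_quantize(weights, n_bits=2, iters=20):
    """Cluster weights into 2**n_bits shared values (a codebook) with 1-D K-means."""
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Assumption: centroids start linearly spaced over [min, max].
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]

    def nearest(w):
        # Index of the centroid closest to weight w.
        return min(range(k), key=lambda c: abs(w - centroids[c]))

    for _ in range(iters):
        labels = [nearest(w) for w in weights]
        for c in range(k):
            members = [w for w, label in zip(weights, labels) if label == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = sum(members) / len(members)
    # Final assignment against the last centroid update.
    labels = [nearest(w) for w in weights]
    return labels, centroids


# Arbitrary example weights (hypothetical, for illustration only).
weights = [2.09, -0.98, 1.48, 0.09, 0.05, -0.14, -1.08, 2.12,
           -0.91, 1.92, 0.0, -1.03, 1.87, 0.0, 1.53, 1.49]

q, scale, zero_point = linear_quantize(weights, n_bits=8)
print("int8 codes:", q, "scale:", scale, "zero point:", zero_point)

labels, codebook = kmeans_quantize(weights, n_bits=2)
print("2-bit indices:", labels)
print("reconstructed:", [round(codebook[i], 3) for i in labels])
```

With n_bits=2, each weight is stored as a 2-bit index into a 4-entry floating-point codebook, while the linear scheme stores 8-bit integers plus one scale and one zero point per tensor. The zero point is chosen so that real 0.0 maps to an exact integer code, which is why zeros dequantize back to exactly 0.0.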
Syllabus
Lecture 05 - Quantization (Part I) | MIT 6.S965
Taught by
MIT HAN Lab