Overview
Explore quantization-aware training (QAT) in this 31-minute tutorial from the Inside TensorFlow series. Learn the fundamentals of QAT, discover the TensorFlow/Keras API used to implement it, and follow along as software engineer Pulkit Bhuwalka demonstrates the process. Dive into topics such as optimizing ML models, uniform/linear quantization, recovering accuracy lost to quantization, and implementing QAT with Keras. Gain insights into quantizing entire models or subsets of a model, customizing how individual layers are quantized, and writing your own quantization algorithms. Understand core Keras abstractions, the Keras layer lifecycle, model-layer interactions, and Keras wrappers. Additional documentation and GitHub links are provided to deepen your understanding of quantization-aware training in TensorFlow.
Syllabus
TensorFlow
The plan for the next hour
Optimizing ML Models
Model Optimization Toolkit
Uniform/Linear Quantization
Quantization is lossy
Quantization Aware Training (QAT)
How to recover lost accuracy?
Accuracy recovered using QAT
QAT and Keras
Quantize entire model
Quantize subset of model
Custom Quantize a layer
Quantize your own layer
Write your own algorithm (Quantizer)
QAT Keras APIs
Core Keras Abstractions
Keras Layer Lifecycle
Keras Model - Layer interaction
Keras Wrapper
Sample Wrapper
MOT Wrappers
Keras Model Transformer
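The "Uniform/Linear Quantization" and "Quantization is lossy" topics above come down to simple arithmetic: a float value is mapped linearly onto an integer grid, and round-tripping through that grid loses precision. A minimal sketch of the idea (the scale and zero point below are illustrative choices, not values from the talk):

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Uniform/linear quantization: map a float to the nearest int8 grid point."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Map the integer back to an approximation of the original float."""
    return (q - zero_point) * scale

# Illustrative parameters covering roughly [-1, 1] with int8.
scale, zero_point = 2.0 / 255, 0

x = 0.1
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
error = abs(x - x_hat)  # nonzero in general: quantization is lossy
```

QAT recovers the resulting accuracy loss by exposing the model to exactly this rounding error during training.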
Taught by
TensorFlow