TensorFlow Model Optimization - Quantization and Pruning

Overview

This 41-minute conference talk from TF World '19 covers TensorFlow model optimization, with a focus on quantization and pruning. Raziel Alvarez, a TensorFlow performance expert, explains why optimization matters, how it helps in resource-constrained and application-constrained environments, and where the opportunities for machine learning efficiency lie. The talk surveys quantization types and tools, including post-training quantization with the TensorFlow Lite converter and quantization during training, and examines their impact on model accuracy. It then turns to pruning: neural connection pruning, stencil pruning, tensor pruning, the TensorFlow pruning API, and pruning schedules. The session closes with the roadmap toward better target hardware support, a request for feedback on the optimization tools, and a Q&A that addresses topics such as training with integer constellations.
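
As a rough illustration of the linear mapping behind reduced-precision quantization, the sketch below maps floats to 8-bit integers with a scale and a zero point, then maps them back. It is a minimal pure-Python sketch under stated assumptions, not the toolkit's implementation: the function names (quantize_params, quantize, dequantize) and the asymmetric unsigned 8-bit range are choices made for this example only.

```python
def quantize_params(x_min, x_max, num_bits=8):
    """Return (scale, zero_point) for an asymmetric linear mapping (illustrative)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    # Fall back to 1.0 when the range is empty to avoid dividing by zero.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [scale * (q - zero_point) for q in q_values]


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quantize_params(min(values), max(values))
q = quantize(values, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Each restored value differs from the original by at most about one quantization step (scale), which is the precision given up in exchange for smaller storage and lower bandwidth.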

Syllabus

Introduction
Why is this important
Benefits of optimization
Resource constrained environment
Application constrained environment
Machine learning opportunities
Machine learning efficiency
Matrix multiply
Goals for optimization
Reducing precision
Reducing memory
Reducing bandwidth pressure
Reduce precision
Linear mapping
The problem
The implications
Quantization is complicated
It's hard to interpret
The model is not enough
Quantization types
Quantization benefits
Quantization tools
Post-training
TensorFlow Lite converter
Quantization types
Hybrid quantization
Accuracy
Integer quantization
Results
Quantization training
Quantization model
Hybrid quantization
Integer quantization
Training scrape
Summary
Neural connection pruning
Stencil pruning
Tensor pruning
TensorFlow pruning API
Pruning schedule
Benefits of pruning
Roadmap
Better target hardware
Feedback
Tools
Questions
Training with integer constellations
Question
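
The pruning topics in the syllabus (connection pruning and pruning schedules) can be sketched in a few lines. The example below is a conceptual pure-Python sketch, not the TensorFlow pruning API: it assumes magnitude-based pruning and a polynomial sparsity schedule, and the names polynomial_sparsity and prune_by_magnitude, along with all default values, are hypothetical.

```python
def polynomial_sparsity(step, initial=0.0, final=0.8,
                        begin_step=0, end_step=1000, power=3):
    """Target sparsity at a training step, ramping from initial to final (illustrative)."""
    if step <= begin_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - begin_step) / (end_step - begin_step)
    # Sparsity rises quickly at first, then levels off as it approaches the final value.
    return final + (initial - final) * (1.0 - progress) ** power


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights, smallest absolute values first."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.1, -2.0, 0.05, 3.0, -0.2]
pruned = prune_by_magnitude(weights, 0.4)
```

In this sketch, a training loop would call polynomial_sparsity at each pruning step and pass the result to prune_by_magnitude, so that connections are removed gradually rather than all at once.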

Taught by

TensorFlow
