8-bit Methods for Efficient Deep Learning
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Syllabus
Intro
How does quantization work?
Quantization as a mapping
Quantization Example: A non-standard 2-bit data type
Floating point data types (FP8)
Dynamic exponent quantization
Motivation: Optimizers take up a lot of memory!
What do outliers in quantization look like?
Block-wise quantization
Putting it together: 8-bit optimizers
Using OPT-175B on a single machine via 8-bit weights
The problem with quantizing outliers with large values
Emergent features: sudden vs. smooth emergence
Mixed precision decomposition
Bit-level scaling laws experimental setup overview
What does help to improve scaling? Data types
Nested Quantization
Instruction Tuning with 4-bit + Adapters
4-bit Normal Float (NF4)
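
The core mechanism behind the "Quantization as a mapping" and "Block-wise quantization" segments is absmax quantization applied independently to small blocks, so that an outlier only hurts precision within its own block. A minimal NumPy sketch of the idea (block size and function names are illustrative, not the bitsandbytes API):

import numpy as np

def blockwise_absmax_quantize(x, block_size=64):
    # Quantize a flat float array to int8 in independent blocks.
    # Each block gets its own scale, so a single outlier only degrades
    # precision inside its block rather than across the whole tensor.
    blocks = x.reshape(-1, block_size)                  # assumes len(x) is a multiple of block_size
    absmax = np.abs(blocks).max(axis=1, keepdims=True)  # per-block scale
    codes = np.round(blocks / absmax * 127).astype(np.int8)
    return codes, absmax

def blockwise_dequantize(codes, absmax):
    # Map int8 codes back to approximate float values.
    return (codes.astype(np.float32) / 127) * absmax

x = np.random.randn(1024).astype(np.float32)
codes, scales = blockwise_absmax_quantize(x)
x_hat = blockwise_dequantize(codes, scales).reshape(-1)
print("max rounding error:", np.abs(x - x_hat).max())

The same block-wise scheme is what makes 8-bit optimizers practical: Adam's first and second moment states are stored in this compressed form and dequantized block by block during each update.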
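
The "Emergent features" and "Mixed precision decomposition" segments concern LLM.int8(): a small set of hidden-state dimensions carries large-magnitude outlier features, so those columns are multiplied in 16-bit while the rest of the matrix multiply runs in int8. A hedged sketch of the decomposition (the threshold and the simulated int8 path are illustrative):

import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    # Split X @ W into a high-precision path for outlier columns of X
    # and a simulated int8 path with row/column absmax scaling for the rest.
    outlier_cols = np.abs(X).max(axis=0) > threshold
    regular_cols = ~outlier_cols

    out_fp = X[:, outlier_cols] @ W[outlier_cols, :]     # outlier dims kept in float

    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) + 1e-8    # row-wise scales for X
    sw = np.abs(Wr).max(axis=0, keepdims=True) + 1e-8    # column-wise scales for W
    Xq = np.round(Xr / sx * 127).astype(np.int8)
    Wq = np.round(Wr / sw * 127).astype(np.int8)
    out_int8 = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw / (127 * 127)

    return out_fp + out_int8

X = np.random.randn(4, 16); X[:, 3] *= 20                # inject one outlier column
W = np.random.randn(16, 8)
print("error vs. full precision:", np.abs(mixed_precision_matmul(X, W) - X @ W).max())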
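
The "4-bit Normal Float (NF4)" segment introduces a 4-bit data type whose 16 levels are spaced so that each is roughly equally likely under a zero-mean normal distribution, matching how pretrained weights tend to be distributed. A toy version that quantizes against a quantile-based codebook (this codebook is built for illustration and is not the exact NF4 table):

import numpy as np
from scipy.stats import norm

# 16 levels at evenly spaced quantiles of a standard normal, rescaled to [-1, 1].
quantiles = norm.ppf(np.linspace(0.02, 0.98, 16))
codebook = quantiles / np.abs(quantiles).max()

def nf4_like_quantize(w):
    # Normalize by the absmax, then snap each value to its nearest codebook level.
    absmax = np.abs(w).max()
    idx = np.abs(w[:, None] / absmax - codebook[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), absmax                  # 4-bit codes, stored here in uint8

def nf4_like_dequantize(idx, absmax):
    return codebook[idx] * absmax

w = np.random.randn(256).astype(np.float32)
idx, absmax = nf4_like_quantize(w)
print("mean abs error:", np.abs(w - nf4_like_dequantize(idx, absmax)).mean())

In the QLoRA-style setup from the "Instruction Tuning with 4-bit + Adapters" segment, weights frozen in this 4-bit format are paired with small trainable adapters, and "Nested Quantization" additionally quantizes the per-block absmax constants to save a further fraction of a bit per parameter.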
Taught by
Center for Language & Speech Processing (CLSP), JHU