Overview
Learn advanced techniques for optimizing Large Language Models (LLMs) through weight and key-value (KV) cache quantization in this guest lecture by Tianyi Zhang. Explore methods for making LLMs faster and cheaper to run while maintaining performance, with detailed insights into quantization techniques that reduce memory requirements and computational overhead. Dive into practical implementation approaches, understand their impact on model efficiency, and discover how to balance speed against resource usage in LLM deployments.
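As a taste of the memory savings such techniques target, here is a minimal sketch (not from the lecture) of symmetric per-tensor int8 weight quantization; the function names and the NumPy-only setup are illustrative assumptions, not the lecturer's code.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float32 weights to int8 using a single scale factor."""
    scale = np.abs(w).max() / 127.0                      # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for one layer of an LLM.
W = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(W)

# int8 storage is 4x smaller than float32, at the cost of a small
# per-weight reconstruction error.
print(f"memory: {W.nbytes / 2**20:.0f} MiB -> {q.nbytes / 2**20:.0f} MiB")
print(f"max abs error: {np.abs(dequantize(q, scale) - W).max():.4f}")
```

The same idea applies to the KV cache: storing cached keys and values at lower precision shrinks the per-token memory footprint during inference, which is what makes longer contexts and larger batches cheaper to serve.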
Syllabus
Guest Lecture by Tianyi Zhang: Faster & Cheaper LLMs with Weight and Key-value Cache Quantization
Taught by
UofU Data Science