
Faster and Cheaper LLMs with Weight and Key-value Cache Quantization

UofU Data Science via YouTube

Overview

Learn about techniques for optimizing Large Language Models (LLMs) through weight and key-value cache quantization in this guest lecture presented by Tianyi Zhang. The lecture covers methods for making LLMs faster and cheaper to serve while maintaining accuracy, explaining how quantization shrinks both the memory footprint of model weights and that of the key-value cache, which grows with sequence length during inference. It also addresses the practical side of implementing these optimizations, their impact on model efficiency, and how to balance speed and resource usage in LLM deployments.
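As a rough illustration of the core idea (a minimal sketch, not the specific methods presented in the lecture), the snippet below applies symmetric per-tensor int8 quantization to a weight matrix in PyTorch: values are rescaled so the largest magnitude maps to 127, rounded to 8-bit integers, and dequantized back to floating point when used. The function names are illustrative only.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = w.abs().max() / 127.0                       # largest magnitude maps to 127
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the int8 codes and scale."""
    return q.to(torch.float32) * scale

# An int8 copy of a weight matrix takes roughly 4x less memory than fp32.
w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", (w - w_hat).abs().max().item())
```

Production systems typically refine this scheme with per-channel or per-group scales, and the same principle extends to the key-value cache by quantizing cached keys and values as they are appended during generation.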

Syllabus

Guest Lecture by Tianyi Zhang: Faster & Cheaper LLMs with Weight and Key-value Cache Quantization

Taught by

UofU Data Science

