StreamingLLM: Deploying Language Models for Streaming Applications with Long Text Sequences

MIT HAN Lab via YouTube

Overview

Explore StreamingLLM, a technique for deploying language models in streaming applications with long text sequences and limited memory. Discover the "attention sink" phenomenon, in which the initial tokens of a sequence attract a large share of attention, and learn how it can be leveraged to process text of effectively unbounded length without fine-tuning. Understand why existing window-based KV cache methods fall short: their eviction policies are suboptimal because they discard those initial tokens once the window slides past them. Gain insight into an approach that keeps the attention sinks in the KV cache while applying a sliding window to the remaining tokens. The implementation code is available on GitHub for further investigation.
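
The cache policy described above can be illustrated with a small Python sketch. This is a toy model and not the code from the lecture or the GitHub repository: the class name `SinkWindowCache` and the parameters `n_sink` and `window` are hypothetical, and the cache stores plain token ids where a real implementation would presumably hold per-layer key and value tensors.

```python
from collections import deque
from typing import Deque, List


class SinkWindowCache:
    """Toy cache: keep the first `n_sink` entries permanently (the
    attention sinks) plus the `window` most recent entries.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, n_sink: int = 4, window: int = 8) -> None:
        if n_sink < 0 or window < 1:
            raise ValueError("n_sink must be >= 0 and window must be >= 1")
        self.n_sink = n_sink
        self.sinks: List[int] = []
        # A bounded deque evicts its oldest element automatically on append.
        self.recent: Deque[int] = deque(maxlen=window)

    def append(self, entry: int) -> None:
        # The first n_sink entries are pinned and never evicted.
        if len(self.sinks) < self.n_sink:
            self.sinks.append(entry)
        else:
            self.recent.append(entry)

    def entries(self) -> List[int]:
        # Cache contents in order: pinned sinks, then the sliding window.
        return self.sinks + list(self.recent)


cache = SinkWindowCache(n_sink=2, window=3)
for token_id in range(10):
    cache.append(token_id)
print(cache.entries())  # [0, 1, 7, 8, 9]
```

A purely window-based cache with the same budget would hold only the five most recent tokens and would have dropped tokens 0 and 1. In this sketch the cache size stays bounded at `n_sink + window` entries no matter how long the stream runs.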

Syllabus

StreamingLLM Lecture

Taught by

MIT HAN Lab
