Efficiently Serving Large Language Models - Optimizing Performance and Resource Management

Centre for Networked Intelligence, IISc via YouTube

Overview

Explore the challenges and solutions in efficiently serving Large Language Models (LLMs) in this technical talk by Dr. Ashish Panwar, Senior Researcher at Microsoft Research India. Gain insights into why LLM deployment requires multiple GPUs per replica despite low resource utilization, and discover recent Microsoft research addressing these efficiency challenges. Learn about solutions such as Sarathi-Serve [OSDI'24] and vAttention [ASPLOS'25], which tackle fundamental scheduling and memory management issues in LLM serving systems. Understand the current landscape of LLM deployment across applications such as chatbots, search, and code assistants, while diving into the technical complexities of making these systems more resource-efficient and cost-effective.

Syllabus

Time: 5:00 PM - PM IST

Taught by

Centre for Networked Intelligence, IISc
