Exploring the Latency, Throughput, and Cost Space for LLM Inference

MLOps.community via YouTube

Overview

Explore the intricacies of LLM inference stacks in this 30-minute conference talk by Timothée Lacroix, CTO of Mistral. Delve into the process of selecting the optimal model for specific tasks, choosing appropriate hardware, and implementing efficient inference code. Examine popular inference stacks and setups, uncovering the factors that contribute to inference costs. Gain insights into leveraging current open-source models effectively and learn about the limitations in existing open-source serving stacks. Discover the potential advancements that future generations of models may bring to the field of LLM inference.
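To make the cost side of the latency/throughput trade-off concrete, here is a minimal back-of-envelope sketch of serving cost per million generated tokens. The numbers and names are illustrative assumptions for this listing, not figures from the talk: `gpu_hourly_usd` is a hypothetical GPU rental price and `tokens_per_second` a hypothetical sustained generation throughput.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Rough serving cost: hourly GPU price divided by the tokens that
    GPU can generate in one hour, scaled to one million tokens.
    Both inputs are assumed, illustrative values, not data from the talk."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Example: a $3.60/hour GPU sustaining 1,000 tokens/s
# works out to $1.00 per million tokens.
print(cost_per_million_tokens(3.60, 1000))
```

In this simplified model, doubling throughput (e.g. via batching) halves cost per token, which is why the talk's throughput-versus-latency trade-off directly shapes inference cost.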

Syllabus

Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral

Taught by

MLOps.community
