
LoRAX - Serving Thousands of Fine-Tuned LLMs on a Single GPU

Linux Foundation via YouTube

Overview

Explore LoRAX (LoRA eXchange), an LLM inference system, in this conference talk. Learn how LoRAX lets users pack thousands of fine-tuned LoRA adapters onto a single GPU, significantly reducing serving costs compared to deploying a dedicated model per fine-tune. Discover the key features of this open-source, production-ready system that is free for commercial use, including pre-built Docker images and Helm charts. Delve into the core techniques that make LoRAX a cost-effective and efficient way to serve fine-tuned LLMs in production: Dynamic Adapter Loading, Heterogeneous Continuous Batching, and Adapter Exchange Scheduling. Gain insight into how these techniques optimize latency, throughput, and resource utilization while managing many concurrent adapters on a single GPU.
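
The idea behind Dynamic Adapter Loading is easiest to picture from the client side: each request names the fine-tuned adapter it wants, and the server loads, caches, and batches adapters against one shared base model on the GPU. The sketch below is a minimal illustration of that request pattern; the endpoint path, parameter names, and adapter IDs are assumptions for illustration rather than a verbatim copy of the LoRAX API, so check the project documentation for exact names.

```python
# Minimal client-side sketch of per-request adapter selection against a
# locally running LoRAX-style inference server. Endpoint path, parameter
# names, and adapter IDs are assumed for illustration, not confirmed API.
import json
import urllib.request

LORAX_URL = "http://localhost:8080/generate"  # assumed local deployment


def generate(prompt: str, adapter_id: str | None = None) -> str:
    """Send one generation request, optionally targeting a specific adapter."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 64},
    }
    if adapter_id is not None:
        # The adapter is referenced by ID; the server can fetch and cache it
        # on demand instead of requiring a dedicated deployment per fine-tune.
        payload["parameters"]["adapter_id"] = adapter_id
    req = urllib.request.Request(
        LORAX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]


if __name__ == "__main__":
    # Two requests against different hypothetical adapters; the server can
    # batch both onto the same base model in a single GPU's memory.
    print(generate("Summarize this ticket: ...", adapter_id="acme/support-summarizer"))
    print(generate("Translate to French: hello", adapter_id="acme/fr-translator"))
```

Because requests for different adapters share one base model, heterogeneous continuous batching can interleave them in the same forward passes, which is what keeps latency and throughput close to a single-model deployment.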

Syllabus

LoRAX: Serve 1000s of Fine-Tuned LLMs on a Single GPU - Travis Addair, Predibase, Inc.

Taught by

Linux Foundation
