Overview
Learn how to effectively deploy and manage Large Language Models (LLMs) using Kubernetes in this 41-minute conference talk presented at All Things Open 2024 by Google Cloud's Abdel Sghiouar. Gain a beginner-friendly introduction to the core Kubernetes concepts essential for LLM deployment, including pods, containers, deployments, and services. Explore the unique computational resource requirements of LLMs and discover how Kubernetes can help manage them efficiently. Master practical techniques for setting up training pipelines, handling data distribution, and optimizing models within a Kubernetes environment. Examine strategies for deploying LLMs as services, implementing load balancing, and scaling to handle real-world traffic demands. Perfect for developers and engineers looking to streamline their LLM workflows with Kubernetes infrastructure.
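To give a feel for the serving pattern the talk describes (deploying an LLM as a Kubernetes service behind a load balancer), here is a minimal sketch using the official Python kubernetes client to create a GPU-backed Deployment and expose it through a LoadBalancer Service. It is not taken from the talk itself; the container image, port, replica count, and resource figures are placeholder assumptions.

```python
from kubernetes import client, config

# Load cluster credentials from the local kubeconfig
# (use config.load_incluster_config() when running inside a cluster).
config.load_kube_config()

apps = client.AppsV1Api()
core = client.CoreV1Api()

# Deployment: two replicas of a hypothetical LLM inference server,
# each requesting one GPU plus CPU and memory headroom.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "llm-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-server"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="llm",
                        image="ghcr.io/example/llm-server:latest",  # placeholder image
                        ports=[client.V1ContainerPort(container_port=8000)],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "16Gi"},
                            limits={"nvidia.com/gpu": "1"},
                        ),
                    )
                ]
            ),
        ),
    ),
)

# Service: load-balances traffic across the pods selected by the app label.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1ServiceSpec(
        selector={"app": "llm-server"},
        ports=[client.V1ServicePort(port=80, target_port=8000)],
        type="LoadBalancer",
    ),
)

apps.create_namespaced_deployment(namespace="default", body=deployment)
core.create_namespaced_service(namespace="default", body=service)
```

Scaling beyond a fixed replica count, as discussed in the talk, would typically be handled by a HorizontalPodAutoscaler layered on top of a Deployment like this one.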
Syllabus
Training and Serving LLM’s on Kubernetes: A beginner’s guide - Abdel Sghiouar
Taught by
All Things Open