Optimizing LLM Efficiency One Trace at a Time on Kubernetes

Overview

Learn how to optimize Large Language Model (LLM) deployments on Kubernetes in this 25-minute conference talk presented by CNCF. Discover how to use OpenTelemetry's profiling capabilities to identify resource-intensive code segments, detect memory leaks, and prevent out-of-memory errors in LLM applications. Apply dynamic runtime inspection to improve model performance, reduce latency, and meet service level agreements. Gain practical insights into running efficient Kubernetes deployments while optimizing resource utilization and controlling costs. Explore code-level analysis methods for pinpointing performance bottlenecks and resource drains in LLM implementations.

Syllabus

Optimizing LLM Efficiency One Trace at a Time on Kubernetes - Aditya Soni, Forrester & Seema Saharan

Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Optimizing LLM Performance in Kubernetes with OpenTelemetry

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kubernetes

Unlocking LLM Performance with eBPF - Optimizing Training and Inference Pipelines

Load-Aware GPU Fractioning for LLM Inference on Kubernetes

Strategies for Efficient LLM Deployments in Any Cluster

Leverage Topology Modeling and Topology-Aware Scheduling to Accelerate LLM Training