Optimizing LLM Efficiency One Trace at a Time on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how to optimize Large Language Model (LLM) deployments on Kubernetes through a 25-minute conference talk from CNCF experts. Discover how OpenTelemetry's profiling capabilities help identify resource-intensive code segments, detect memory leaks, and prevent out-of-memory errors in LLM applications. Master dynamic runtime inspection to improve model performance, reduce latency, and meet service-level agreements. Gain practical insights into running efficient Kubernetes deployments while optimizing resource utilization and controlling costs, and explore deep code-level analysis methods that enable precise identification of performance bottlenecks and resource drains in LLM implementations.
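The talk is a recorded session rather than a hands-on lab, but as a rough sketch of the kind of instrumentation it covers, the Python example below wraps an LLM inference call in an OpenTelemetry span so that per-request latency and payload sizes appear in traces exported from a Kubernetes pod. The `run_model` function and the attribute names are illustrative placeholders, not APIs from the talk; only the OpenTelemetry SDK calls are real.

```python
# Minimal OpenTelemetry tracing sketch for an LLM inference path.
# Requires: pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Export spans to stdout here; in a real cluster you would swap in an
# OTLP exporter pointed at your OpenTelemetry Collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm.inference")

def run_model(prompt: str) -> str:
    # Placeholder for a real model call (e.g., a vLLM or Triton request).
    return prompt[::-1]

def generate(prompt: str) -> str:
    # Each request becomes a span, so slow or memory-hungry calls are
    # visible trace by trace rather than only in aggregate metrics.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("llm.prompt_chars", len(prompt))
        completion = run_model(prompt)
        span.set_attribute("llm.completion_chars", len(completion))
        return completion

if __name__ == "__main__":
    generate("why is my pod getting OOM-killed?")
```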
Syllabus
Optimizing LLM Efficiency One Trace at a Time on Kubernetes - Aditya Soni, Forrester & Seema Saharan
Taught by
CNCF [Cloud Native Computing Foundation]