Unlocking LLM Performance with eBPF - Optimizing Training and Inference Pipelines

Overview

Explore how eBPF can be used to optimize Large Language Model (LLM) performance in this 38-minute conference talk from the Cloud Native Computing Foundation (CNCF). Discover techniques for observing LLM training and inference without disrupting the running workload: memory profiling to measure how quickly models and training data load, network profiling to measure data-exchange performance, and GPU profiling to analyze Model FLOPs Utilization (MFU) and locate performance bottlenecks. Learn what eBPF-based observability achieves in practice when applied to PyTorch LLM applications and the llm.c project. Gain insight into the challenge of raising GPU utilization in LLM workloads that process vast amounts of data and consume significant computational resources.

Syllabus

Unlocking LLM Performance with eBPF: Optimizing Training and Inference Pipelines - Yang Xiang

Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

Optimizing LLM Performance in Kubernetes with OpenTelemetry

Optimizing LLM Efficiency One Trace at a Time on Kubernetes

Checkpoint Offloading SSD - Enhancing Performance and Scalability in LLM Training

Distributed Caching for Generative AI: Optimizing LLM Data Pipeline on the Cloud

Zero-Instrumentation Observability Based on eBPF - Conf42 SRE 2024

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kubernetes
