Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous Resources
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how ByteDance optimizes LLM workload performance through enhanced topology-aware scheduling in this technical conference talk. Explore solutions for managing high-density processors, including die-level affinity and anti-affinity configuration between memory-bandwidth-intensive pods. Discover techniques for achieving inter-RDMA affinity at the ToR level to prevent switch congestion, implementing GPU-RDMA affinity at the PCIe switch level to accelerate communication via GPUDirect RDMA, and establishing job-level topology affinity on top of the Kubernetes scheduler's pod-level operations. Gain insights into addressing the limitations of Kubernetes topology management for new-generation processors and the shift of performance bottlenecks from computation to networking, with practical approaches for handling heterogeneous resources such as GPUs and RDMA NICs.
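The anti-affinity between memory-bandwidth-intensive pods described above can be expressed with standard Kubernetes pod anti-affinity rules. The sketch below is illustrative only: the label `workload-type: mem-bw-intensive`, the image name, and the use of the node-level topology key `kubernetes.io/hostname` are assumptions, not details from the talk (die-level separation would require a finer-grained topology key exposed by the platform).

```yaml
# Hypothetical sketch: keep memory-bandwidth-intensive pods apart.
# The label and topology key are illustrative assumptions; a real
# die-aware setup would use a custom, NUMA/die-level topology key.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  labels:
    workload-type: mem-bw-intensive
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: workload-type
            operator: In
            values: ["mem-bw-intensive"]
        topologyKey: kubernetes.io/hostname
  containers:
  - name: worker
    image: registry.example.com/llm-trainer:latest  # placeholder image
```

With `requiredDuringSchedulingIgnoredDuringExecution`, the scheduler refuses to co-locate two pods carrying this label within the same topology domain; a `preferred` rule could be used instead when strict separation is too restrictive.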
Syllabus
Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous Resources - He Cao
Taught by
CNCF [Cloud Native Computing Foundation]