Accelerating Serverless AI Large Model Inference with Functionalized Scheduling and RDMA

Overview

Explore a conference talk on accelerating serverless inference for large AI models through functionalized scheduling and RDMA. The talk examines the challenges of deploying large AI models on standard serverless inference platforms such as KServe, including scheduling inefficiencies and communication bottlenecks. Learn about a highly elastic functionalized scheduling framework that schedules thousands of serverless large-model inference task instances on the order of seconds. Discover how RDMA enables high-speed KV cache migration, overcoming the limitations of traditional network protocol stacks. Gain insights into improving resource utilization, reducing costs, and meeting the low-latency, high-throughput demands of large-model inference deployments.

Syllabus

Accelerating Serverless AI Large Model Inference with Functionalized... - Yiming Li & Chenglong Wang

Taught by

CNCF [Cloud Native Computing Foundation]


