Resource-Aware Scheduling for Production GenAI with RAG on Multicluster Cloud Kubernetes

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Explore resource-aware scheduling for production GenAI with Retrieval-Augmented Generation (RAG) in a multicluster cloud Kubernetes environment. Learn the advantages of self-hosting GenAI models, including improved control, privacy, performance, and cost-effectiveness, and see how Kubernetes cloud resource management provides a flexible hosting platform for these systems.

Discover the proposed architecture, which uses multiple Kubernetes clusters and a resource-aware, policy-based cluster scheduler. Examine its key components: vector databases for RAG contexts, load-balanced query services, prediction services for model execution, and ingestion services for vector database updates. Understand the benefits of using a cloud-native, multi-region, scalable vector database and of running services across different Kubernetes clusters.

Gain insights into how CPU and GPU clusters are distributed geographically for reliability, latency, and resource availability, and into the role of the cluster scheduler in placement and scaling decisions. Analyze the benefits of this approach and learn about a reference implementation to help you apply these concepts in your own GenAI projects.
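
As a rough illustration of what a resource-aware, policy-based placement decision could look like, the Python sketch below filters clusters by free CPU and GPU capacity and then scores the remaining ones. It is not taken from the talk or its reference implementation: the `Cluster` and `Service` types, their fields, and the scoring policy (preferred region first, then fewest leftover GPUs, then most leftover CPUs) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    # Hypothetical view of one Kubernetes cluster's spare capacity.
    name: str
    region: str
    free_cpus: int
    free_gpus: int


@dataclass
class Service:
    # Hypothetical resource request for one service (query, prediction, ...).
    name: str
    cpus: int
    gpus: int
    preferred_region: str


def place(service: Service, clusters: List[Cluster]) -> Optional[Cluster]:
    """Return the cluster chosen for `service`, or None if nothing fits."""
    # Filter step: keep only clusters with enough free CPUs and GPUs.
    feasible = [
        c for c in clusters
        if c.free_cpus >= service.cpus and c.free_gpus >= service.gpus
    ]
    if not feasible:
        return None

    # Score step (assumed policy): prefer the requested region, then leave
    # as few GPUs idle as possible, then prefer the most leftover CPUs.
    def score(c: Cluster):
        return (
            c.region == service.preferred_region,
            -(c.free_gpus - service.gpus),
            c.free_cpus - service.cpus,
        )

    return max(feasible, key=score)
```

Under this assumed policy, a CPU-only query service would land on a CPU cluster in its preferred region rather than occupy a GPU cluster, while a prediction service that needs GPUs would be limited to clusters that have them free.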

Syllabus

Resource-Aware Scheduling for Production GenAI with RAG running on Multicluster Cloud Kubernetes

Taught by

CNCF [Cloud Native Computing Foundation]

