Explore a 16-minute conference talk from OSDI '23 that introduces Cilantro, a system for performance-aware resource allocation in cluster environments. Learn how Cilantro uses online learning to form feedback loops with jobs, estimating resource-to-performance mappings and tracking load shifts without manual job profiling. Discover how this approach supports a variety of user-specified scheduling objectives, with adaptive policies that account for uncertainty in the learned models. Examine Cilantro's effectiveness in two scenarios: a multi-tenant 1000-CPU cluster with 20 independent jobs, where it outperforms nine baselines across three performance-aware scheduling objectives, and a microservices setting that distributes 160 CPUs among 19 interdependent microservices, where it significantly improves end-to-end P99 latency over existing baselines.
Overview
Syllabus
OSDI '23 - Cilantro: Performance-Aware Resource Allocation for General Objectives via Online...
Taught by
USENIX