Overview
SelfTune is a framework that automatically tunes cluster manager parameters in large-scale cloud environments. It uses reinforcement learning to optimize configuration settings for throughput and performance, replacing manual parameter setting, which is hard to get right in dynamic, large-scale deployments. This talk covers how SelfTune has been deployed on tens of thousands of machines at Microsoft, where it has significantly improved background task scheduling. It also covers SelfTune's application to other scenarios: Azure FaaS workloads, the Kubernetes Vertical Pod Autoscaler, and the DeathStar microservice benchmark.
Syllabus
NSDI '23 - SelfTune: Tuning Cluster Managers
Taught by
USENIX