Overview
Explore a USENIX ATC '23 conference talk on AWARE, a framework that automates workload autoscaling with reinforcement learning (RL) in production cloud systems. Learn why setting optimal resource limits and scaling workloads dynamically at runtime are hard problems. Discover how AWARE uses meta-learning and bootstrapping to adapt quickly to different workloads and to keep RL exploration safe and robust. Understand how the framework exposes an OpenAI Gym-like RL interface so that it can be integrated with a variety of systems tasks. Examine experimental results showing that AWARE adapts autoscaling policies 5.5x faster than existing transfer-learning approaches, maintains stable online policy-serving performance, and significantly improves CPU and memory utilization while reducing SLO violations during policy training.
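To make the "Gym-like RL interface" concrete, here is a minimal Python sketch of what such an interface could look like for an autoscaling task. It is not AWARE's actual API: the class name ToyAutoscalingEnv, the three-way action encoding, the reward shape, and the demand trace are all assumptions chosen for illustration. Only the reset()/step() convention is borrowed from OpenAI Gym.

```python
from typing import Callable, Dict, List, Tuple

Observation = Tuple[float, float]  # (cpu_limit_cores, upcoming_cpu_demand_cores)


class ToyAutoscalingEnv:
    """Hypothetical Gym-style environment; an illustration, not AWARE's API."""

    def __init__(self, demand_trace: List[float], initial_limit: float = 2.0,
                 step_size: float = 0.5, min_limit: float = 0.5,
                 violation_penalty: float = 1.0) -> None:
        self.demand_trace = demand_trace            # made-up CPU demand per step
        self.initial_limit = initial_limit          # starting CPU limit, in cores
        self.step_size = step_size                  # cores changed per action
        self.min_limit = min_limit                  # lower bound on the limit
        self.violation_penalty = violation_penalty  # assumed SLO-violation cost
        self.reset()

    def reset(self) -> Observation:
        """Start a new episode and return the first observation."""
        self.t = 0
        self.limit = self.initial_limit
        return (self.limit, self.demand_trace[0])

    def step(self, action: int) -> Tuple[Observation, float, bool, Dict[str, bool]]:
        """Apply one scaling action: 0 = scale down, 1 = hold, 2 = scale up."""
        if action not in (0, 1, 2):
            raise ValueError("action must be 0, 1, or 2")
        self.limit = max(self.min_limit,
                         self.limit + (action - 1) * self.step_size)
        demand = self.demand_trace[self.t]
        violated = demand > self.limit
        # Assumed reward: utilization when demand fits, a flat penalty otherwise.
        reward = -self.violation_penalty if violated else demand / self.limit
        self.t += 1
        done = self.t >= len(self.demand_trace)
        # Simplification: the observation exposes the demand of the next step.
        next_demand = self.demand_trace[min(self.t, len(self.demand_trace) - 1)]
        return (self.limit, next_demand), reward, done, {"slo_violated": violated}


def threshold_policy(obs: Observation) -> int:
    """Hand-written stand-in for a learned policy."""
    limit, demand = obs
    if demand > limit:
        return 2  # scale up
    if demand < 0.5 * limit:
        return 0  # scale down
    return 1      # hold


def run_episode(env: ToyAutoscalingEnv,
                policy: Callable[[Observation], int]) -> float:
    """Run one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total
```

Because the agent only touches reset() and step(), the same training loop could drive a different systems task by swapping in another environment class, which is the integration benefit the talk attributes to a Gym-like interface. An RL agent would replace threshold_policy in run_episode.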
Syllabus
USENIX ATC '23 - AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production...
Taught by
USENIX