Overview
Discover how Yelp optimizes its Mesos and Marathon infrastructure on AWS Spot Fleet in this 41-minute conference talk. Learn about PaaSTA, Yelp's Platform-as-a-Service, and explore strategies for maximizing cluster utilization while minimizing costs. Gain insights into autoscaling services and servers, improving resilience against AWS instance terminations, and gracefully migrating active traffic. Delve into the decision-making process for cluster capacity management, and understand the challenges and benefits of running on volatile infrastructure. Explore future plans for predictive scaling, parallel scaling, and deployment to more services. This talk is well suited to anyone interested in large-scale infrastructure management, cost optimization, and resilient system design in cloud environments.
Syllabus
Introduction
Agenda
Yelp's Production Clusters
Yelp Traffic
Service Autoscale
Service Metrics
Decision Policies
Autoscaling Operations
Scaling Down
AWS Spot Fleet Conditions
Why Live With Volatile Capacity
Strategy 1: Counterintuitive
Strategy 2: Diversifying
Ignoring Hosts
Cost Comparison
Future Plans
Predictive vs Reactive
Parallel Scaling
Deployment to More Services
Conclusion
Hiring
Mitigating Risks
Taught by
Linux Foundation