Overview
This conference talk presents the development and application of a Mesos simulator for optimizing batch scheduling frameworks. It describes building a simulator for Cook, an open-source batch scheduling Mesos framework used at Two Sigma, so that algorithm changes can be tested without running the entire distributed system. The talk covers the challenges of constructing the simulator, the insights gained from running historical job traces, and why simulation testing matters for production systems. It begins with how to define an experiment, what can be changed in a system, and example tests such as an upgrade or a doubled request rate. It then reviews the Cook and Mesos architectures, the choices and decisions involved in building a simulator, and the implementation of a mock Mesos and trigger-able Cook internals. It closes with how time is handled in simulation and an applied example: changing preemption settings, with fairness results from both simulation and production.
Syllabus
Agenda
Example System
Define: Experiment (noun)
Being scientific
What can we change?
Your system
Experiment: Test upgrade
Experiment: Double request rate
Hypothesis: Server failure
Examples
Cook at Two Sigma
Cook Architecture
Mesos Architecture
Building a Simulator: Choices
Building a Simulator: Decisions
Cook Simulator
Mock Mesos
Trigger-able Cook Internals
Simulation driver cycle
How to handle time?
Applied Simulation: Changing Preemption
Preemption knobs
Results in simulation: Fairness
Results in production
Taught by
Linux Foundation