Overview
Syllabus
Intro
Reactive stream processing: Mantis
Container deployment: Titus
What the cluster needs to support • Heterogeneous mix of workloads
Why juggle at all?
Scheduling challenge in large clusters
Our initial goals for a cluster scheduler • Multi-goal optimization for task placement • Cluster autoscaling • Extensibility
Multi-goal task placement
Security
Capacity guarantees
Fenzo scheduling strategy
Fitness functions we use • CPU, memory, and network bin packing
Hard constraints we use • GPU server matching
Soft constraints we use • Specified by individual jobs at submit time • Balance tasks of a job across availability zones
Mixing fitness with soft constraints
Our queues setup
Sizing agent clusters for capacity
Reasoning about allocation failures
What's next?
Questions?
Taught by
Linux Foundation