
Linux Foundation

Practical Container Scheduling: Optimizations, Guarantees, and Trade-Offs at Netflix - Lecture

Linux Foundation via YouTube

Overview

Explore the intricacies of container scheduling in large-scale distributed clusters in this conference talk by a senior software engineer at Netflix. The talk covers the challenges, design, and trade-offs of Netflix's approach, which is built on Fenzo, an open-source scheduling library that takes a holistic approach to providing a nimble scheduling core for various independently evolving clusters. Topics include capacity guarantees, task placement, elasticity, and operational insights for running at large scale. Discover how Netflix juggles multiple scheduling objectives and constraints, including bin packing, task locality, and capacity guarantees, to efficiently run microservices, batch, and stream processing applications in shared Mesos clusters. Gain insights into multi-goal optimization, cluster autoscaling, and extensibility, and explore the fitness functions, hard and soft constraints, and queue setups used in Netflix's container scheduling. The talk also explains how to reason about allocation failures and how to size agent clusters for capacity, offering practical knowledge for engineers working on container scheduling in complex, large-scale environments.
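
As a rough illustration of how hard constraints and a bin-packing fitness function interact during task placement, here is a minimal Python sketch. It is not Fenzo's API: the `Agent` and `Task` classes, `place`, `packing_fitness`, and the equal weighting of CPU, memory, and network utilization are all hypothetical choices made for this example. Hard constraints (here, GPU server matching) remove agents outright, and the fitness function then prefers the eligible agent that would be most tightly packed after placement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """A cluster agent (hypothetical model) with capacity and current usage."""
    name: str
    cpus: float
    mem_gb: float
    net_mbps: float
    used_cpus: float = 0.0
    used_mem_gb: float = 0.0
    used_net_mbps: float = 0.0
    has_gpu: bool = False


@dataclass
class Task:
    """A task's resource request (hypothetical model)."""
    name: str
    cpus: float
    mem_gb: float
    net_mbps: float
    needs_gpu: bool = False


def fits(agent: Agent, task: Task) -> bool:
    """True if the agent still has room for the task on every resource."""
    return (agent.used_cpus + task.cpus <= agent.cpus
            and agent.used_mem_gb + task.mem_gb <= agent.mem_gb
            and agent.used_net_mbps + task.net_mbps <= agent.net_mbps)


def gpu_matches(agent: Agent, task: Task) -> bool:
    """Hard constraint: GPU tasks only on GPU agents. By assumption,
    non-GPU tasks are also kept off GPU agents to preserve them."""
    return agent.has_gpu == task.needs_gpu


def packing_fitness(agent: Agent, task: Task) -> float:
    """Mean CPU/memory/network utilization after placement (0..1).
    Higher means tighter bin packing; equal weights are an assumption."""
    cpu = (agent.used_cpus + task.cpus) / agent.cpus
    mem = (agent.used_mem_gb + task.mem_gb) / agent.mem_gb
    net = (agent.used_net_mbps + task.net_mbps) / agent.net_mbps
    return (cpu + mem + net) / 3.0


def place(task: Task, agents: List[Agent]) -> Optional[str]:
    """Place a task on the best-fitting eligible agent; None on failure."""
    # Hard constraints and capacity checks remove agents outright.
    eligible = [a for a in agents if gpu_matches(a, task) and fits(a, task)]
    if not eligible:
        return None  # allocation failure: no agent satisfies everything
    # The fitness function ranks the remaining agents.
    best = max(eligible, key=lambda a: packing_fitness(a, task))
    best.used_cpus += task.cpus
    best.used_mem_gb += task.mem_gb
    best.used_net_mbps += task.net_mbps
    return best.name


agents = [
    Agent("a1", 8, 32, 1000, used_cpus=6, used_mem_gb=24, used_net_mbps=500),
    Agent("a2", 8, 32, 1000),
    Agent("g1", 8, 64, 1000, has_gpu=True),
]
print(place(Task("web", 1, 4, 100), agents))                     # a1
print(place(Task("train", 4, 16, 200, needs_gpu=True), agents))  # g1
print(place(Task("big", 16, 8, 10), agents))                     # None
```

In this sketch the partly full agent a1 wins over the empty a2, which keeps a2 free for larger tasks or for removal by an autoscaler; that is the usual motivation for bin packing. The last task fails because no non-GPU agent has 16 CPUs, the kind of allocation failure a scheduler has to report and explain.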

Syllabus

Intro
Reactive stream processing: Mantis
Container deployment: Titus
What the cluster needs to support: a heterogeneous mix of workloads
Why juggle at all?
Scheduling challenge in large clusters
Our initial goals for a cluster scheduler • Multi-goal optimization for task placement • Cluster autoscaling • Extensibility
Multi-goal task placement
Security
Capacity guarantees
Fenzo scheduling strategy
Fitness functions we use • CPU, memory, and network in packing
Hard constraints we use • GPU server matching
Soft constraints we use • Specified by individual jobs at submit time • Balance tasks of a job across availability zones
Mixing fitness with soft constraints
Our queues setup
Sizing agent clusters for capacity
Reasoning about allocation failures
What's next?
Questions?
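
The syllabus item on mixing fitness with soft constraints can be illustrated with a weighted score. The sketch below is an assumption-laden illustration, not how Fenzo actually combines the two: it uses a linear blend controlled by a hypothetical `fitness_weight` parameter and a hypothetical zone-balance score that is 1.0 for the zone where the job currently has the fewest tasks and lower elsewhere. Candidates are (agent name, zone, fitness) tuples that have already passed hard constraints; the zone names are made-up labels.

```python
from typing import Dict, List, Optional, Tuple

# (agent name, availability zone, packing fitness in 0..1); hypothetical shape
Candidate = Tuple[str, str, float]


def zone_balance_score(zone: str, job_zone_counts: Dict[str, int],
                       zones: List[str]) -> float:
    """Soft constraint (assumed form): 1.0 for the job's least-loaded zone,
    decreasing as the zone gets more of the job's tasks than that minimum."""
    counts = {z: job_zone_counts.get(z, 0) for z in zones}
    least = min(counts.values())
    return 1.0 / (1 + counts[zone] - least)


def choose_agent(candidates: List[Candidate], job_zone_counts: Dict[str, int],
                 zones: List[str], fitness_weight: float = 0.5) -> Optional[str]:
    """Pick the candidate with the best linear blend of fitness and balance."""
    if not candidates:
        return None  # nothing passed the hard constraints

    def combined(c: Candidate) -> float:
        _, zone, fitness = c
        soft = zone_balance_score(zone, job_zone_counts, zones)
        return fitness_weight * fitness + (1 - fitness_weight) * soft

    return max(candidates, key=combined)[0]


zones = ["us-east-1a", "us-east-1b"]
counts = {"us-east-1a": 3, "us-east-1b": 1}  # job's existing tasks per zone
cands = [("a1", "us-east-1a", 0.8), ("a2", "us-east-1b", 0.3)]
print(choose_agent(cands, counts, zones))                      # a2
print(choose_agent(cands, counts, zones, fitness_weight=0.9))  # a1
```

With equal weights, the emptier zone wins despite worse packing (0.65 vs. about 0.57); weighting fitness at 0.9 flips the choice to the better-packed agent. The single weight makes the trade-off between packing efficiency and zone balance explicit and easy to tune.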

Taught by

Linux Foundation
