Twine - A Unified Cluster Management System for Shared Infrastructure

USENIX via YouTube

Overview

Explore a comprehensive presentation on Twine, Facebook's innovative cluster management system designed for shared infrastructure. Delve into the system's unique approach to managing one million machines across multiple data centers in a geographic region through a single control plane. Learn about the TaskControl API that enables application-specific customization, and discover how host profiles are utilized to optimize hardware and OS settings for diverse workloads. Understand the rationale behind Facebook's decision to deploy power-efficient small machines universally and leverage autoscaling for improved utilization. Gain insights into the challenges and solutions involved in migrating workloads to shared infrastructure, and examine the lessons learned from implementing this large-scale system. Compare Twine's approach to conventional practices and explore its impact on performance, efficiency, and resource management in data centers.

Syllabus

Intro
Data center geographic regions
What design decisions did Twine make differently?
What if we used Kubernetes?
How does Twine avoid stranded capacity?
How does Twine perform fleet-wide optimization?
How does Twine perform fleet-wide optimization for an entire geographic region?
How well does the Twine scheduler scale?
How do we mitigate risks with 1M machines per deployment?
Private pools or shared infrastructure?
What is host customization?
What is the overhead for host profile switches?
What drives host profile changes?
What are the challenges with supporting ubiquitous shared infrastructure?
Challenge: Tasks are not homogeneous
How does Twine collaborate with applications?
What is our shared infrastructure adoption?
How easy is it to migrate onto shared infrastructure?
Power is our most constrained resource
Big machines or small machines?
Why use small machines?
How much do we save by using small machines?
What lessons did we learn using small machines?
Conclusion

Taught by

USENIX
