Overview
Explore server fleet management with Camunda in this CamundaCon 2019 conference talk. Discover how LinkedIn handles hardware failures and maintains capacity across multiple data centers hosting thousands of servers. Learn how hands-off capacity management is implemented through workflows built on Camunda and other components, how these workflows integrate with LinkedIn's infrastructure platforms, and which best practices help achieve good results. Topics include technology scale, Espresso hardware, the impact of hardware failures, automation requirements, solution design, metrics for success, and challenges with high-performance management. Understand how LinkedIn handles server maintenance and capacity management in the face of multiple failures to keep its distributed data store, Espresso, reliable.
Syllabus
Intro
Outline
Introduction
Technology Scale
Espresso Hardware
Impact of hardware failure
Failure frequency
Challenges with HW recovery
Requirements for automation
Features needed
Solution
Entry barriers
Automate part 2
Initial design 1
Design 2 (Final design)
Metrics
Success
Part 3 in production
Challenges with HPM
Our deployment
Future
Taught by
Camunda