The Secret Lives of SREs - Controlling the Costs of Coordination across Remote Teams

USENIX via YouTube

Overview

Explore the intricacies of incident response and coordination in remote SRE teams through this 48-minute conference talk from SREcon20 Americas. Delve into Dr. Laura Maguire's three-year research on engineering teams handling service outages, examining 62 cases across four organizations. Discover surprising findings that challenge existing domain models, including how incident management in practice differs from Google SRE suggestions and how incident command can hinder fast resolution. Learn about the subtle choreography of cognitive work in fault management, the potential drawbacks of coordination tools, and strategies for adaptive choreography. Gain insights into how tooling and intra-organizational dependencies affect coordination costs across time and organizational boundaries, increasing complexity for SREs. Understand the challenges of coordinating multiple perspectives, dealing with backup issues, and managing hidden complexities in distributed computing environments.

Syllabus

Introduction
The Secret Lives of SREs
Coordinate Multiple Diverse Perspectives
Backup Issues
Hidden Complexity
Outlier Event
Sarah
Sarah's Knowledge
Incident Response
Incident Command
Speed Bumps
Distributed Computing
Conclusion

Taught by

USENIX
