Overview
Explore a conference talk from SREcon19 Americas on Google's approach to sublinear scaling in Site Reliability Engineering (SRE), in which the number of services a team supports grows faster than the team's headcount. Learn how one team grew its service portfolio by over 200% without additional staffing while working toward a goal of 1,000 services. Discover the automation infrastructure the team built, including automated incident handling and policy verification. Gain insight into the cultural shift from service-specific expertise to service-agnostic consulting, and into the long-term vision for SRE in large organizations. Topics include imperative and declarative automation, automatic continuous production readiness reviews, and the team's incremental progress toward sublinear scaling.
Syllabus
Intro
About the Team
About Project Work
About Sublinear Scaling
Imperative Automation
Declarative Automation
Sequencer
Automations
Incremental Progress
Automatic Continuous Production Readiness Reviews
Automated Incident Handling
Summary
Taught by
USENIX