Overview
Explore a 26-minute conference talk from LISA18 on implementing Site Reliability Engineering (SRE) and DevOps principles in startup environments. Learn how Craig Sebenik from Split compares centralized support teams with distributed teams that include developers. Discover the challenges of applying Google's SRE model to smaller-scale operations, and gain insights into the lessons learned and potential pitfalls of either approach. Understand the differences between pure DevOps and specialist roles, hiring considerations, and the SRE hierarchy of reliability. Cover essential topics such as metrics and monitoring, incident response, release management, and capacity planning in the context of startups with limited resources.
Syllabus
Intro
SRE (and DevOps) at a Startup
My Background
What is Split?
Overview
What is SRE?
DevOps is Not a Job Title
Life at a Startup
Lots of Pieces
Developers Have Product Focus
Specialist (aka SRE)
Pure DevOps vs Specialist
Who To Hire
SRE Hierarchy of Reliability
Metrics and Monitoring
Incident Response
Release
Capacity Planning
Summary: SRE is an implementation of the DevOps paradigm.
Questions?
Taught by
USENIX