In many IT organizations, incentives are not aligned between developers, who strive for agility, and operators, who focus on stability. Site reliability engineering (SRE) is how Google aligns incentives between development and operations and provides mission-critical production support. Adopting SRE cultural and technical practices can help improve collaboration between the business and IT. This course introduces key practices of Google SRE and the important role that IT and business leaders play in the successful organizational adoption of SRE.
Syllabus
- Welcome to Developing a Google SRE Culture
- Welcome and Getting Started Guide!
- Course introduction
- Module 1 Quiz
- Module 1 Key Points and Reflection Activity
- DevOps, SRE, and Why They Exist
- Module introduction
- DevOps and SRE
- DevOps and SRE Quiz
- Module 2 Exercise
- SLOs with Consequences
- Module introduction
- SRE value
- Postmortems
- Blamelessness and Psych Safety
- SLOs and error budgets
- Share vision and knowledge
- Module 3 Quiz
- Module 3 Exercise
- Make Tomorrow Better than Today
- Module introduction
- Continuous integration, continuous delivery, and canarying
- Design thinking and prototyping
- Toil
- Psychology of change
- Module 4 Quiz
- Module 4 Exercise
- Regulate Workload
- Module introduction
- Toil and reliability
- Goal setting
- Module 5 Quiz
- Module 5 Exercise
- Apply SRE in Your Organization
- Module introduction
- Organizational Maturity
- Skills and Training
- SRE Teams
- Getting Started
- Module 6 Quiz
- Module 6 Exercise
- Final Assessment
- Final Assessment
- Learner Workbook
- Resources
- Course Resources
- Course Resources
- Your Next Steps
- Course Badge