Site Reliability Engineering: Measuring and Managing Reliability
Google via Google Cloud Skills Boost
Overview
Service level indicators (SLIs) and service level objectives (SLOs) are fundamental tools for measuring and managing reliability. In this course, students learn approaches for devising appropriate SLIs and SLOs and managing reliability through the use of an error budget.
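As a rough illustration of how an error budget follows from an SLO, the arithmetic can be sketched in Python. This is a minimal sketch, not material from the course: the function names, the event-based (good/total) SLI, and the sample numbers are all assumptions made for the example.

```python
def sli(good_events: int, total_events: int) -> float:
    """Assumed event-based SLI: the fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of bad events the SLO tolerates over the measurement window."""
    # A target of exactly 1.0 (100%) would leave no budget at all.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_events)
    return (budget - bad_events) / budget


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 500 failures, 99.9% SLO.
    total, bad, target = 1_000_000, 500, 0.999
    print(f"SLI: {sli(total - bad, total):.4%}")
    print(f"Error budget: {error_budget(target, total):.0f} bad events")
    print(f"Budget remaining: {budget_remaining(target, total, bad):.0%}")
```

Under these assumed numbers, a 99.9% target over one million requests tolerates roughly 1,000 failures, so 500 failures would consume about half of the budget.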
Syllabus
- Introduction
- Course structure
- What's the Difference between DevOps and SRE - Intro
- What's the difference between DevOps and SRE?
- Now SRE, Everyone Else with CRE - Intro
- Now SRE, Everyone Else with CRE
- CRE's Three Reliability Principles
- Reliability in the Cloud
- How SLOs Help Your Business Make Decisions
- How SLOs Help You Build Features Faster
- How SLOs Help You Balance Operational and Project Work
- Making SLOs Work for your Organization
- Targeting Reliability
- Module Introduction
- SLOs vs SLAs
- The Happiness Test
- SLOs and SLAs
- How Do We Measure Reliability?
- Edge Cases
- 100% is the Wrong Target
- Iterating
- Reliability and iterating
- Operating for Reliability
- Module Introduction
- Error Budgets
- Everything is a Trade-Off
- Error Budgets: Advanced Concepts
- Error budgets
- Axes of Improvement
- Operational Approach to Increasing Reliability
- Increasing reliability
- Module Summary
- Choosing a Good SLI
- Module Introduction
- User Happiness in Metric Form
- Measuring happiness
- The Properties of Good SLI Metrics
- Ways of Measuring SLIs
- The SLI Menu
- The SLI Equation
- Request / Response SLIs
- Commonly used SLIs
- Data Processing SLIs
- Correctness and Coverage
- But My System is Really Complex!
- Managing Complexity with Aggregation
- Managing Complexity with Bucketing
- Achievable SLOs
- Aspirational SLOs
- Continuous Improvement
- Developing SLOs and SLIs
- Module Introduction
- The 4-Step Process
- Our Example Game
- Loading the Profile Page
- Refining SLI Specifications
- Postmortem!
- Looking for Observability Gaps
- Failure Modes
- Setting Achievable SLO targets
- Quantifying Risks to SLOs
- Module Introduction
- Is Your Error Budget Realistic?
- Modeling Risks in our Spreadsheet
- Analyzing Risk
- Risk Analysis Sample: Blank Copy
- Risk Analysis Sample: Tribal Thunder
- Consequences of SLO Misses
- Module Introduction
- No Surprises
- A Dashboard Example
- Why an Error Budget Policy?
- Error budget policies
- Fundamentals of an Error Budget Policy
- How to Draft an Error Budget Policy
- Error budget policy - considerations
- Example Policy Thresholds
- Hypothetical Policy Scenario
- Course Conclusion and Video Wrap Up
- Consequences of SLO Misses
- Squirrels
- Additional suggested reading
- Your Next Steps
- Course Badge
Reviews
4.0 rating, based on 1 Class Central review
Nice course on site reliability engineering.