Overview
Explore reliability engineering in this 39-minute conference talk from SREcon21. Delve into the limitations of current best practices and the "Goldilocks approach" to reliability. Examine a concrete model for framing reliability and what it implies for answering complex questions about services. Investigate why certain mitigation strategies are effective and how aggregation and backend drains contribute to reliability. Learn how to identify the underlying mechanisms that reinforce desired reliability properties and how to develop new mitigation strategies. Discover how reliability can be modeled as stationarity, and how that model applies in practice to hierarchical diagnostics and to exposing reliability phenomena. Gain insight into the future capabilities of reliability engineering and draw conclusions for improving current practices.
Syllabus
Intro
Acknowledgements
Our Reliability Approach
Goldilocks Reliability
Load Bearing Assumptions
Practical Porridge Problems
The Trouble with Thresholds
Mo' Porridge Mo' Problems
Make More Models!
Model Elephants
Reliability, modeled as Stationarity
Stationarity Works!
Hierarchical Diagnostics
Stationarity Exposes Reliability Phenomena
Tantalizing Capabilities
Conclusions
Taught by
USENIX