Overview
This 21-minute Devoxx conference talk introduces Site Reliability Engineering (SRE). It covers the key concepts of SLAs, SLIs, and SLOs and why they matter for keeping APIs reliable. It explains why failure has to be embraced and why 100% availability is a myth, then shows how to define an appropriate reliability level, choose the right number of 9s for uptime, and identify the metrics worth monitoring. The talk also covers error budgets and their practical application, how to prioritize reliability as a feature against other development work, and current trends in how SRE is implemented. It closes with real-world examples of improving the reliability of Conversational AI services at Swisscom, which serve as a major customer entry point.
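As a rough illustration of the "number of 9s" and error budget ideas mentioned in the description, the arithmetic can be sketched in a few lines of Python. This is not taken from the talk: the function names, the 30-day window, and the request counts are assumptions chosen only for the example.

```python
# Minimal sketch, not from the talk: helper names and the 30-day
# window are assumptions made for illustration.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(slo_percent: float,
                             window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Downtime an availability SLO permits over the window."""
    return window_minutes * (1 - slo_percent / 100)


def error_budget_remaining(slo_percent: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent.

    A negative result means the budget has been overspent.
    """
    allowed_failures = total_requests * (1 - slo_percent / 100)
    if allowed_failures <= 0:
        raise ValueError("a 100% SLO (or zero traffic) leaves no error budget")
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Each extra 9 cuts the permitted downtime by a factor of ten.
    for slo in (99.0, 99.9, 99.99):
        print(f"{slo}% -> {allowed_downtime_minutes(slo):.1f} min per 30 days")

    # Hypothetical traffic: 1,000,000 requests with 250 failures
    # against a 99.9% SLO.
    print(f"budget left: {error_budget_remaining(99.9, 1_000_000, 250):.0%}")
```

Under these assumptions, a 99.9% target leaves about 43 minutes of downtime per 30 days, and a team that has used a quarter of its failure allowance still has most of its budget to spend on releases or experiments.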
Syllabus
SRE, what is it all about?! by Luca Simone
Taught by
Devoxx