Overview
Explore Site Reliability Engineering (SRE) approaches for large-scale companies in this 41-minute conference talk from NDC Porto 2022. Discover how to build effective tools for reliability engineering, covering key decisions such as buy vs. build vs. open source and optimizing for local vs. global maxima. Learn how a unified platform built on common tools improves the overall experience. Understand the importance of feedback loops between tools, teams, and events such as incident reviews and GameDays. Dive into two specific projects: an automated region failover capability and chaos engineering. Gain insights into their designs and architectures, technical challenges, developer experience considerations, adoption hurdles, and lessons learned in scaling reliability engineering practices.
Syllabus
Scaling Reliability Engineering with Tools - Nikos Katirtzis & Daniel Albuquerque - NDC Porto 2022
Taught by
NDC Conferences