Confessions of a Systems Engineer - Learning from My 20+ Years of Failure

USENIX via YouTube

Overview

A 39-minute conference talk from SREcon20 Americas in which David Argent, an Amazon systems engineer, shares lessons from more than two decades of failures in running large-scale online services. The talk covers practices for designing and operating complex systems: minimizing the impact of changes, monitoring thoroughly, automating mitigations, and designing for quick incident resolution. It also addresses exercising processes regularly, enforcing procedures with technology, and transitioning service responsibilities carefully. Practical advice includes creating degraded service modes, using functional gates during releases, and aggressively redirecting or dropping traffic during incidents. Further topics are producing quality tools, sanitizing and verifying inputs, and understanding every scenario a service supports.

Syllabus

Intro
There Are No Safe Changes
Minimize the Blast Radius on Changes
Monitor Accurately and Measure Thoroughly
Automate Mitigations
Degraded Service Modes, or An Imperfect Experience Usually Beats a Nonexistent One
Use Functional Gates Pre-, Post- and During Releases
Design to Meet SLAs and Mitigate Incidents Quickly
Regularly Exercise All Processes and Tools
Enforce Processes with Technology
Redirect or Drop Traffic Aggressively During Incidents
Production Quality Tools
Sanitize and Verify Inputs
Understand All of the Scenarios You Support
Transition Service Responsibilities Carefully

Taught by

USENIX
