Classroom Contents
Cultivating Production Excellence
- 1 Intro
- 2 Production is increasingly complex.
- 3 We're adding complexity all the time.
- 4 Our strategies need to evolve.
- 5 When we order the alphabet soup...
- 6 Noisy alerts. Grumpy engineers.
- 7 Walls of meaningless dashboards.
- 8 Tools aren't magical.
- 9 Invest in people, culture, & process.
- 10 Eliminate (unnecessary) complexity.
- 11 Our systems are always failing.
- 12 We need Service Level Indicators
- 13 What threshold buckets events?
- 14 HTTP Code 200? Latency 100ms?
- 15 Set a target Service Level Objective.
- 16 Use a window and target percentage.
- 17 Data-driven business decisions.
- 18 Failure modes can't be predicted.
- 19 Support debugging novel cases. In production.
- 20 Allow forming & testing hypotheses.
- 21 Can you explain the variance?
- 22 Observability isn't just the data.
- 23 Debugging is not a solo activity.
- 24 Debugging is for everyone.
- 25 Collaboration is interpersonal.
- 26 Lean on your team.
- 27 Fix hero culture. Share knowledge.
- 28 Use the same platforms & tools.
- 29 Reward curiosity and teamwork.
- 30 Risk analysis helps us plan.
- 31 Quantify risks by frequency & impact.
- 32 And prioritize completing the work.
- 33 Don't waste time chrome polishing.
- 34 Lack of observability is systemic risk.
- 35 So is lack of collaboration.
- 36 A dozen engineers build Honeycomb.
- 37 We make systems humane to run
- 38 Yes, we deploy on Fridays.
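
Chapters 12 through 16 describe bucketing events into good and bad with a threshold (the Service Level Indicator) and then setting a target percentage over a window (the Service Level Objective). The sketch below is one hedged way to picture that arithmetic, not the speaker's implementation. The `Event` type, the `is_good` and `slo_report` names, and the specific thresholds (status 200, 100 ms) are assumptions for illustration, taken from the questions posed in the chapter 14 title; the event list is assumed to be already filtered to the chosen window.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One request, as a hypothetical record (not a real library type)."""
    status: int        # HTTP status code
    latency_ms: float  # request latency in milliseconds


def is_good(event, max_latency_ms=100.0):
    """SLI threshold (assumed): good means HTTP 200 within the latency bound."""
    return event.status == 200 and event.latency_ms <= max_latency_ms


def slo_report(events, target=0.999):
    """Compare the fraction of good events in the window against the target."""
    total = len(events)
    good = sum(1 for e in events if is_good(e))
    # With no traffic there is nothing to violate, so treat the SLI as 1.0.
    sli = good / total if total else 1.0
    return {
        "total": total,
        "good": good,
        "bad": total - good,
        "sli": sli,
        "met": sli >= target,
    }


if __name__ == "__main__":
    window = [Event(200, 20.0)] * 9 + [Event(200, 250.0)]
    print(slo_report(window, target=0.9))
```

A slow but successful request counts as bad here because both conditions must hold; whether latency and status belong in one indicator or two separate ones is a design choice the outline leaves open.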