Classroom Contents
Identifying Hidden Dependencies
- 1 Intro
- 2 Big data is operationally complex.
- 3 Observability is evolving quickly.
- 4 Two dozen engineers build Honeycomb.
- 5 We make systems humane to run
- 6 by ingesting telemetry
- 7 enabling data exploration
- 8 and empowering engineers.
- 9 We deploy with confidence.
- 10 Continuous delivery is an investment.
- 11 Continuity of operations even more so.
- 12 Stable platforms empower innovation.
- 13 but stateful services can be scary.
- 14 We need velocity and reliability.
- 15 Quantify reliability.
- 16 Identify potential areas of risk.
- 17 Design experiments to probe risk.
- 18 Prioritize addressing risks.
- 19 How broken is "too broken"?
- 20 Service Level Objectives define success.
- 21 SLOs are common language.
- 22 Think in terms of events in context.
- 23 HTTP Code 200? Latency 100ms?
- 24 Set a target Service Level Objective.
- 25 Use a window and target percentage.
- 26 We keep SLOs at Honeycomb.
- 27 We store incoming telemetry.
- 28 Alerts usually evaluate every minute.
- 29 Often, queries come back under 10s.
- 30 Error budget: allowed unavailability
- 31 Is it safe to do this risky experiment?
- 32 Data persistence is tricky.
- 33 Experiment using error budgets.
- 34 Infrequent changes.
- 35 Long-running processes.
- 36 Data integrity and consistency.
- 37 Delicate failover dances
- 38 Restart one server & service at a time.
- 39 Bugs are shallow with more eyes.
- 40 Monitor for changes using SLIs.
- 41 Debug with observability.
- 42 Test the telemetry too!
- 43 Verify fixes by repeating.
- 44 Continuously verify to stop regression.
- 45 Save money with flexibility.
- 46 Hypothesize, test, and learn.
- 47 Celebrate successes and failures.
- 48 Be more reliable & scalable.
- 49 Sleep easily at night.
- 50 You can do this too, step by step.
- 51 Read our blog! hny.co/blog
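
Chapters 22 to 30 describe judging each event in context, setting a target percentage over a window, and treating the remainder as an error budget. As a rough illustration, that arithmetic could be sketched as below. This is not Honeycomb's implementation: the `Event` type, `is_good`, `error_budget`, and the 99% target are hypothetical names and values chosen for the example, and the "HTTP 200 within 100 ms" criterion is only borrowed from the title of chapter 23.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One request observed during the SLO window (hypothetical shape)."""
    status: int
    latency_ms: float


def is_good(event: Event, max_latency_ms: float = 100.0) -> bool:
    # Assumed SLI: the request succeeded (HTTP 200) and was fast enough.
    return event.status == 200 and event.latency_ms <= max_latency_ms


def error_budget(events: Iterable[Event], target: float = 0.99) -> dict:
    """Compare bad events in the window against what the target allows.

    `target` is the fraction of events that must be good (assumed 99% here).
    """
    events = list(events)
    total = len(events)
    bad = sum(1 for e in events if not is_good(e))
    allowed_bad = total * (1.0 - target)  # the error budget, in events
    return {
        "total": total,
        "bad": bad,
        "allowed_bad": allowed_bad,
        "remaining": allowed_bad - bad,  # negative means the budget is spent
    }


if __name__ == "__main__":
    # Made-up window: 995 good events, 3 server errors, 2 slow responses.
    window = (
        [Event(200, 50.0)] * 995
        + [Event(500, 20.0)] * 3
        + [Event(200, 250.0)] * 2
    )
    print(error_budget(window))
```

A positive `remaining` value suggests there is room for a risky experiment (chapter 31); a negative one suggests reliability work should come first.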