Cultivating Production Excellence

NDC Conferences via YouTube

Overview

A conference talk on cultivating production excellence in complex distributed systems. It covers essential practices for improving production environments, including involving stakeholders, building observability through collaboration, measuring with Service Level Objectives, and using risk analysis to prioritize improvements. The talk argues that strategies must evolve as systems grow more complex, addresses common problems such as noisy alerts and meaningless dashboards, and shifts the focus toward investing in people, culture, and process. Topics include setting effective Service Level Indicators, debugging novel cases in production, collaborative debugging, fixing hero culture, and quantifying risks for better planning. The talk presents these practices as a path to systems that are more humane and efficient to run, including deploying confidently on Fridays.
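
To make the Service Level Indicator and Service Level Objective ideas concrete, here is a minimal Python sketch, not taken from the talk itself. It assumes an event counts as "good" when it returns HTTP 200 within 100 ms (the thresholds the syllabus poses as questions), and it measures compliance over a rolling window against a target percentage. The `Event` type, the `is_good` and `slo_report` helpers, the 30-day default window, and the sample data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float   # seconds since epoch (assumed unit)
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def is_good(event, max_latency_ms=100.0):
    # SLI: the threshold that buckets each event as good or bad.
    return event.status == 200 and event.latency_ms <= max_latency_ms


def slo_report(events, now, window_s=30 * 24 * 3600, target=0.999):
    # Only events inside the rolling window count toward the objective.
    recent = [e for e in events if now - window_s <= e.timestamp <= now]
    good = sum(1 for e in recent if is_good(e))
    # With no traffic in the window there is nothing to violate the objective.
    compliance = good / len(recent) if recent else 1.0
    return {
        "total": len(recent),
        "good": good,
        "compliance": compliance,
        "met": compliance >= target,
    }


# Illustrative data only: 990 fast successes, 5 slow successes, 5 errors.
now = 1_000_000.0
events = (
    [Event(now - i, 200, 40.0) for i in range(990)]
    + [Event(now - i, 200, 250.0) for i in range(5)]
    + [Event(now - i, 500, 20.0) for i in range(5)]
)
report = slo_report(events, now, target=0.999)
print(report)
```

Counting good events against a target over a window, rather than alerting on every individual failure, is one way to approach the "data-driven business decisions" the syllabus mentions: the gap between measured compliance and the target shows how much failure the service can still absorb.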

Syllabus

Intro
Production is increasingly complex.
We're adding complexity all the time.
Our strategies need to evolve.
When we order the alphabet soup...
Noisy alerts. Grumpy engineers.
Walls of meaningless dashboards.
Tools aren't magical.
Invest in people, culture, & process.
Eliminate (unnecessary) complexity.
Our systems are always failing.
We need Service Level Indicators.
What threshold buckets events?
HTTP Code 200? Latency 100ms?
Set a target Service Level Objective.
Use a window and target percentage.
Data-driven business decisions.
Failure modes can't be predicted.
Support debugging novel cases. In production.
Allow forming & testing hypotheses.
Can you explain the variance?
Observability isn't just the data.
Debugging is not a solo activity.
Debugging is for everyone.
Collaboration is interpersonal.
Lean on your team.
Fix hero culture. Share knowledge.
Use the same platforms & tools.
Reward curiosity and teamwork.
Risk analysis helps us plan.
Quantify risks by frequency & impact.
And prioritize completing the work.
Don't waste time chrome polishing.
Lack of observability is systemic risk.
So is lack of collaboration.
A dozen engineers build Honeycomb.
We make systems humane to run.
Yes, we deploy on Fridays.
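
The syllabus item "Quantify risks by frequency & impact" can be sketched as a simple ranking. The Python below is an illustrative assumption rather than the speaker's method: it scores each risk as estimated incidents per year multiplied by estimated bad minutes per incident, then sorts by that expected cost. The `Risk` type, the `prioritize` helper, and every risk name and number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    incidents_per_year: float        # estimated frequency
    bad_minutes_per_incident: float  # estimated impact

    @property
    def expected_bad_minutes(self):
        # Expected yearly cost: frequency multiplied by impact.
        return self.incidents_per_year * self.bad_minutes_per_incident


def prioritize(risks):
    # Highest expected cost first, so that work gets planned first.
    return sorted(risks, key=lambda r: r.expected_bad_minutes, reverse=True)


# Made-up estimates, for illustration only.
risks = [
    Risk("expired TLS certificate", 1, 120),
    Risk("bad deploy", 12, 15),
    Risk("single-zone outage", 0.5, 60),
]
ranked = prioritize(risks)
for risk in ranked:
    print(risk.name, risk.expected_bad_minutes)
```

Ranking by expected cost puts a frequent but brief failure and a rare but long one on the same scale, which helps decide what to finish first and what would only be "chrome polishing".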

Taught by

NDC Conferences
