Practicalities of Productionizing Distributed Systems

Overview

Explore practical strategies for deploying distributed systems in production environments in this GOTO Chicago 2018 conference talk. Delve into the challenges of productionizing distributed systems, including handling failures, debugging performance issues, and implementing effective logging and monitoring. Learn about tactics for managing partial availability, tracing, profiling, and release management. Discover approaches to avoid coordination problems, implement backpressure mechanisms, and utilize datacenter schedulers. Gain insights on collaboration, data minimization, and the importance of considering ethical implications in system design. Acquire valuable knowledge from an experienced distributed systems engineer to improve your ability to build and maintain robust, scalable systems in real-world production environments.

Syllabus

Intro
Why you should listen to me
Quick foundation
What makes distributed systems different
A subset of failures
Clients stuck to an overloaded process
Partial failure
"It's slow" is the hardest problem you'll ever debug
Create partial availability
"Who to Follow" in the monorail
Knowing what the system has done
Percentiles, not averages
Tracing
On profiling
Releases should change a metric
Free-form logs are liars
Common "problems" are overlogged
Uncommon problems
Avoid coordination
Backpressure
Dropping new messages on the floor
Returning "overload" error responses
Timeouts and exponential back-offs
Roll out infrastructure with feature flags
if (Decider.available..)
Multiple versions are the norm
Datacenter schedulers are worth it
Collaboration is politics
No time-traveling stalkers
Data minimization is a moral necessity

Taught by

GOTO Conferences

