Overview
Syllabus
Intro
Why you should listen to me
Quick foundation
What makes distributed systems different
A subset of failures
Clients stuck to an overloaded process
Partial failure
"It's slow" is the hardest problem you'll ever debug
Create partial availability
"Who to Follow" in the monorail
Knowing what the system has done
Percentiles, not averages
Tracing
On profiling
Releases should change a metric
Free-form logs are liars
Common "problems" are overlogged
Uncommon problems
Avoid coordination
Backpressure
Dropping new messages on the floor
Returning "overload" error responses
Timeouts and exponential back-offs
Roll out infrastructure with feature flags
if (Decider.available..)
Multiple versions are the norm
Datacenter schedulers are worth it
Collaboration is politics
No time-traveling stalkers
Data minimization is a moral necessity
Taught by
GOTO Conferences