Overview
Syllabus
Intro
Background
Postmodern Database
Automation
User escalation
Initial investigation
Restoring service objects
Collecting service definitions
The impact of the incident
The reason for the failure
Fixing the webhooks
Why the operator went rogue
Kubernetes label selector package
Test engineer accidentally created app load balancer
What can we learn
Paradoxical Finalizer
Paging Storm
Mitigation
Kubernetes Platform
Manual Operations
Lessons Learned
User Complaints
Monitoring Dashboard
Victim Cluster
Security Context Change
Learnings
Recap
Key takeaways
Taught by
USENIX