Overview
Syllabus
Intro
ZALANDO AT A GLANCE
2019: DEVELOPERS USING KUBERNETES
INGRESS ERRORS
COREDNS OOMKILL
STOP THE BLEEDING: INCREASE MEMORY LIMIT
INCREASE IN MEMORY USAGE
CONTRIBUTING FACTORS
CUSTOMER IMPACT
IAM RETURNING 404
NUMBER OF PODS
ROUTES FROM API SERVER
API SERVER DOWN
INNOCENT MANIFEST
INCIDENT #2: LESSONS LEARNED
CLUSTER DOWN?
THE TRIGGER
CLUSTER LIFECYCLE MANAGER (CLM)
CLUSTER CHANNELS
FLANNEL ERRORS
RBAC CHANGES
NETWORK SPLIT
CREDENTIALS QUEUE
WHAT HAPPENED
SLACK
DISABLING CPU THROTTLING
RACE CONDITIONS...
COMMON PITFALLS
READINESS & LIVENESS PROBES
RESOURCE REQUESTS & LIMITS
AWS EKS IN PRODUCTION
AUTOMATED E2E TESTS
MONITORING
OPENTRACING
UPGRADE TO KUBERNETES 1.14
EMERGENCY ACCESS SERVICE
KUBERNETES FAILURE STORIES
INTERNAL TICKETS BASED ON FAILURE STORIES
FACTFULNESS
WHY KUBERNETES?
COMPLEXITY FOR GOOGLE-SCALE INFRA?
OPEN SOURCE & MORE
Taught by
GOTO Conferences