Overview
Learn how Meta architects its development environment for reliability in this technical conference talk that explores the infrastructure powering their developers' daily work. Dive into Meta's software architecture, examining both virtual machine devservers and On Demand containers while understanding the mechanisms that keep these systems updated, reliable, and available. Explore key concepts including disaster recovery, production engineering practices, and strategies for maintaining system stability during maintenance workflows and outages. Gain practical insights into user-facing development workflows, disaster preparation, runbook creation, communication protocols, and live migration techniques. Discover valuable lessons learned from past incidents and how they've shaped Meta's approach to building robust development environments at scale.
Syllabus
Introduction
Overview
Developing Code at Meta
Dev Server
OnDemand Containers
Production Engineering
Dev Team
Dev Environment Architecture
Designing for Reliability
User-Facing Development Workflow
Preparing for Disasters
Storms and Drains
Runbooks
Communication
Live Migration
Learning from Disasters
Taught by
InfoQ