Storage Systems at a Rapidly Scaling Startup - Instagram's Infrastructure Evolution

Meta via YouTube

Overview

Explore the journey of scaling storage systems at Instagram from 30 million to 150 million active users with a small engineering team. Learn about the evolving architecture, challenges faced, and solutions implemented over three years. Discover insights on database scaling, Redis optimization, logical partitioning, ID generation, and lessons learned in minimizing moving parts. Gain knowledge about dynamic ramp-ups, feature rollouts, and spam fighting techniques. Understand the impact of infrastructure choices on recruitment and the importance of adapting to rapid growth in a startup environment.

Syllabus

Intro
Approach to data scaling problems
2 total engineers
First bottleneck: disk IO on old Amazon EBS
Django DB Routers
PG Replication to bootstrap nodes
Scaling up Redis
fork() and COW
Vertical partitioning by data type
No easy migration story; mostly double-writing
Replicating + deleting often leaves fragmentation
Why not Redis for kv caching?
Slab allocator
Focus on client
Testing & monitoring kept concurrent fires to a minimum
Scaling Out
Database Scale Out
Double write, shadow reads
Stressing about Primary Key
Data loss, segfaults
train + rapidly approaching cliff
Logical partitioning, done at application level
note to self: pick a power of 2 next time
Postgres "schemas"
9.2 upgrade: bucardo to move schema by schema
ID generation
Snowflake, other options
41 bits: time in millis (41 years of IDs); 13 bits: logical shard ID; 10 bits: auto-incrementing sequence, modulo 1024
Lesson learned
minimize moving parts
Ending the year
Launched Android
Stability, FB
Scaling cut-overs, ramp-ups, and development
Dynamic ramp-ups and config
Python Knobs
Decouple deploy from feature rollout
In memory requirement
Simplest thing was breaking
Trimming
C* cluster is 35% of the size of Redis one
Handling deletes
Redis way: LREM
Not so hot for an AP system
2014 project
Spam fighting
Generic features + machine learning
Hadoop + Hive + Presto
2010 vintage infra
#1 impact: recruiting
Wrap up
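
The ID layout listed in the syllabus (41 bits of millisecond time, 13 bits of logical shard ID, 10 bits of sequence modulo 1024) adds up to 64 bits. A minimal sketch of how such an ID could be packed and unpacked follows. The function names and the custom epoch value are assumptions made for illustration; the listing does not give them, and this is not presented as Instagram's actual implementation.

```python
# Illustrative sketch of a 64-bit ID with a 41/13/10 bit layout.
# CUSTOM_EPOCH_MS, make_id and split_id are hypothetical names and values.

TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10

# Assumed epoch: 2011-01-01T00:00:00Z in milliseconds (not from the source).
CUSTOM_EPOCH_MS = 1_293_840_000_000


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack time, logical shard ID and sequence into one integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID does not fit in 13 bits")
    seq = sequence % (1 << SEQ_BITS)  # sequence wraps modulo 1024
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(id_: int) -> tuple[int, int, int]:
    """Recover (timestamp in ms, shard ID, sequence) from a packed ID."""
    seq = id_ & ((1 << SEQ_BITS) - 1)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_ >> (SHARD_BITS + SEQ_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, seq
```

Because the time field occupies the most significant bits, IDs generated later compare greater than earlier ones regardless of shard, and the shard can be read back from the ID without a lookup.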

Taught by

Meta Developers
