Storage Systems at a Rapidly Scaling Startup - Instagram's Infrastructure Evolution

Meta Developers via YouTube

Classroom Contents

  1. Intro
  2. Approach to data scaling problems
  3. 2 total engineers
  4. First bottleneck: disk IO on old Amazon EBS
  5. Django DB Routers
  6. PG Replication to bootstrap nodes
  7. Scaling up Redis
  8. fork() and COW
  9. Vertical partitioning by data type
  10. No easy migration story; mostly double-writing
  11. Replicating + deleting often leaves fragmentation
  12. Why not Redis for kv caching?
  13. Slab allocator
  14. Focus on client
  15. Testing & monitoring kept concurrent fires to a minimum
  16. Scaling Out
  17. Database Scale Out
  18. Double write, shadow reads
  19. Stressing about Primary Key
  20. Data loss, segfaults
  21. train + rapidly approaching cliff
  22. Logical partitioning, done at application level
  23. note to self: pick a power of 2 next time
  24. Postgres "schemas"
  25. 9.2 upgrade: bucardo to move schema by schema
  26. ID generation
  27. Snowflake, other options
  28. 41 bits: time in millis (41 years of IDs); 13 bits: logical shard ID; 10 bits: auto-incrementing sequence, modulo 1024
  29. Lesson learned
  30. minimize moving parts
  31. Ending the year
  32. Launched Android
  33. Stability, FB
  34. Scaling cut-overs, ramp-ups, and development
  35. Dynamic ramp-ups and config
  36. Python Knobs
  37. Decouple deploy from feature rollout
  38. In memory requirement
  39. Simplest thing was breaking
  40. Trimming
  41. C* cluster is 35% of the size of Redis one
  42. Handling deletes
  43. Redis way: LREM
  44. Not so hot for an AP system
  45. 2014 project
  46. Spam fighting
  47. Generic features + machine learning
  48. Hadoop + Hive + Presto
  49. 2010 vintage infra
  50. #1 impact: recruiting
  51. Wrap up
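
The ID layout named in chapter 28 (41 bits of time in milliseconds, 13 bits of logical shard ID, 10 bits of sequence modulo 1024) can be sketched as a 64-bit integer. This is a minimal illustration and not the implementation shown in the talk: the function names, the custom epoch value, and the high-to-low bit ordering (time, then shard, then sequence) are assumptions made for the example.

```python
# Sketch of a 64-bit ID packed from the three fields in the chapter title.
# Assumptions (not from the source): the epoch value, the field order
# (time in the high bits, sequence in the low bits), and all names.

TIME_BITS = 41   # milliseconds since a custom epoch
SHARD_BITS = 13  # logical shard ID
SEQ_BITS = 10    # per-shard sequence, taken modulo 1024

# Hypothetical custom epoch: 2011-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_293_840_000_000


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack a timestamp, shard ID, and sequence number into one integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID does not fit in 13 bits")
    seq = sequence % (1 << SEQ_BITS)  # modulo 1024
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(id_value: int) -> tuple[int, int, int]:
    """Unpack an ID into (elapsed_ms, shard_id, sequence)."""
    seq = id_value & ((1 << SEQ_BITS) - 1)
    shard_id = (id_value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = id_value >> (SHARD_BITS + SEQ_BITS)
    return elapsed, shard_id, seq
```

With the time field in the high bits, IDs created later compare greater than earlier ones, so they sort roughly by creation time, and the shard that owns a row can be read back from the ID itself with `split_id`.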
