Storage Systems at a Rapidly Scaling Startup - Instagram's Infrastructure Evolution

Meta Developers via YouTube

Classroom Contents

  1. Intro
  2. Approach to data scaling problems
  3. 2 total engineers
  4. First bottleneck: disk IO on old Amazon EBS
  5. Django DB Routers
  6. PG Replication to bootstrap nodes
  7. Scaling up Redis
  8. fork() and COW
  9. Vertical partitioning by data type
  10. No easy migration story; mostly double-writing
  11. Replicating + deleting often leaves fragmentation
  12. Why not Redis for kv caching?
  13. Slab allocator
  14. Focus on client
  15. Testing & monitoring kept concurrent fires to a minimum
  16. Scaling Out
  17. Database Scale Out
  18. Double write, shadow reads
  19. Stressing about Primary Key
  20. Data loss, segfaults
  21. train + rapidly approaching cliff
  22. Logical partitioning, done at application level
  23. note to self: pick a power of 2 next time
  24. Postgres "schemas"
  25. 9.2 upgrade: bucardo to move schema by schema
  26. ID generation
  27. Snowflake, other options
  28. 41 bits: time in millis (41 years of IDs); 13 bits: logical shard ID; 10 bits: auto-incrementing sequence, modulo 1024
  29. Lesson learned
  30. minimize moving parts
  31. Ending the year
  32. Launched Android
  33. Stability, FB
  34. Scaling cut-overs, ramp-ups, and development
  35. Dynamic ramp-ups and config
  36. Python Knobs
  37. Decouple deploy from feature rollout
  38. In memory requirement
  39. Simplest thing was breaking
  40. Trimming
  41. C* cluster is 35% of the size of Redis one
  42. Handling deletes
  43. Redis way: LREM
  44. Not so hot for an AP system
  45. 2014 project
  46. Spam fighting
  47. Generic features + machine learning
  48. Hadoop + Hive + Presto
  49. 2010 vintage infra
  50. #1 impact: recruiting
  51. Wrap up
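
Chapter 28 in the list above describes a 64-bit ID layout: 41 bits of time in milliseconds, 13 bits of logical shard ID, and 10 bits of an auto-incrementing sequence taken modulo 1024. A minimal Python sketch of how such an ID might be packed and unpacked follows. The outline does not give the field order or the epoch, so this sketch assumes the fields are packed from most significant to least significant in the order listed, and `CUSTOM_EPOCH_MS`, `make_id`, and `split_id` are hypothetical names and values chosen for illustration only.

```python
TIME_BITS = 41    # time in milliseconds
SHARD_BITS = 13   # logical shard ID
SEQ_BITS = 10     # auto-incrementing sequence, modulo 1024

# Hypothetical custom epoch in milliseconds (assumption; not stated in the outline).
CUSTOM_EPOCH_MS = 1_300_000_000_000


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack time, shard, and sequence into one 64-bit integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp does not fit in 41 bits")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard ID does not fit in 13 bits")
    seq = sequence % (1 << SEQ_BITS)  # modulo 1024, as in the chapter title
    # Assumed order: time in the high bits, then shard, then sequence.
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(value: int) -> tuple[int, int, int]:
    """Recover (timestamp in ms, shard ID, sequence) from a packed ID."""
    seq = value & ((1 << SEQ_BITS) - 1)
    shard_id = (value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    elapsed = value >> (SHARD_BITS + SEQ_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, seq
```

Because the time field occupies the high bits in this sketch, IDs generated later compare greater than earlier ones, and the shard can be read back from the ID itself without a lookup.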