Migrating Pinterest Apache Spark Clusters from HDFS to S3

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Agenda
  3. Big Data Platform
  4. Old vs New cluster
  5. Old Cluster: Performance Bottleneck
  6. A Simple Aggregation Query
  7. 9k Mappers * 9k Reducers
  8. New Cluster: Choose the right EC2 instance
  9. Key Takeaways
  10. Read after write consistency
  11. How often does this happen
  12. Solution. Considerations
  13. Our Approach
  14. Performance Comparison: S3 vs HDFS
  15. Dealing with Metadata Operation
  16. Reduce Move Operations
  17. Multipart Upload API
  18. The Last Move Operation
  19. Fix Bucket Rate Limit Issue (503)
  20. Improving S3Committer
  21. S3 Benefit Compare to HDFS
  22. Things We Miss in Mesos
  23. Cost Saving
  24. Spark at Pinterest
