The Parquet Format and Performance Optimization Opportunities


Databricks via YouTube

Physical storage layout models (5 of 16)

Class Central Classrooms beta

YouTube playlists curated by Class Central.

Classroom Contents


  1. Intro
  2. Data processing and analytics
  3. Overview
  4. Data sources and formats
  5. Physical storage layout models
  6. Different workloads
  7. Row-wise vs. columnar
  8. Parquet: data organization
  9. Parquet: encoding schemes
  10. Optimization: dictionary encoding
  11. Optimization: predicate pushdown
  12. Optimization: partitioning (embed predicates in the directory structure)
  13. Optimization: avoid many small files
  14. Optimization: avoid a few huge files
  15. Optimization: Delta Lake (open-source storage layer on top of Parquet in Spark)
  16. Conclusion
