YouTube videos curated by Class Central.
Classroom Contents
The Parquet Format and Performance Optimization Opportunities
- 1 Intro
- 2 Data processing and analytics
- 3 Overview
- 4 Data sources and formats
- 5 Physical storage layout models
- 6 Different workloads
- 7 Row-wise vs Columnar
- 8 Parquet: data organization
- 9 Parquet: encoding schemes
- 10 Optimization: dictionary encoding
- 11 Optimization: predicate pushdown
- 12 Optimization: partitioning (embed predicates in directory structure)
- 13 Optimization: avoid many small files
- 14 Optimization: avoid few huge files
- 15 Optimization: Delta Lake (open-source storage layer on top of Parquet in Spark)
- 16 Conclusion