The Parquet Format and Performance Optimization Opportunities

Databricks via YouTube

Overview

Dive into the intricacies of the Parquet format and explore performance optimization opportunities in this 41-minute conference talk by Boudewijn Braams from Databricks. Start with an introduction to structured data formats and physical data storage models, including row-wise, columnar, and hybrid layouts. Then look closely at the Parquet format itself: its on-disk representation, physical data organization, and encoding schemes. Learn about performance optimization techniques such as dictionary encoding, page compression, predicate pushdown, dictionary filtering, and partitioning schemes. Discover strategies for avoiding the "many small files" problem, and see how the open-source Delta Lake storage layer builds on top of Parquet. Suitable both for newcomers looking for an approachable refresher on columnar storage and for experienced practitioners optimizing analytical workloads in Spark, the talk offers practical tips for getting better performance out of Parquet.
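As a rough illustration of two techniques named in the overview (this sketch is not taken from the talk), the plain-Python code below shows the idea behind dictionary encoding, where a column's repeated values are stored once and referenced by small integer indices, and dictionary filtering, where a column chunk whose dictionary lacks the value a query is looking for can be skipped without decoding its data. The function names are hypothetical. Real Parquet writers also compress the indices with run-length/bit-packing encoding and typically fall back to plain encoding when the dictionary grows too large; both details are omitted here.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: store each distinct value once, replace values by indices."""
    dictionary: List[str] = []        # distinct values in first-seen order
    index_of: Dict[str, int] = {}     # value -> position in the dictionary
    indices: List[int] = []           # one small integer per original value
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary: List[str], indices: List[int]) -> List[str]:
    """Rebuild the original column by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]


def may_contain(dictionary: List[str], value: str) -> bool:
    """Dictionary filtering: if the value is absent, the whole chunk can be skipped."""
    return value in dictionary


column = ["NL", "US", "NL", "DE", "US", "NL"]
dictionary, indices = dictionary_encode(column)
print(dictionary)                   # ['NL', 'US', 'DE']
print(indices)                      # [0, 1, 0, 2, 1, 0]
print(may_contain(dictionary, "FR"))  # False, so a filter on country = 'FR' can skip this chunk
```

The payoff grows with repetition: a low-cardinality column of long strings shrinks to a short dictionary plus a list of small integers, which also compress well.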

Syllabus

Intro
Data processing and analytics
Overview
Data sources and formats
Physical storage layout models
Different workloads
Row-wise vs Columnar
Parquet: data organization
Parquet: encoding schemes
Optimization: dictionary encoding
Optimization: predicate pushdown
Optimization: partitioning (embed predicates in directory structure)
Optimization: avoid many small files
Optimization: avoid few huge files
Optimization: Delta Lake (open-source storage layer on top of Parquet in Spark)
Conclusion
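The partitioning entry in the syllabus ("embed predicates in directory structure") refers to writing files into directories named after column values, as Spark does with Hive-style `key=value` paths. The hypothetical helpers below are a minimal sketch, not Spark's implementation, of how a reader can skip whole files using only their paths when a query filters on partition columns. Only equality predicates are handled, and the file paths are made up for illustration.

```python
from typing import Dict, List


def parse_partition_values(path: str) -> Dict[str, str]:
    """Extract key=value pairs from the directory components of a file path."""
    values: Dict[str, str] = {}
    for component in path.split("/")[:-1]:  # the last component is the file name
        if "=" in component:
            key, _, value = component.partition("=")
            values[key] = value
    return values


def could_match(partition_values: Dict[str, str], predicate: Dict[str, str]) -> bool:
    """Return False only if the directory names prove no row in the file can match."""
    for key, wanted in predicate.items():
        if key in partition_values and partition_values[key] != wanted:
            return False
    # Predicates on non-partition columns cannot be decided from the path,
    # so such files must still be read (and filtered row by row).
    return True


def prune_files(paths: List[str], predicate: Dict[str, str]) -> List[str]:
    """Keep only the files a query with these equality predicates needs to open."""
    return [p for p in paths if could_match(parse_partition_values(p), predicate)]


files = [
    "sales/country=NL/year=2019/part-0000.parquet",
    "sales/country=NL/year=2020/part-0000.parquet",
    "sales/country=US/year=2020/part-0000.parquet",
]
print(prune_files(files, {"country": "NL", "year": "2020"}))
# ['sales/country=NL/year=2020/part-0000.parquet']
```

The trade-off is that partitioning on a high-cardinality column multiplies the number of directories, and with it the number of small files, which is the problem the "avoid many small files" item addresses.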

Taught by

Databricks
