Recent Parquet Improvements in Apache Spark - Vectorized Complex Types and Column Index Support

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Short Intro
  3. Outline
  4. Introduction on Apache Parquet
  5. Parquet: Glossary
  6. Parquet: Data Page
  7. Background
  8. Non-Vectorized Parquet Reader
  9. Advantages of Vectorized Approach
  10. High Level Idea
  11. Parquet Schema Conversion
  12. SPARK-34863: Complex type support
  13. Complex Type - Performance
  14. Perf: vectorized vs non-vectorized
  15. Parquet Predicate Pushdown
  16. Column Index Filtering
  17. Column Index Support in Spark
  18. Column Index - Performance
  19. Future Work