The Apache Spark File Format Ecosystem - Optimizing Storage for Performance

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Session Goals
  3. File Formats
  4. Row-wise Storage
  5. Columnar (Column-wise) Storage
  6. Hybrid Storage
  7. Example Data
  8. About: CSV
  9. About: JSON
  10. About: Avro
  11. Inspecting: Avro
  12. About: ORC
  13. Structure: ORC
  14. Inspecting: ORC
  15. Config: ORC
  16. Structure: Parquet
  17. Inspecting: Parquet (1)
  18. Inspecting: Parquet (2)
  19. Config: Parquet
  20. Case Study: Veraset
  21. Looking Forward: Apache Arrow
  22. Final Thoughts
