Using Pandas and Dask to Work with Large Columnar Datasets in Apache Parquet

EuroPython Conference via YouTube

Classroom Contents

  1. Intro
  2. Outline
  3. Business Model
  4. Data Flow
  5. Conclusion
  6. Why do I care
  7. Other technologies
  8. Blob storage
  9. Data sharing
  10. Pocky
  11. Why Parquet
  12. Python implementations
  13. Parquet file structure
  14. Pre predicate pushdown
  15. Dictionary encoding
  16. Compression
  17. Partitioning
  18. Storage
  19. ODBC
  20. Azure Blob Storage
  21. Questions