Overview
This 28-minute conference talk compares Pandas 2, Dask, and Polars for handling large datasets on a single machine. It covers recent advances in each tool: Pandas 2's new Arrow data types, faster calculations, and improved scalability; Dask's ability to scale Pandas across cores and its recent "expressions" optimization; and Polars, a newer competitor designed around Arrow with native multicore support. The talk works through a "just about fits in RAM" data task with all three solutions and weighs their pros and cons, so you can make informed choices for research workflows. It also examines whether Pandas operations still require 5x working RAM, how much faster Pandas string operations have become, and how well Polars works with tools like scikit-learn and matplotlib. The speaker is Ian Ozsvald, an experienced Chief Data Scientist and author.
Syllabus
Pandas 2, Dask or Polars? Quickly Tackling Larger Data on a Single Machine by Ian Ozsvald
Taught by
GAIA