

Small Big Data - Using NumPy and Pandas When Your Data Doesn't Fit in Memory

PyCon US via YouTube

Overview

Learn techniques for handling datasets that are too large for memory but too small to justify a Big Data cluster in this 26-minute PyCon US talk. Discover how to process Small Big Data efficiently with NumPy and Pandas using money-saving strategies, compression, batching, and indexing. Explore practical solutions such as NumPy dtypes, sparse arrays, and Pandas dtypes for compression; chunking with Zarr and Pandas; and SQLite for indexing. Gain insights that carry over to other libraries and to your own data scenarios, empowering you to tackle data processing challenges effectively.
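
As a rough illustration of the compression ideas mentioned above (a sketch, not code from the talk itself), the example below shrinks NumPy dtypes, stores a mostly-zero array as a SciPy sparse matrix, and specifies compact Pandas dtypes at load time. The file name "measurements.csv" and its columns are hypothetical.

import numpy as np
import pandas as pd
from scipy import sparse

# NumPy dtypes: values in 0-199 fit in one byte instead of eight.
counts = np.arange(1_000_000, dtype=np.int64) % 200
small_counts = counts.astype(np.uint8)
print(counts.nbytes, small_counts.nbytes)   # 8_000_000 vs 1_000_000 bytes

# Sparse arrays: a mostly-zero matrix stored in CSR form keeps only nonzeros.
dense = np.zeros((10_000, 1_000))
dense[::100, ::50] = 1.0
csr = sparse.csr_matrix(dense)
print(dense.nbytes, csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)

# Pandas dtypes: specify compact types when loading (file/columns hypothetical).
df = pd.read_csv(
    "measurements.csv",
    dtype={"station": "category", "reading": "float32"},
)
print(df.memory_usage(deep=True).sum())

The same pattern applies to any integer or low-cardinality string column: pick the smallest dtype that can represent the values, and memory use drops by roughly the same factor.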

Syllabus

Small Big Data
Prelude: the most important question
TIME FOR A BIG DATA CLUSTER!!!!
A non-solution: don't use RAM, just disk
The software solution: use less RAM
Compression: Numpy dtypes
Compression: sparse arrays
Compression: Pandas dtypes (specify types when loading data)
Chunking: loading Numpy chunks with Zarr
Chunking: with Pandas
Indexing: the simplest solution
Indexing: Pandas without indexing
Indexing: populate SQLite from Pandas
Indexing: load from SQLite into DataFrame
Indexing: SQLite vs. CSV
Conclusion: what about other libraries?
Conclusion: don't forget about
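
The chunking and indexing items in the syllabus can be sketched roughly as follows, assuming a large hypothetical "events.csv" with a "user_id" column. The talk also covers Zarr for chunked NumPy arrays; this example sticks to Pandas' chunksize and SQLite.

import sqlite3
import pandas as pd

db = sqlite3.connect("events.sqlite")

# Chunking: read the CSV in batches so only one chunk is in memory at a time,
# streaming each batch into a SQLite table.
for chunk in pd.read_csv("events.csv", chunksize=100_000):
    chunk.to_sql("events", db, if_exists="append", index=False)

# Indexing: build an index once, then load only the rows you need.
db.execute("CREATE INDEX IF NOT EXISTS idx_user ON events (user_id)")
db.commit()
one_user = pd.read_sql_query(
    "SELECT * FROM events WHERE user_id = ?", db, params=(42,)
)
print(len(one_user))
db.close()

Once the index exists, per-key queries read only the matching rows from disk instead of re-parsing the whole CSV, which is the SQLite-vs-CSV comparison the syllabus refers to.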

Taught by

PyCon US

