Overview
Syllabus
Small Big Data
Prelude: the most important question
TIME FOR A BIG DATA CLUSTER!!!!
A non-solution: don't use RAM, just disk
The software solution: use less RAM
Compression: Numpy dtypes
Compression: sparse arrays
Compression: Pandas dtypes (when loading data you can specify types)
Chunking: loading Numpy chunks with Zarr
Chunking: with Pandas
Indexing: the simplest solution
Indexing: Pandas without indexing
Indexing: populate SQLite from Pandas
Indexing: load from SQLite into DataFrame
Indexing: SQLite vs. CSV
Conclusion: what about other libraries?
Conclusion: don't forget about
Taught by
PyCon US