Overview
Explore tools for writing high-performance Python in this 40-minute conference talk from ODSC Europe 2019. Discover recent changes in the Python ecosystem that make it fast to identify slow code, simple to compile CPU-bound NumPy processing with Numba, easier to write efficient Pandas operations, and possible to parallelize medium-data operations with Dask. Learn techniques and processes for optimizing algorithms, improving data pipelines, and getting more out of complex tools like Pandas. Ian Ozsvald, Chief Data Scientist and co-organizer of PyDataLondon, shares practical examples and discusses the forthcoming 2nd edition of "High Performance Python". Apply these lessons to strengthen your Python skills and make your data science projects more efficient.
Syllabus
Introductions
Today's goal
A typical higher-performance task: calculate features including slope
A typical task - need slope of the line
Pandas iterrows
Pandas apply with raw=True
Swifter
Dask with Numba
Costs on the "big problem"
On being "highly performant"
Summary
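The syllabus centres on computing a slope feature and comparing Pandas iterrows with apply(raw=True). The sketch below illustrates that comparison under stated assumptions: each row of a DataFrame holds y-values sampled at equally spaced x positions, and the slope is the least-squares fit of those values against x. The data shape, the random values and the `ols_slope` helper are hypothetical; this is not the talk's code, and no timings are claimed. The final `np.polyfit` line is a fully vectorized alternative added for comparison, not necessarily a step the talk covers.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 rows, each holding 14 y-values sampled at x = 0..13.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1_000, 14)))

X = np.arange(df.shape[1])
# Design matrix with columns [x, 1], so lstsq returns (slope, intercept).
A = np.column_stack([X, np.ones(len(X))])


def ols_slope(row):
    """Hypothetical helper: least-squares slope of one row's values against X."""
    slope, _intercept = np.linalg.lstsq(A, row, rcond=None)[0]
    return slope


# Baseline: iterrows builds a pandas Series object for every row.
slopes_iterrows = pd.Series(
    [ols_slope(row.values) for _, row in df.iterrows()], index=df.index
)

# apply with raw=True passes each row to the function as a plain NumPy array,
# avoiding the per-row Series construction that raw=False would do.
slopes_raw = df.apply(ols_slope, axis=1, raw=True)

# Vectorized alternative: polyfit fits every column of a 2D y at once;
# row 0 of the result holds the slopes (highest-degree coefficient first).
slopes_vec = np.polyfit(X, df.values.T, 1)[0]
```

All three approaches should agree numerically. Because `ols_slope` only touches NumPy arrays, it is also the kind of function the later syllabus steps (Numba, Dask) aim to speed up, though how the talk applies them is not reproduced here.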
Taught by
Open Data Science