Speed Up Your Data Processing

EuroPython Conference via YouTube

Overview

This 30-minute EuroPython 2020 conference talk covers techniques for speeding up data processing in Python. It identifies common bottlenecks in data science workflows and shows how to address them with parallel and asynchronous programming using Python's concurrent.futures module. The talk explains the differences between sequential and parallel processing and between synchronous and asynchronous execution, and when each applies to network I/O operations and computation-driven workloads. Real-life analogies introduce Amdahl's Law and multiprocessing versus multithreading, followed by practical implementations using ThreadPoolExecutor and ProcessPoolExecutor. The aim is to optimize data processing pipelines so that more time goes to extracting value from data. Suitable for data scientists, engineers, and anyone with basic Python knowledge who wants to improve data processing efficiency.
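As a rough illustration of the sequential-versus-parallel contrast described above, the following minimal sketch runs the same I/O-style task first one call at a time and then through ThreadPoolExecutor. The fetch function, the 0.05-second delay, and the worker count are assumptions made for this example (a sleep stands in for a network request); they are not taken from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item_id: int) -> int:
    """Hypothetical stand-in for a network call: wait briefly, then return a value."""
    time.sleep(0.05)  # simulated I/O latency (an assumption, not a real request)
    return item_id * 2


def run_sequential(ids):
    # One call at a time: total time is roughly len(ids) * latency.
    return [fetch(i) for i in ids]


def run_threaded(ids, max_workers=8):
    # Threads overlap the waiting; executor.map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, ids))


ids = list(range(8))

start = time.perf_counter()
sequential_results = run_sequential(ids)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded_results = run_threaded(ids)
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

Threads suit this kind of workload because the time is spent waiting rather than computing. For computation-driven work, ProcessPoolExecutor offers the same submit and map interface while running tasks in separate processes.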

Syllabus

Intro
A typical data science workflow
Data Processing in Python
Challenges with Data Processing
Task 1: Toast 100 slices of bread
Sequential Processing
Parallel Processing
Task 2: Brew coffee
Synchronous Execution
Practical Considerations
Amdahl's Law and Parallelism
Multiprocessing vs Multithreading
Initialize Submission List
Using ThreadPoolExecutor
Initialize Python modules
Initialize image resize process
Initialize File List in Directory
Using List Comprehensions
Using ProcessPoolExecutor
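The syllabus item on Amdahl's Law concerns the limit on how much parallelism can help. A minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), is shown here, where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and the 0.9 parallel fraction are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected by workers; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: 90% of the job parallelizes, 10% stays serial.
for workers in (1, 2, 4, 8, 1000):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a 10% serial portion, the speedup stays below 10x no matter how many workers are added, which is why reducing the serial part of a pipeline often matters more than adding cores.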

Taught by

EuroPython Conference

