Speed Up Your Data Processing

Overview

Explore techniques to accelerate data processing in this 30-minute EuroPython 2020 conference talk. Learn about common bottlenecks in data science workflows and how to overcome them using parallel and asynchronous programming with Python's concurrent.futures module. Discover the differences between sequential and parallel processing, synchronous and asynchronous execution, and when to apply these concepts in network I/O operations and computation-driven workloads. Gain practical insights into implementing parallelism and asynchronous programming to optimize data processing pipelines, allowing more focus on extracting value from data. Through real-life analogies, understand concepts like Amdahl's Law, multiprocessing vs multithreading, and practical implementations using ThreadPoolExecutor and ProcessPoolExecutor. Suitable for data scientists, engineers, and anyone with basic Python knowledge interested in improving data processing efficiency.
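As a rough illustration of the sequential-versus-parallel contrast described above, here is a minimal sketch using ThreadPoolExecutor from concurrent.futures. It is not taken from the talk: the fetch function is a hypothetical stand-in that sleeps to mimic a network wait, and the worker count and item list are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item_id):
    """Hypothetical stand-in for a network call; sleeps to mimic I/O wait."""
    time.sleep(0.05)
    return item_id * 2


def run_sequential(item_ids):
    # One task at a time: total time is roughly the sum of all the waits.
    return [fetch(i) for i in item_ids]


def run_threaded(item_ids, max_workers=8):
    # Threads overlap the waiting; executor.map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, item_ids))


if __name__ == "__main__":
    ids = list(range(16))

    start = time.perf_counter()
    run_sequential(ids)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    run_threaded(ids)
    print(f"threaded:   {time.perf_counter() - start:.2f}s")
```

Threads suit I/O-bound work like this because the time is spent waiting rather than computing. For computation-driven workloads, ProcessPoolExecutor offers the same submit and map interface while running tasks in separate processes, which avoids contention on CPython's global interpreter lock.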

Syllabus

Intro
A typical data science workflow
Data Processing in Python
Challenges with Data Processing
Task 1: Toast 100 slices of bread
Sequential Processing
Parallel Processing
Task 2: Brew coffee
Synchronous Execution
Practical Considerations
Amdahl's Law and Parallelism
Multiprocessing vs Multithreading
Initialize Submission List
Using ThreadPoolExecutor
Initialize Python modules
Initialize image resize process
Initialize File List in Directory
Using List Comprehensions
Using ProcessPoolExecutor
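
The Amdahl's Law item in the syllabus can be made concrete with a small calculation. The sketch below uses the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name and the example values are assumptions for illustration, not material from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup under Amdahl's Law.

    parallel_fraction: share of the runtime that can be parallelized (0 to 1).
    workers: number of parallel workers (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 100):
        print(f"{n:>3} workers: {amdahl_speedup(0.9, n):.2f}x")
```

With 90% of the work parallelizable, the speedup approaches but never reaches 10x no matter how many workers are added, because the remaining 10% still runs serially.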

Taught by

EuroPython Conference
