Overview
Explore efficient strategies for downloading a billion small files using Python in this EuroPython 2019 conference talk. Dive into three concurrent downloading mechanisms: multithreading, multiprocessing, and asyncio. Learn design best practices, debugging techniques, error handling, and performance comparisons for each approach. Gain insights into network latency, file size considerations, and API interactions. Examine code examples and performance metrics to understand the trade-offs between different methods. Discover how to optimize your workflow, handle pagination, and improve download speeds. Apply lessons learned to choose the most suitable library for large-scale file downloading tasks.
Syllabus
Introduction
The Task
Understanding the Task
Network Latency
File Size
The API
The Get API
Disclaimers
Synchronous
Multithreading
Coding
Main Loop
Performance
Why is this happening
Things to keep in mind
Multiprocessing
Multiprocessing code
Iterating over pages
Downloader
Speed Improvements
asyncio
List Call
asyncio Task
Different Libraries
uvloop
Setup
aiohttp
ItAll Files
Download Files
Summary
Multiprocessing
Threading
Workflow
Interprocess communication overhead
Pagination token
Combo results
The real summary
Lessons learned
Thank you
Taught by
EuroPython Conference