Explore advanced web scraping techniques in this 42-minute EuroPython Conference talk. Learn how to build a simple client-server architecture that can evolve over time, combining ZeroMQ, Selenium, and BeautifulSoup to extract data from dynamic, JavaScript-driven websites such as Sporcle and Khan Academy. Discover how to run regular "downloads" without cluttering your desktop or headless server, and how to scrape anonymously. See how to handle variable content and complex login processes, and how this setup can significantly reduce debugging time. The focus is on writing robust code that withstands website design changes, so that data can be extracted efficiently even from complex sites.
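As a rough illustration of the client-server idea described above, the following minimal sketch keeps the page-fetching side behind a ZeroMQ REP socket and lets a short-lived client request HTML and parse it with BeautifulSoup. It is not the talk's actual code: the names `ENDPOINT`, `fetch_page`, `serve`, `scrape_title`, and `demo` are hypothetical, the sample HTML is made up, and the Selenium browser is replaced by a stub so the example runs without a browser installed.

```python
import threading

import zmq
from bs4 import BeautifulSoup

# Hypothetical endpoint. inproc keeps the demo in one process;
# a real setup would more likely bind a tcp:// address.
ENDPOINT = "inproc://scraper"


def fetch_page(url):
    """Stand-in for the browser (an assumption, not the talk's code).

    A real server would keep a Selenium WebDriver alive here, call
    driver.get(url), and return driver.page_source.
    """
    return "<html><body><h1 id='title'>Quiz</h1><a href='/next'>Next</a></body></html>"


def serve(context, ready, max_requests=1):
    """Long-running side: owns the expensive browser session and answers requests."""
    socket = context.socket(zmq.REP)
    socket.bind(ENDPOINT)
    ready.set()  # signal that the endpoint is bound before any client connects
    for _ in range(max_requests):
        url = socket.recv_string()
        socket.send_string(fetch_page(url))
    socket.close()


def scrape_title(context, url):
    """Short-lived client: asks the server for a page and parses the reply."""
    socket = context.socket(zmq.REQ)
    socket.connect(ENDPOINT)
    socket.send_string(url)
    html = socket.recv_string()
    socket.close()
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    # Return None instead of raising if the page layout changes.
    return heading.get_text(strip=True) if heading else None


def demo():
    context = zmq.Context()  # inproc sockets must share one context
    ready = threading.Event()
    server = threading.Thread(target=serve, args=(context, ready))
    server.start()
    ready.wait()
    title = scrape_title(context, "https://example.com/quiz")
    server.join()
    context.term()
    return title
```

Calling `demo()` starts the stub server in a thread, sends one request, and returns the parsed heading text. Splitting the work this way means the parsing client can be edited and rerun repeatedly while the server, and in a real setup the logged-in browser session, stays up.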
Syllabus
Anthon van der Neut - Beyond scraping
Taught by
EuroPython Conference