This course dives into the process, strategies, and best practices of web scraping. Learn how to use the Python framework, Scrapy, to practice key techniques.
Overview
Syllabus
Introduction
- How to learn to stop worrying and love the bot
- What you should know
- What is web scraping?
- How the internet works: A brief summary
- Hello world with Scrapy
- Challenge: Scraping all data on a page
- Solution: Scraping all data on a page
- Crawling a website
- Recording data
- Scrapy settings file
- Structuring your scrapers for extensibility/reusability
- Challenge: Scraping news sites
- Solution: Scraping news sites
- Submitting a form
- Finding and using hidden APIs
- Sitemaps and robots.txt
- Challenge: Using CNN's sitemap
- Solution: Using CNN's sitemap
- Logging in
- Browser automation with Selenium
- Interacting with a page
- Next steps
Taught by
Ryan Mitchell