Solving Real World Data Science Tasks With Python Beautiful Soup - Movie Dataset Creation
Keith Galli via YouTube
Syllabus
- Video overview
- Check out DataCamp! (sponsored)
- Setup
Task #1: Scrape the infobox from the Toy Story 3 Wikipedia page and save it in a Python dictionary
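A minimal sketch of Task #1, not necessarily the approach used in the video. So that it runs offline, the HTML is a hand-trimmed stand-in for the infobox table on the Toy Story 3 Wikipedia page. The real markup has many more rows and may differ. The helper names get_content_value and scrape_infobox are illustrative. The idea: every infobox row with a header cell and a data cell becomes one dictionary entry, and list-valued cells such as "Starring" become Python lists.

```python
from bs4 import BeautifulSoup

# Hand-trimmed stand-in for the Toy Story 3 infobox (illustrative, not the full page).
# In practice the HTML would come from something like
# requests.get("https://en.wikipedia.org/wiki/Toy_Story_3").text
SAMPLE_HTML = """
<table class="infobox vevent">
  <tr><th colspan="2" class="infobox-above summary">Toy Story 3</th></tr>
  <tr><th class="infobox-label">Directed by</th><td class="infobox-data">Lee Unkrich</td></tr>
  <tr><th class="infobox-label">Starring</th>
      <td class="infobox-data"><div><ul><li>Tom Hanks</li><li>Tim Allen</li></ul></div></td></tr>
  <tr><th class="infobox-label">Running time</th><td class="infobox-data">103 minutes</td></tr>
</table>
"""


def get_content_value(cell):
    """Return a list of strings for list-valued cells, otherwise a single string."""
    items = cell.find_all("li")
    if items:
        return [li.get_text(" ", strip=True) for li in items]
    return cell.get_text(" ", strip=True)


def scrape_infobox(html):
    """Turn each infobox row into a key/value pair in a dictionary."""
    soup = BeautifulSoup(html, "html.parser")
    # class_="infobox" matches tables whose class list contains "infobox".
    infobox = soup.find("table", class_="infobox")
    movie_info = {}
    for row in infobox.find_all("tr"):
        header = row.find("th")
        data = row.find("td")
        if header is None:
            continue  # e.g. image-only rows
        if data is None:
            # A header without a data cell is treated as the title row.
            movie_info["title"] = header.get_text(" ", strip=True)
        else:
            movie_info[header.get_text(" ", strip=True)] = get_content_value(data)
    return movie_info


print(scrape_infobox(SAMPLE_HTML))
```

On the sample above this prints {'title': 'Toy Story 3', 'Directed by': 'Lee Unkrich', 'Starring': ['Tom Hanks', 'Tim Allen'], 'Running time': '103 minutes'}.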
Task #2: Scrape the infobox for every movie in the List of Disney Films and save the results as a list of dictionaries
- Robots.txt: Are you allowed to scrape a site?
- Task #2 (continued): Scrape the infobox for every movie in the List of Disney Films and save the results as a list of dictionaries
- Save & load a dataset checkpoint as a JSON file
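The two supporting steps of Task #2 can be sketched with the standard library alone. The first is checking robots.txt before scraping. The second is checkpointing the scraped list of dictionaries to JSON so a long scrape doesn't have to be repeated. The robots.txt rules below are a hypothetical, simplified excerpt, not Wikipedia's actual file. The user-agent string, file name, and helper names (is_allowed, save_data, load_data) are illustrative assumptions.

```python
import json
import os
import tempfile
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt rules so the example runs offline.
# Against a live site you would instead call:
#   parser.set_url("https://en.wikipedia.org/robots.txt"); parser.read()
SAMPLE_ROBOTS_TXT = """
User-agent: *
Disallow: /w/
""".splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT)


def is_allowed(url, user_agent="my-movie-scraper"):
    """True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


def save_data(path, data):
    """Write the scraped list of dictionaries to a JSON checkpoint."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_data(path):
    """Read a JSON checkpoint back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


movies = [{"title": "Toy Story 3", "Directed by": "Lee Unkrich"}]
checkpoint = os.path.join(tempfile.gettempdir(), "disney_data_checkpoint.json")
save_data(checkpoint, movies)
print(is_allowed("https://en.wikipedia.org/wiki/Toy_Story_3"), load_data(checkpoint))
```

Under these sample rules, article URLs under /wiki/ are allowed and anything under /w/ is not.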
Task #3: Clean our data!
- Task #3.1: Strip out all references ([1], [2], etc.) from the HTML
- Task #3.2: Split up the long strings
- Task #3.3: Examine the errors we are getting
- Task #3.4: Convert the “Running time” field to an integer
- Task #3.5: Convert the “Budget” & “Box office” fields to floats
- Task #3.6: Convert dates into datetime objects
- Save our data again using pickle
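The cleaning steps of Task #3 can be sketched as a handful of small conversion functions. This is a minimal sketch under a few assumptions. Citations live in sup tags. Running times look like "103 minutes". Money values look like "$200 million" or "$4,500,000". Dates look like "June 18, 2010 (United States)" or "18 June 2010". The function names and the "(int)", "(float)", "(datetime)" key suffixes are illustrative, not taken from the video. Values that don't match return None so that the error cases from Task #3.3 can be inspected afterwards.

```python
import os
import pickle
import re
import tempfile
from datetime import datetime

from bs4 import BeautifulSoup


def strip_references(html):
    """Task 3.1: drop citation markers like [1] before extracting text."""
    soup = BeautifulSoup(html, "html.parser")
    # Wikipedia wraps citations in <sup class="reference">; removing every
    # <sup> is blunt (it also drops real superscripts) but simple.
    for sup in soup.find_all("sup"):
        sup.decompose()
    # A separator keeps neighbouring text nodes from running together;
    # non-breaking spaces are normalized to ordinary spaces.
    return soup.get_text(" ", strip=True).replace("\xa0", " ")


def first_value(value):
    """Some infobox fields hold lists; use the first entry in that case."""
    if isinstance(value, list):
        return value[0] if value else ""
    return value or ""


def minutes_to_integer(running_time):
    """Task 3.4: '103 minutes' -> 103; None when no number is present."""
    match = re.search(r"\d+", first_value(running_time))
    return int(match.group()) if match else None


MONEY_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?", re.IGNORECASE)
MULTIPLIERS = {"": 1, "million": 1e6, "billion": 1e9}


def money_conversion(money):
    """Task 3.5: '$200 million' -> 200000000.0; None when nothing matches.

    Ranges such as '$5–6 million' are not handled by this sketch.
    """
    match = MONEY_RE.search(first_value(money))
    if not match:
        return None
    amount = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return amount * MULTIPLIERS[unit]


def clean_date(date):
    """Task 3.6: parse the first date in e.g. 'June 18, 2010 (United States)'."""
    text = first_value(date).split("(")[0].strip()
    for fmt in ("%B %d, %Y", "%d %B %Y"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def save_pickle(path, data):
    # Pickle keeps datetime objects intact, which plain JSON cannot.
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


movie = {"Running time": "103 minutes", "Budget": "$200 million",
         "Release date": ["June 18, 2010 (United States)"]}
cleaned = {
    "Running time (int)": minutes_to_integer(movie["Running time"]),
    "Budget (float)": money_conversion(movie["Budget"]),
    "Release date (datetime)": clean_date(movie["Release date"]),
}
path = os.path.join(tempfile.gettempdir(), "disney_movie_data_cleaned.pickle")
save_pickle(path, [cleaned])
print(load_pickle(path))
```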
Task #4: Attach IMDb, Metascore, and Rotten Tomatoes scores to the dataset (working with APIs)
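One service that returns all three scores in a single response is the OMDb API (omdbapi.com), which requires a free API key. This sketch assumes OMDb as the source. It also assumes OMDb's JSON field names as best understood here: "imdbRating", "Metascore", and a "Ratings" list whose entries have "Source" and "Value", with "N/A" marking missing values. Check these against the current OMDb documentation. The helper names are hypothetical. The sample response uses made-up placeholder values, not real scores, so the parsing can run without a key or a network connection.

```python
import json
import urllib.parse
import urllib.request

OMDB_URL = "http://www.omdbapi.com/"  # assumed endpoint; requires an API key


def build_omdb_url(title, api_key, year=None):
    """Build a title-search URL; 'y' optionally narrows the match by year."""
    params = {"apikey": api_key, "t": title}
    if year is not None:
        params["y"] = year
    return OMDB_URL + "?" + urllib.parse.urlencode(params)


def fetch_omdb(title, api_key, year=None):
    """Network call; not exercised in the offline example below."""
    with urllib.request.urlopen(build_omdb_url(title, api_key, year), timeout=10) as resp:
        return json.load(resp)


def extract_scores(omdb_response):
    """Pull IMDb, Metascore, and Rotten Tomatoes values; None when missing."""
    scores = {
        "imdb": omdb_response.get("imdbRating"),
        "metascore": omdb_response.get("Metascore"),
        "rotten_tomatoes": None,
    }
    for rating in omdb_response.get("Ratings", []):
        if rating.get("Source") == "Rotten Tomatoes":
            scores["rotten_tomatoes"] = rating.get("Value")
    # Treat OMDb's "N/A" placeholder as a missing value.
    return {key: (None if value == "N/A" else value) for key, value in scores.items()}


# Placeholder response shaped like an OMDb reply; the values are made up.
sample = {
    "Title": "Example Movie",
    "imdbRating": "7.5",
    "Metascore": "N/A",
    "Ratings": [{"Source": "Rotten Tomatoes", "Value": "90%"}],
}
print(extract_scores(sample))
```

Each movie dictionary from Task #3 could then be updated with extract_scores(fetch_omdb(title, api_key)). Passing the release year helps when several films share a title.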
Task #5: Save the final dataset as a JSON file and as a CSV file
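Saving the final dataset runs into two snags. datetime objects are not JSON-serializable, and CSV cells cannot hold lists. This sketch handles both with the standard library. Dates are written as ISO 8601 strings, and lists are joined with "; " for CSV. The record contents, file names, and helper names are illustrative assumptions. "Example Movie" is a placeholder.

```python
import csv
import json
import os
import tempfile
from datetime import date, datetime


def to_serializable(value):
    """json.dump hook: write dates as ISO 8601 strings."""
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


def save_json(path, movies):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(movies, f, ensure_ascii=False, indent=2, default=to_serializable)


def to_cell(value):
    """Flatten one value into something a CSV cell can hold."""
    if isinstance(value, list):
        return "; ".join(str(v) for v in value)
    if isinstance(value, date):
        return value.isoformat()
    return "" if value is None else value


def save_csv(path, movies):
    # Use the union of all keys, in first-seen order, as the header row.
    fieldnames = []
    for movie in movies:
        for key in movie:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval fills in columns a given movie doesn't have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for movie in movies:
            writer.writerow({key: to_cell(value) for key, value in movie.items()})


# Placeholder records for illustration only.
movies = [
    {"title": "Toy Story 3", "Starring": ["Tom Hanks", "Tim Allen"],
     "Release date (datetime)": datetime(2010, 6, 18)},
    {"title": "Example Movie", "Running time (int)": 90},
]
out_dir = tempfile.gettempdir()
json_path = os.path.join(out_dir, "disney_movie_data_final.json")
csv_path = os.path.join(out_dir, "disney_movie_data_final.csv")
save_json(json_path, movies)
save_csv(csv_path, movies)
```

With pandas installed, pd.DataFrame(movies).to_csv(csv_path, index=False) is a shorter alternative. However, list-valued fields would then be written in their Python repr form, for example ['Tom Hanks', 'Tim Allen'].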
Taught by
Keith Galli