Solving Real World Data Science Tasks With Python Beautiful Soup - Movie Dataset Creation

Keith Galli via YouTube

Overview

Learn how to solve real-world data science tasks using Python and Beautiful Soup in this comprehensive tutorial. Scrape Wikipedia pages to create a dataset on Disney movies while covering a wide range of Python and data science topics. Master web scraping with BeautifulSoup, clean data effectively, test code using Pytest, implement pattern matching with regular expressions, work with dates using the datetime library, save and load data with the Pickle library, and access data from APIs using the Requests library. Follow along with hands-on tasks, including scraping movie information, cleaning and processing data, and integrating external movie ratings. By the end of this tutorial, gain practical experience in creating a robust movie dataset from scratch using various Python libraries and data science techniques.
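As a rough illustration of the first scraping task, here is a minimal sketch of turning an infobox table into a Python dictionary with BeautifulSoup. It is not the code from the video. The HTML string is a hand-written, heavily simplified stand-in for a Wikipedia infobox (the class names and layout are assumptions), and `parse_infobox` is a hypothetical helper name. The real page would be fetched with Requests and will have more edge cases.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for a Wikipedia infobox (assumed markup).
SAMPLE_HTML = """
<table class="infobox vevent">
  <tr><th class="infobox-above">Toy Story 3</th></tr>
  <tr><th class="infobox-label">Directed by</th>
      <td class="infobox-data">Lee Unkrich</td></tr>
  <tr><th class="infobox-label">Starring</th>
      <td class="infobox-data"><ul><li>Tom Hanks</li><li>Tim Allen</li></ul></td></tr>
  <tr><th class="infobox-label">Running time</th>
      <td class="infobox-data">103 minutes<sup>[1]</sup></td></tr>
</table>
"""


def parse_infobox(html):
    """Return the infobox rows as a dict of label -> text (or list of texts)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="infobox")
    info = {}
    for row in table.find_all("tr"):
        header = row.find("th")
        cell = row.find("td")
        if header is None:
            continue  # rows without a label carry nothing we can key on
        if cell is None:
            # A header-only row is treated as the title (assumption).
            info.setdefault("title", header.get_text(" ", strip=True))
            continue
        # Drop reference markers such as [1] before reading the text.
        for sup in cell.find_all("sup"):
            sup.decompose()
        key = header.get_text(" ", strip=True)
        items = cell.find_all("li")
        if items:
            # Bulleted cells become lists of strings.
            info[key] = [li.get_text(" ", strip=True) for li in items]
        else:
            info[key] = cell.get_text(" ", strip=True)
    return info


movie = parse_infobox(SAMPLE_HTML)
print(movie)
```

Passing `"html.parser"` keeps the sketch free of extra parser dependencies. Running the same function over every film linked from a list page would give the list of dictionaries described in the syllabus.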

Syllabus

- Video overview
- Check out DataCamp! (sponsored)
- Setup
- Task #1: Scrape the infobox from the Toy Story 3 wiki page (save in a Python dictionary)
- Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
- Robots.txt (Are you allowed to scrape a site?)
- Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
- Save & Load dataset checkpoint (JSON file)
- Task #3: Clean our data!
- Task #3.1: Strip out all references ([1], [2], etc.) from HTML
- Task #3.2: Split up the long strings
- Task #3.3: Examine errors we are getting
- Task #3.4: Convert “Running time” field to an integer
- Task #3.5: Convert “Budget” & “Box office” fields to floats
- Task #3.6: Convert dates into datetime objects
- Saving our data again using Pickle
- Task #4: Attach IMDb, Metascore, and Rotten Tomatoes scores to dataset (working with APIs)
- Task #5: Save final dataset as a JSON file and as a CSV file
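
The conversion steps in Task #3 (running time to an integer, money fields to floats, dates to datetime objects) can be sketched with regular expressions and the datetime library. This is an illustrative sketch under assumptions, not the tutorial's own implementation: the function names, the accepted input formats, and the choice to take the lower bound of a money range are all hypothetical.

```python
import re
from datetime import datetime

# A number such as 200, 1.067 or 12,345 (assumed input shapes).
NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"
SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
MONEY = re.compile(
    rf"\$\s*({NUMBER})(?:\s*[-–]\s*{NUMBER})?\s*(thousand|million|billion)?",
    flags=re.IGNORECASE,
)


def minutes_to_int(running_time):
    """Return the first integer found in a string like '103 minutes', else None."""
    match = re.search(r"\d+", running_time or "")
    return int(match.group()) if match else None


def money_to_float(text):
    """Convert strings like '$200 million' or '$12,345' to a float in dollars.

    For a range such as '$150–200 million' the lower bound is used (assumption).
    Returns None when no dollar amount is found.
    """
    match = MONEY.search(text or "")
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return value * SCALE[word.lower()] if word else value


def parse_date(text):
    """Try a few common date layouts and return a datetime, else None."""
    for fmt in ("%B %d, %Y", "%d %B %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


# Sample values for illustration only.
print(minutes_to_int("103 minutes"))
print(money_to_float("$200 million"))
print(parse_date("June 18, 2010"))
```

Returning `None` for unparseable input, rather than raising, makes it easier to run the helpers across a whole scraped dataset and inspect the failures afterwards. Small functions like these are also straightforward to cover with Pytest cases.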

Taught by

Keith Galli
