Prepping Data for Analysis Using R

Open Data Science via YouTube

Overview

Explore data preparation techniques for analysis using R in this conference talk from ODSC West 2015. Learn the fundamentals of data quality and how to automate routine preparation steps in a principled way. Discover common data preparation pitfalls, and see how to detect and fix them through interactive demonstrations in the open-source R analysis environment. Download the materials from the accompanying GitHub repository to follow along or practice later. Topics include faulty sensor situations, systematically missing variables, novel categorical levels, and compact coding, along with treatment plans, the user interface, and operational issues in data preparation. Led by John Mount and Nina Zumel, data scientists and authors, the talk also covers linear regression, calibration, interpretation, and avoiding overfitting.

Syllabus

Intro
Workshop Outline
Workshop Agenda
Workshop Goals
Data Preparation
Faulty Sensor Situation
Systematically missing variables
Building missing variables
Missing values
Pragmatic solution
Novel categorical levels
New data
Wyoming
Chemical categorical variables
Dealing with new levels
vtreat solution
Categorical variables
Compact coding
Indicator vs numerical variables
Treatment Plan
User Interface
Treatment Example
Linear Regression
Calibration
Interpretation
Operational Issues
Overfitting
Data fussing
John Mount

Taught by

Open Data Science
