Cleaning Bad Data in R

via LinkedIn Learning

Overview

Clean up your data in R. Learn how to identify and address data integrity issues such as missing and duplicate data, using R and the tidyverse.

Syllabus

Introduction
  • Data is messy
  • What you need to know
1. Missing Data
  • Types of missing data
  • Missing values
  • Missing rows
  • Aggregations and missing values
2. Duplicated Data
  • Duplicated rows and values
  • Aggregations in the data set
3. Formatting Data
  • Converting dates
  • Unit conversions
  • Numbers stored as text
  • Text improperly converted to numbers
  • Inconsistent spellings
4. Outliers
  • Screening for outliers
  • Handling outliers
  • Outliers use case
  • Outliers in subgroups
  • Detecting illogical values
5. Tidy Data
  • What is tidy data?
  • Variables, observations, and values
  • Common data problems
  • Wide vs. long data sets
  • Making wide data sets long
  • Making long data sets wide
6. Red Flags
  • Suspicious values
  • Suspicious multiples
Conclusion
  • What's next?

Taught by

Mike Chapple

Reviews

4.8 rating at LinkedIn Learning based on 113 ratings
