Clean up your data in R. Learn how to identify and address data integrity issues such as missing and duplicate data, using R and the tidyverse.
Syllabus
Introduction
- Data is messy
- What you need to know
- Types of missing data
- Missing values
- Missing rows
- Aggregations and missing values
- Duplicated rows and values
- Aggregations in the data set
- Converting dates
- Unit conversions
- Numbers stored as text
- Text improperly converted to numbers
- Inconsistent spellings
- Screening for outliers
- Handling outliers
- Outliers use case
- Outliers in subgroups
- Detecting illogical values
- What is tidy data?
- Variables, observations, and values
- Common data problems
- Wide vs. long data sets
- Making wide data sets long
- Making long data sets wide
- Suspicious values
- Suspicious multiples
- What's next?
Taught by
Mike Chapple