What you'll learn:
- Convert raw and dirty data into clean data
- Understand what clean data looks like and how to achieve it
- Use the R Tidyverse packages to clean data
- Handle missing values in R
- Detect outliers
- Filter and query tables
- Select the proper class for your data
- Clean various classes of data (numeric, string, categorical, integer, ...)
Welcome to this course on Data Cleaning in R with tidyverse, dplyr, data.table, tidyr and many more packages!
You may already know this problem: your data was not properly cleaned before the analysis, so the results are corrupted or you cannot perform the analysis at all.
To be brief: you cannot escape the initial cleaning part of data science. No matter which data you use or which analysis you want to perform, data cleaning will be part of the process. It is therefore a wise decision to invest the time to learn how to do it properly.
Now, as you can imagine, there are many things that can go wrong in raw data, so a wide array of tools and functions is required to tackle all these issues. As always in data science, R has a solution ready for the scenarios you are likely to meet. Outlier detection, missing data imputation, column splits and unions, character manipulations, class conversions and much more - all of this is available in R.
On top of that, there are usually several ways to do each of these things, so you always have an alternative if you prefer a different approach. Whether you like simple tools or complex machine learning algorithms to clean your data, R has both.
We understand that it can be overwhelming to identify the right R tools and to use them effectively when you are just starting out. That is where we will help you. In this course you will see which R tools are the most efficient ones and how you can use them.
You will learn about the tidyverse package system - a collection of packages which work together as a team to produce clean data. This system helps you through the whole data cleaning process, from data import right up to data querying. It is a very popular toolbox that is well worth learning.
To filter and query datasets you will use tools like data.table, tibble and dplyr.
You will learn how to identify outliers and how to replace missing data. We even use machine learning algorithms to do these things.
To make sure that you can use these tools in your daily work, there is a data cleaning project at the end of the course. In this project you get an assignment which you can solve on your own, based on the material you learned in the course. So you have plenty of opportunity to test, train and refine your data cleaning skills.
As always you get the R scripts as text to copy into your RStudio instance. And on course completion you will get a course certificate from Udemy.
R-Tutorials Team