Overview
This Specialization is intended for data scientists with some familiarity with the R programming language who want to do data science using the Tidyverse family of packages. Over five courses, you will cover importing, wrangling, visualizing, and modeling data using the Tidyverse framework. The Tidyverse packages provide a simple but powerful approach to data science which scales from the most basic analyses to massive data deployments. The Specialization covers the entire life cycle of a data science project and presents specific tidy tools for each stage.
Syllabus
Course 1: Introduction to the Tidyverse
- Offered by Johns Hopkins University. This course introduces a powerful set of data science tools known as the Tidyverse. The Tidyverse has ...
Course 2: Importing Data in the Tidyverse
- Offered by Johns Hopkins University. Getting data into your statistical analysis system can be one of the most challenging parts of any data ...
Course 3: Wrangling Data in the Tidyverse
- Offered by Johns Hopkins University. Data never arrive in the condition that you need them in order to do effective data analysis. Data need ...
Course 4: Visualizing Data in the Tidyverse
- Offered by Johns Hopkins University. Data visualization is a critical part of any data science project. Once data have been imported and ...
Course 5: Modeling Data in the Tidyverse
- Offered by Johns Hopkins University. Developing insights about your organization, business, or research project depends on effective ...
Courses
- Course 1: Introduction to the Tidyverse
This course introduces a powerful set of data science tools known as the Tidyverse. The Tidyverse has revolutionized the way in which data scientists do almost every aspect of their job. We will cover the simple idea of "tidy data" and how this idea serves to organize data for analysis and modeling. We will also cover how non-tidy data can be transformed into tidy data, the data science project life cycle, and the ecosystem of Tidyverse R packages that can be used to execute a data science project. If you are new to data science, the Tidyverse ecosystem of R packages is an excellent way to learn the different aspects of the data science pipeline: importing the data, tidying the data into a format that is easy to work with, exploring and visualizing the data, and fitting machine learning models. If you are already experienced in data science, the Tidyverse provides a powerful system for streamlining your workflow in a coherent manner that can easily connect with other data science tools. This course assumes familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
- Course 2: Importing Data in the Tidyverse
Getting data into your statistical analysis system can be one of the most challenging parts of any data science project. Data must be imported and harmonized into a coherent format before any insights can be obtained. You will learn how to get data into R from commonly used formats and how to harmonize different kinds of datasets from different sources. If you work in an organization where different departments collect data using different systems and different storage formats, then this course will provide essential tools for bringing those datasets together and making sense of the wealth of information in your organization. This course introduces the Tidyverse tools for importing data into R so that it can be prepared for analysis, visualization, and modeling. Common data formats are introduced, including delimited files, spreadsheets, and relational databases, and techniques for obtaining data from the web, such as web scraping and web APIs, are demonstrated. In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
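As a rough illustration of importing and harmonizing delimited data, the sketch below reads two small sources with the readr package and combines them with dplyr. The inline text, the column names (`dept`, `staff`, and so on), and the two "departments" are hypothetical stand-ins for real files; the course's own examples may differ.

```r
library(readr)
library(dplyr)

# Hypothetical inline data standing in for files from two departments
sales_csv <- "dept,headcount\nsales,12\n"
research_tsv <- "department\tstaff\nresearch\t7\n"

# I() marks a string as literal data rather than a file path (readr >= 2.0);
# col_types = "ci" requests one character and one integer column
sales <- read_csv(I(sales_csv), col_types = "ci")
research <- read_tsv(I(research_tsv), col_types = "ci")

# Harmonize the differing column names, then stack the two tables
combined <- bind_rows(
  rename(sales, department = dept),
  rename(research, headcount = staff)
)

print(combined)
```

With real files, the `I(...)` argument would be replaced by a file path.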
- Course 5: Modeling Data in the Tidyverse
Developing insights about your organization, business, or research project depends on effective modeling and analysis of the data you collect. Building effective models requires understanding the different types of questions you can ask and how to map those questions to your data. Different modeling approaches can be chosen to detect interesting patterns in the data and identify hidden relationships. This course covers the types of questions you can ask of data and the various modeling approaches that you can apply. Topics covered include hypothesis testing, linear regression, nonlinear modeling, and machine learning. With this collection of tools at your disposal, as well as the techniques learned in the other courses in this specialization, you will be able to make key discoveries from your data for improving decision-making throughout your organization. In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
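To make the linear regression and hypothesis testing topics concrete, here is a minimal sketch using base R's `lm()` on the built-in `mtcars` dataset. The choice of dataset and variables is an assumption made for illustration, and the course may well use other modeling packages.

```r
# Fit a simple linear regression of fuel economy on vehicle weight
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
print(coef(fit))

# The coefficient table includes a t-test of each coefficient against zero
coef_table <- summary(fit)$coefficients
print(coef_table)

# p-value for the hypothesis that the slope on wt is zero
slope_p_value <- coef_table["wt", "Pr(>|t|)"]
```

A small `slope_p_value` suggests that weight is associated with fuel economy in this sample; it does not by itself establish a causal relationship.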
- Course 4: Visualizing Data in the Tidyverse
Data visualization is a critical part of any data science project. Once data have been imported and wrangled into place, visualizing your data can help you get a handle on what’s going on in the data set. Similarly, once you’ve completed your analysis and are ready to present your findings, data visualizations are a highly effective way to communicate your results to others. In this course we will cover what data visualization is and define some of the basic types of data visualizations. You will also learn about the ggplot2 R package, a powerful set of tools for making stunning data graphics that has become the industry standard. You will learn about different types of plots, how to construct effective plots, and what makes for a successful or unsuccessful visualization. In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
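A minimal ggplot2 sketch of one basic plot type, a scatter plot, might look like the following. The built-in `mtcars` dataset and the axis labels are chosen here purely for illustration.

```r
library(ggplot2)

# Map weight to the x-axis and fuel economy to the y-axis, then draw points
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    title = "Fuel economy versus weight"
  )
```

Calling `print(p)` renders the plot; further layers such as `geom_smooth()` can be added with `+` in the same way.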
- Course 3: Wrangling Data in the Tidyverse
Data never arrive in the condition that you need them in order to do effective data analysis. Data need to be re-shaped, re-arranged, and re-formatted so that they can be visualized or fed into a machine learning algorithm. This course addresses the problem of wrangling your data so that you can bring them under control and analyze them effectively. The key goal in data wrangling is transforming non-tidy data into tidy data. This course covers many of the critical details about handling tidy and non-tidy data in R, such as converting from wide to long formats, manipulating tables with the dplyr package, understanding different R data types, processing text data with regular expressions, and conducting basic exploratory data analyses. Investing the time to learn these data wrangling techniques will make your analyses more efficient, more reproducible, and more understandable to your data science team. In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.
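The wide-to-long conversion and dplyr table manipulation mentioned above can be sketched as follows. The `wide` table, its column names, and its values are hypothetical and made up for illustration only.

```r
library(tidyr)
library(dplyr)

# A hypothetical "wide" table: one column per year
wide <- tibble(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(15, 25)
)

long <- wide %>%
  # Wide to long: one row per country-year combination
  pivot_longer(cols = -country, names_to = "year", values_to = "value") %>%
  # Column names become strings, so convert the year back to an integer
  mutate(year = as.integer(year)) %>%
  # Typical dplyr steps: keep some rows, then sort
  filter(value > 10) %>%
  arrange(country, year)

print(long)
```

In the long form each row is a single observation and each column a single variable, which is the layout most Tidyverse functions expect.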
Taught by
Carrie Wright, PhD, Roger D. Peng, PhD, Shannon Ellis, PhD and Stephanie Hicks, PhD