This course is an introduction to data science and statistical thinking. Learners will gain experience exploring, visualizing, and analyzing data to understand natural phenomena, investigate patterns, and model outcomes, all in a reproducible and shareable manner. Topics covered include data visualization and transformation for exploratory data analysis. Learners will be introduced to problems and case studies inspired by and based on real-world questions and data through lecture and live coding videos as well as interactive programming exercises. The course uses the R statistical computing language, with an emphasis on packages from the Tidyverse, along with the RStudio integrated development environment, Quarto for reproducible reporting, and Git and GitHub for version control. The skills learners gain in this course will prepare them for a variety of roles, including data scientist, data analyst, quantitative analyst, and statistician, among others.
Overview
Syllabus
- Hello World
- Hello World! In the first module, you will learn what data science is and how data science techniques are used to make meaning from data and inform data-driven decisions. The module also discusses the importance of reproducibility in science and the techniques used to achieve it. Next, you will be introduced to R, RStudio, Quarto, and GitHub, as well as their roles in data science and reproducibility.
- Data and Visualization
- In our second module, we'll advance our understanding of R to set the stage for creating data visualizations using the Tidyverse's data visualization package, ggplot2. We'll learn about different data types and the visualization techniques appropriate for plotting each of them. Most of this module is devoted to understanding ggplot2 syntax and how it relates to the Grammar of Graphics. By the end of this module, you will have started building the foundation of the statistical toolkit needed to create basic data visualizations in R.
- Visualizing, transforming, and summarizing types of data
- In this module, we will take a step back and learn about tools for transforming data that might not yet be ready for visualization, as well as for summarizing data, using the Tidyverse's data wrangling package, dplyr. In addition to describing distributions of single variables, you will also learn to explore relationships between two or more variables. Finally, you will continue to hone your data visualization skills with plots for various data types.
Taught by
Mine Çetinkaya-Rundel and Dr. Elijah Meyer