In the era of big data, the ability to analyze large data sets and represent them visually in a compelling manner is crucial. Data scientists therefore need to develop skills in producing and critically interpreting digital maps, charts, and graphs. Data visualization is an increasingly important topic in our globalized, digital society. It involves representing data or information graphically, enabling decision-makers across industries to comprehend complex concepts and processes that would otherwise be difficult to grasp. DSCI 605 Data Visualization serves as the foundation for understanding the principles, concepts, techniques, and tools used to visualize information in large, intricate data sets. It also provides hands-on experience in visualizing big data using the open-source software R. Through the course, students will learn to evaluate the effectiveness of visualization designs and think critically about design decisions such as color choice and visual encoding. Additionally, students will create their own data visualizations and become proficient in using R.
The course comprises four sections. The first section caters to learners with little or no experience in R and lays the groundwork for data visualization with R. The second section introduces preliminary data visualization techniques, giving students hands-on experience with common visualization practices for Exploratory Data Analysis (EDA) using ggplot2. This section emphasizes exploring the data before moving on to advanced data mining. The third section builds on these skills with advanced data visualization topics, including interactive data visualization, time series plotting, and spatial mapping.
The primary objective of the first three sections is to equip students with a well-developed set of skills for creating a wide range of visualizations in R. The fourth section focuses on completing a final project, in which students apply the skills, theory, and experience gained in the previous sections. The project entails developing a data visualization that effectively communicates a compelling story to its audience.
Overview
Syllabus
- Introduction to Data Visualization and Getting Started with R
- In the first module, we will learn what data visualization is, why it is necessary in the data science field, what it can do, and what skills it requires. We will then get started with R by learning R basics and R Markdown in preparation for the data visualization work in the course.
- Graphics Components for Data Visualization
- Understanding the elements and components of data visualization is essential because it provides a systematic framework for creating effective and meaningful visual representations of data. In this module, we will explore the grammar of graphics, explain some of the underlying rationale, introduce principles of data visualization, and describe the features and applications of common Exploratory Data Analysis (EDA) idioms.
- ggplot2
- Let's get our hands dirty with real data visualization by producing a graph. In this module, we will explore the powerful data visualization package ggplot2. You will learn the basic usage of the ggplot() function and the fill and color aesthetics, and learn to create a histogram using ggplot() with a suitable number of bins or bin width.
- Embed Images and Tables in R Markdown Files
- By now, you have carried out basic data wrangling, documented your work in R Markdown, and created your first data visualization in the previous modules. In this module, you will learn to create, embed, and refer to images and tables in R Markdown. In addition, you will learn to produce scatter plots, further enriching your visualization experience and skills.
- Boxplot and Multiple-view Layout
- This module continues with another common EDA idiom, the boxplot, and introduces a new technique: laying out multiple plots on one page. You will learn to produce boxplots using ggplot(), interpret them, and arrange multiple plots on one page.
Taught by
Dr. Aihua Li