Data Science Foundations: Data Assessment for Predictive Modeling

via LinkedIn Learning

Overview

Explore the data understanding phase of the CRISP-DM methodology for predictive modeling. Find out how to collect, describe, explore, and verify data.
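As a rough illustration of the describe and verify tasks named above, here is a minimal sketch in plain Python. The toy records, column names, and helper functions (`describe`, `missing_counts`, `listwise_delete`) are hypothetical and are not taken from the course, which works in KNIME and R's ggplot2. The sketch counts rows, flags duplicate IDs, tallies missing values per column, and shows how many rows survive listwise deletion.

```python
# Hypothetical toy records for illustration only; None marks a missing value.
rows = [
    {"id": 1, "age": 63, "chol": 233, "sex": "M"},
    {"id": 2, "age": 41, "chol": None, "sex": "F"},
    {"id": 3, "age": None, "chol": 250, "sex": "F"},
    {"id": 3, "age": 56, "chol": 236, "sex": "M"},
]


def describe(rows):
    """Describe: row count, column names, and any IDs that appear more than once."""
    ids = [r["id"] for r in rows]
    duplicate_ids = sorted({i for i in ids if ids.count(i) > 1})
    return {
        "n_rows": len(rows),
        "columns": list(rows[0]),
        "duplicate_ids": duplicate_ids,
    }


def missing_counts(rows):
    """Verify: number of missing values in each column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


def listwise_delete(rows):
    """Keep only the rows that have no missing value in any column."""
    return [r for r in rows if all(v is not None for v in r.values())]


summary = describe(rows)
missing = missing_counts(rows)
complete = listwise_delete(rows)

print(summary)
print(missing)
print(f"{len(complete)} of {len(rows)} rows survive listwise deletion")
```

Even though only one value is missing in each of two columns, half of the toy rows are lost, which is the kind of effect the syllabus item on listwise deletion refers to.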

Syllabus

Introduction
  • Why data assessment is critical
  • A note about the exercise files
1. What Is Data Assessment?
  • Clarifying how data understanding differs from data visualization
  • Introducing the critical data understanding phase of CRISP-DM
  • Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
  • Navigating the transition from business understanding to data understanding
  • How to organize your work with the four data understanding tasks
2. Collect Initial Data
  • Considerations in gathering the relevant data
  • A strategy for processing data sources
  • Getting creative about data sources
  • How to envision a proper flat file
  • Anticipating data integration
3. First Look at the Data
  • Reviewing basic concepts in the level of measurement
  • What is dummy coding?
  • Expanding our definition of level of measurement
  • Taking an initial look at possible key variables
  • Dealing with duplicate IDs and transactional data
  • How many potential variables (columns) will I have?
  • How to deal with high-order multiple nominals
  • Challenge: Identifying the level of measurement
  • Solution: Identifying the level of measurement
4. Data Loading and Unit of Analysis
  • Introducing the KNIME Analytics Platform
  • Tips and tricks to consider during data loading
  • Unit of analysis decisions
  • Challenge: What should the row be?
  • Solution: What should the row be?
5. Describe Data
  • How to uncover the gross properties of the data
  • Researching the dataset
  • Tips and tricks using simple aggregation commands
  • A simple strategy for organizing your work
6. Data Description Case Studies
  • Describe data demo using the UCI heart dataset
  • Challenge: Practice describe data with the UCI heart dataset
  • Solution: Practice describe data with the UCI heart dataset
7. Explore Data Basics
  • The explore data task
  • How to be effective doing univariate analysis and data visualization
  • Anscombe's quartet
  • The Data Explorer node feature in KNIME
  • How to navigate borderline cases of variable type
  • How to be effective in doing bivariate data visualization
  • Challenge: Producing bivariate visualizations for case study 1
  • Solution: Producing bivariate visualizations for case study 1
8. Explore Data Tips and Tricks
  • How to utilize an SME's time effectively
  • Techniques for working with the top predictors
  • Advice for weak predictors
  • Tips and tricks when searching for quirks in your data
  • Learning when to discard rows
  • Introducing ggplot2
  • Orientating to R's ggplot2 for powerful multivariate data visualizations
  • Challenge: Producing multivariate visualizations for case study 1
  • Solution: Producing multivariate visualizations for case study 1
9. Verify Data Quality
  • Exploring your missing data options
  • Why you lose rows to listwise deletion
  • Investigating the provenance of the missing data
10. Missing Data Case Study
  • Introducing the KDD Cup 1998 data
  • What is the pattern of missing data in your data?
  • Is the missing data worth saving?
  • Assessing imputation as a potential solution
11. Explore and Verify Case Studies
  • Exploring and verifying data quality with the UCI heart dataset
  • Challenge: Quantifying missing data with the UCI heart dataset
  • Solution: Quantifying missing data with the UCI heart dataset
12. Making the Transition to Data Preparation
  • Why formal reports are important
  • Creating a data prep to-do list
  • How to prepare for eventual deployment
Conclusion
  • Next steps

Taught by

Keith McCormick

Reviews

4.7 rating at LinkedIn Learning based on 105 ratings
