Scalability is one of the biggest challenges in data science. Learn how to evaluate data, choose the right algorithms, and perform predictive modeling at scale.
Syllabus
Introduction
- Scaling machine learning initiatives
- Defining terms
- Data and supervised machine learning
- The nine big data bottlenecks
- The stages of predictive analytics data
- Why you might have too little data
- How much data do I need?
- Balancing
- Who truly has big data?
- Assessing data
- Selecting: Data that should be left out
- Seasonality and time alignment
- Data and the data scientist
- Aggregate and restructure
- Dummy coding
- Feature engineering
- Understanding the modeling process
- Slow algorithms: Brute force
- Slow algorithms: More calculations
- Slow algorithms: More models
- How to sample properly
- Modeling with missing data
- Looking ahead to deployment and scoring in production
- Continuing your predictive modeling journey
Taught by
Keith McCormick