Overview
Use statistical learning techniques like linear regression and classification to solve common machine learning problems. Complete short coding assignments in Python.
Syllabus
- Week 1: Statistical Learning
  - This module introduces the standard theoretical framework used to analyze statistical learning problems. We start by covering the concept of the regression function and explain why, due to the curse of dimensionality, parametric models are needed to estimate it. We then present tools to assess the quality of a parametric model and discuss the bias-variance tradeoff as a theoretical framework for understanding overfitting and optimal model flexibility (see the Week 1 sketch after this syllabus).
- Week 2: Linear Regression
  - In this module, we cover the problem of linear regression. We start with a formal statement of the problem, pose it as an optimization problem, and derive a closed-form solution using the matrix pseudoinverse (see the Week 2 sketch after this syllabus). We then analyze the statistical properties of the linear regression coefficients, such as their variances and covariances, and use this analysis to assess coefficient accuracy and construct confidence intervals. We then turn to hypothesis testing, which we use to determine dependencies between input variables and the output. We finish with a collection of metrics for measuring model accuracy, followed by an introduction to the Python programming language. Please note that there is no formal assignment this week; we hope that everyone participates in the discussion instead.
- Week 3: Extended Linear Regression
  - In this module, you will learn how to include categorical (discrete) inputs in your linear regression problem, as well as nonlinear effects such as polynomial and interaction terms (see the Week 3 sketch after this syllabus). As a companion to this theoretical content, there are two recitation videos that demonstrate how to solve linear regression problems in Python. You will use this knowledge to complete a programming project.
- Week 4: Classification
  - In this module, we introduce classification problems through the lens of statistical learning. We start with a generative model based on the concept of conditional class probability. Using these probabilities, we show how to build the Bayes optimal classifier, which minimizes the expected misclassification error. We then present logistic regression, in conjunction with maximum likelihood estimation, for parametric estimation of the conditional class probabilities from data (see the Week 4 sketch after this syllabus). We also extend the idea of hypothesis testing to the context of logistic regression.
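
The minimal Python sketches below, one per week, illustrate the techniques named in the syllabus. They are sketches under stated assumptions, not course materials: all data is synthetic and all variable names are invented.

For Week 1, a sketch of the bias-variance tradeoff: as polynomial degree grows, training error keeps falling while test error eventually rises, which is the overfitting behavior the module analyzes.

```python
# Bias-variance sketch: fit polynomials of increasing degree to noisy data
# and compare training vs. test error. Synthetic data; names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)  # assumed ground-truth regression function

x_train = rng.uniform(0, 1, 30)
y_train = true_f(x_train) + rng.normal(0, 0.3, 30)
x_test = rng.uniform(0, 1, 200)
y_test = true_f(x_test) + rng.normal(0, 0.3, 200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # flexible parametric model
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```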
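
For Week 2, a sketch of closed-form linear regression via the matrix pseudoinverse, with the usual estimates of coefficient variances; the normal-approximation confidence intervals are a simplification of what the module may derive.

```python
# Closed-form linear regression via the pseudoinverse, plus coefficient
# standard errors and approximate 95% confidence intervals. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + features
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 0.5, n)

beta_hat = np.linalg.pinv(X) @ y  # closed form: beta_hat = X^+ y

# Unbiased noise-variance estimate and the coefficient covariance matrix
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - p - 1)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))  # standard errors of the coefficients

for j, (b, s) in enumerate(zip(beta_hat, se)):
    print(f"beta_{j}: {b:+.3f}  95% CI [{b - 1.96 * s:+.3f}, {b + 1.96 * s:+.3f}]")
```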
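
For Week 3, a sketch of extended linear regression with a categorical input, a polynomial term, and an interaction term; the statsmodels formula and column names are assumptions for illustration.

```python
# Categorical inputs, polynomial terms, and interactions in linear regression.
# C(group) expands the category into dummy variables, I(x**2) adds a quadratic
# term, and x:C(group) adds the interaction. Synthetic data; names assumed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "x": rng.uniform(-2, 2, n),
    "group": rng.choice(["A", "B"], n),  # categorical (discrete) input
})
# Response with a quadratic effect and a group-dependent slope
df["y"] = (1.0 + 0.5 * df["x"] ** 2
           + np.where(df["group"] == "B", 2.0 * df["x"], 0.0)
           + rng.normal(0, 0.3, n))

model = smf.ols("y ~ x + I(x**2) + C(group) + x:C(group)", data=df).fit()
print(model.summary())
```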
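
For Week 4, a sketch of logistic regression fit by maximum likelihood: we minimize the Bernoulli negative log-likelihood of the conditional class probabilities with scipy. The setup is an assumption for illustration.

```python
# Logistic regression via maximum likelihood estimation: minimize the
# negative log-likelihood of P(y=1|x) = sigmoid(x' beta). Synthetic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta_true = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-X @ beta_true))  # conditional class probabilities
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    z = X @ beta
    # Bernoulli NLL in a numerically stable form: sum of log(1+e^z) - y*z
    return np.sum(np.logaddexp(0.0, z) - y * z)

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print("MLE coefficients:", result.x)  # should be close to beta_true
```
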
Taught by
Chris Callison-Burch and Victor Preciado