Overview
This course will provide a set of foundational statistical modeling tools for data science. In particular, students will be introduced to methods, theory, and applications of linear statistical models, covering the topics of parameter estimation, residual diagnostics, goodness of fit, and various strategies for variable selection and model comparison. Attention will also be given to the misuse of statistical models and ethical implications of such misuse.
This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
Logo adapted from photo by Vincent Ledvina on Unsplash
Syllabus
- Introduction to Statistical Models
- In this module, we will introduce the basic conceptual framework for statistical modeling in general, and for linear regression models in particular.
- Linear Regression Parameter Estimation
- In this module, we will learn how to fit linear regression models with least squares. We will also study the properties of least squares, and describe some goodness of fit metrics for linear regression models.
- Inference in Linear Regression
- In this module, we will study the uses of linear regression modeling for justifying inferences from samples to populations.
- Prediction and Explanation in Linear Regression Analysis
- In this module, we will identify how models can predict future values and how to construct interval estimates for those values. We will also explore the relationship between statistical modeling and causal explanations.
- Regression Diagnostics
- In this module, we will learn how to diagnose issues with the fit of a linear regression model. In particular, we will use formal tests and visualizations to decide whether a linear model is appropriate for the data at hand.
- Model Selection and Multicollinearity
- In this module, we will study methods for model selection and model improvement. In particular, we will learn when and how to apply model selection techniques such as forward selection, backward selection, and criterion-based methods, and we will learn about the problem of multicollinearity (also called collinearity).
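As a rough illustration of the least squares fitting and goodness-of-fit ideas named in the syllabus, the sketch below fits a simple linear regression and reports R². It is not taken from the course materials: the choice of Python with NumPy, the helper name `fit_least_squares`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def fit_least_squares(X, y):
    """Hypothetical helper: fit y ~ intercept + X by ordinary least squares.

    Returns the coefficient vector (intercept first) and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # treat a 1-D input as a single predictor

    # Design matrix with a leading column of ones for the intercept.
    design = np.column_stack([np.ones(len(y)), X])

    # Least squares solution: minimizes the sum of squared residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)

    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return beta, r_squared


# Synthetic data (an assumption for illustration): y = 2 + 3x + noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

beta, r2 = fit_least_squares(x, y)
print("intercept, slope:", beta)
print("R^2:", r2)
```

With this setup the estimated intercept and slope should land near 2 and 3, and R² should be close to 1, since the noise is small relative to the trend.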
Taught by
Brian Zaharatos