Data Modeling and Regression Analysis in Business

Overview

The course begins with material familiar to many business managers and to those who have taken the first two courses in this specialization. The first set of tools covers data description, statistical inference, and regression. We then extend these concepts to other statistical methods used for prediction when the response variable is categorical, such as whether or not a bid wins an auction. In the next segment, students learn tools for identifying the important features of a dataset, which can reduce its complexity and help explain the underlying behavior.

Syllabus

Module 0: Get Ready & Module 1: Introduction to Analytics and Evolution of Statistical Inference

This session is an overview of the business data analytics process and its components. We introduce you to different modeling paradigms and invite you to match problems to modeling paradigms. The module concludes with an overview of Rattle (an interface for the statistical package R) and its use for univariate analysis.

Module 2: Dating with Data

This session focuses on identifying relationships between dependent and independent variables using a regression model. The goal is to find the model that best fits the data in order to learn about the underlying relationships among variables in the population.
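As a rough illustration of fitting a regression model, the sketch below estimates a straight line by ordinary least squares using only the Python standard library. The course itself works in Rattle and R; the function name `fit_simple_ols` and the advertising and sales figures are hypothetical, made up purely for this example.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_ols(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
```

The slope estimates how much the dependent variable changes, on average, for a one-unit change in the independent variable within this sample.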

Module 3: Model Development and Testing with Holdout Data

This session introduces the use of a holdout data set for evaluating model performance. Methods for improving the model are discussed, with emphasis on variable selection. The nuances of modeling discrete predictor and response variables are also discussed.
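The holdout idea can be sketched as follows: set part of the data aside, fit the model on the rest, and measure error only on the part the model never saw. This is a minimal sketch under stated assumptions; the 70/30 split, the synthetic data, and the helper names `fit_line` and `rmse` are all choices made for illustration, not prescribed by the course.

```python
import random
from math import sqrt
from statistics import mean


def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def rmse(points, intercept, slope):
    """Root mean squared prediction error on the given points."""
    return sqrt(mean((y - (intercept + slope * x)) ** 2 for x, y in points))


# Synthetic data (an assumption for the example): y = 2 + 3x + noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 + 3.0 * x + rng.gauss(0, 1)))

# Shuffle, then keep 70% for training and hold out the remaining 30%.
rng.shuffle(data)
cut = int(0.7 * len(data))
train, holdout = data[:cut], data[cut:]

intercept, slope = fit_line(train)
train_rmse = rmse(train, intercept, slope)
holdout_rmse = rmse(holdout, intercept, slope)
print(f"train RMSE: {train_rmse:.3f}, holdout RMSE: {holdout_rmse:.3f}")
```

A holdout error much larger than the training error would suggest the model has fit noise in the training data rather than the underlying relationship.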

Module 4: Curse of Dimensionality

Industry has seen a tremendous increase in the generation and use of data from sensors, digital platforms, user-generated content, and similar sources. For example, sensors continuously record data and store it for later analysis. The way data gets captured can introduce a lot of redundancy. With more variables comes more trouble! Additional sources may contribute very little (or no) incremental information. This is the problem of a high number of unwanted dimensions. To avoid this pitfall, data transformation and dimension reduction come to the rescue by extracting fewer dimensions while ensuring that they still convey the full information concisely.
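To make the redundancy point concrete, the sketch below simulates two sensors that record nearly the same signal and checks how much of the total variance a single derived dimension would capture. It uses principal component analysis as one common dimension-reduction technique; the module description does not name a specific method, so that choice, along with the sensor names and simulated readings, is an assumption for illustration.

```python
import random
from math import sqrt
from statistics import mean

# Hypothetical readings: sensor_b nearly duplicates sensor_a plus small noise.
rng = random.Random(7)
sensor_a = [rng.gauss(20.0, 5.0) for _ in range(200)]
sensor_b = [a + rng.gauss(0.0, 0.5) for a in sensor_a]


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / (len(u) - 1)


var_a = covariance(sensor_a, sensor_a)
var_b = covariance(sensor_b, sensor_b)
cov_ab = covariance(sensor_a, sensor_b)
correlation = cov_ab / sqrt(var_a * var_b)

# Eigenvalues of the 2x2 covariance matrix [[var_a, cov_ab], [cov_ab, var_b]]
# in closed form; each is the variance along one principal component.
half_trace = (var_a + var_b) / 2
gap = sqrt(((var_a - var_b) / 2) ** 2 + cov_ab ** 2)
largest, smallest = half_trace + gap, half_trace - gap

# Share of total variance retained if only the first component is kept.
explained = largest / (largest + smallest)
print(f"correlation: {correlation:.3f}, variance kept by one dimension: {explained:.3f}")
```

When the two variables are this strongly correlated, one dimension retains almost all of the variance, which is the sense in which the second sensor adds little incremental information.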