Advanced R Programming for Data Analytics in Business
Indian Institute of Technology Kanpur and NPTEL via Swayam
Overview
ABOUT THE COURSE: Over the next few decades, Data Science (DS), Machine Learning (ML), and Artificial Intelligence (AI) will play a crucial role in many aspects of business decision-making and management information systems. Leaders in organizations need to capitalize on data analytics to gain a competitive advantage in the modern business landscape. Applying cutting-edge data analytics techniques with R programming (and RStudio, a powerful IDE, or Integrated Development Environment) will prepare learners for the business analytics workflow and make them job-ready for mid-to-senior managerial positions in a range of business and industry settings. In this course, you will use advanced data analytics tools to explore, clean, wrangle, visualize, and process business data, generating useful insights and drawing inferences from raw and unstructured data. The course also introduces learners to use cases from business, finance, and management, along with problem sets that require advanced techniques to process the data, communicate the results, and draw managerial implications. The course is designed to serve not only business, finance, and management professionals but also people in other industries and in academia who rely heavily on data-driven decision-making. The operating environment for all types of organizations (engineering and management) has become highly dynamic and data-driven, and it continues to evolve rapidly, with technological innovation at the heart of this change. Against this backdrop, DS, ML, and AI offer new opportunities for all market participants, i.e., business leaders, policymakers, regulators, and governments. The objective of this course is to help learners understand and apply these modern DS, ML, and AI techniques in business, finance, and management, including solving real-life problems in these areas to improve organizational decision-making.
INTENDED AUDIENCE: Management students (Ph.D., MBA, BBA), Commerce students (B.Com., M.Com.), Chartered Accountants, Science students (B.Sc., M.Sc.), Engineering students (B.Tech., M.Tech.), Finance professionals (investment analysts, banking professionals, accountants, credit analysts), and Data Scientists.
INDUSTRY SUPPORT: Data Science and Business Analytics firms (Mu Sigma Analytics, Fractal Analytics, Manthan, Latent View, Tiger Analytics, Absolutdata, Convergytics, UST Global); equity research firms, credit rating firms, investment banks, the corporate banking sector, and corporate finance roles across companies (ICRA, ICICI, HDFC, Nomura, Lehman Brothers, SBI Capital Markets, Deutsche Bank, HSBC Bank, etc.).
Syllabus
Week 1: Advanced R programming for Data Science: Introduction and Background
Fundamentals of R: Installation and set-up, setting the working directory, packages, and libraries; R operators: Arithmetic, assignment, comparison, and logical operators; Working with different data types; Vector creation and manipulation; Miscellaneous functions: Sequence, repetition, sorting, random number generation, user-defined functions, and the lapply, sapply, and tapply functions; Factor variables, Indexing, Data coercion, Conditional statements
Week 2: Introduction to Data Visualization with R:
Basic plot types: Bar chart, Pie chart, Histogram, Density plot, Boxplot; Plot customization: Adding a legend, Adding color to plots, Adding axis labels and a chart title, Modifying axes and scales; Overlay plots in R
Week 3: Advanced Data Visualization with ggplot2:
Key components; Color, size, shape, and other aesthetic attributes; Faceting: Wrap faceting and Grid faceting; Plot geoms: Adding a smoother to a plot, Boxplots, Jitter plots, Histograms, Frequency polygons, Time series with line and path plots; Modifying the axes; Quick plots; Correlation matrix with ggplot
Week 4: Exploratory Data Analysis (EDA) and Data Wrangling:
Reading and writing data, exporting and saving a dataframe; Data handling and cleaning: Recoding variables, dealing with NAs, adding rows and columns to a dataframe, converting wide to long data formats, merging dataframes.
Week 5: Handling Complex Date and Time Objects:
Getting the current date and time, POSIX classes (POSIXct and POSIXlt), Parsing dates, Date and time components, Dates not in standard format; Operations on dates: Subtracting/adding, finding differences, generating a sequence, truncating; Time zones; Time intervals: Intervals and overlaps; Periods and durations; Date arithmetic; Rounding dates
Week 6: Basic Statistics with R:
Measures of central tendency, Measures of variability, Measures of shape; Summary statistics by group; Dealing with outliers: Truncation and winsorization.
Week 7: Probability and Stochastics with R:
Probability distributions, Binomial distribution, Normal distribution, Sampling distribution, Types of sampling: Probability vs. non-probability
Week 8: Advanced Inferential Statistics with R:
One-sample tests, two-sample tests, t-statistics, z-statistics, tests of proportions, tests of variances; ANOVA: one-way and two-way
Week 9: Introduction to Model Building and Evaluation: Simple and Multiple Linear Regression Modeling (SLRM):
Linearity and normality, Fitting SLRM, Storing and printing the regression results, Interpretation of the regression results, Diagnosis of the fitted model, Tests for autocorrelation and heteroscedasticity, Computation of robust standard errors, and Visualization of regression results
Week 10: Introduction to Time-series Modelling and Panel Data Methods:
Time-series modelling, issues with time-series data, basic time-series properties, Introduction to panel data, Reading & Writing Panel Data, Panel Data Manipulation, Outlier Treatment, Panel Data Visualization, Descriptive Statistics, Pooled OLS, Fixed Effect Estimation, LSDV (Least Squares Dummy Variable) Estimation, Random Effect Estimation, Diagnostic Tests, Residual Analysis, Robust Estimation
Week 11: Advanced Non-Linear Modelling and Evaluation: Quantile Regression Method:
Reading & Writing Quantile Data, Quantile Data Manipulation, Outlier Treatment, Quantile Data Visualization, Diagnostic Tests, Residual Analysis, Robust Estimation
Week 12: Advanced Classification Methods: Logit/Probit Regression Modelling:
Introduction to Classification Algorithms, Linear probability models, Introduction to Logit/Probit Modelling, Thresholding and Classification Matrix, ROC Curve, Parameter Interpretation, Maximum Likelihood Estimation, and Goodness-of-Fit measures.
Taught by
Prof. Abhinava Tripathi