ABOUT THE COURSE:"Probability Theory for Data Science" is a specialized course designed to equip students with the essential knowledge and skills needed to analyze uncertain phenomena and make data-driven decisions in various domains. It provides a comprehensive understanding of the principles of probability and their applications in the context of data science. This course is essential for anyone aspiring to work in data-driven fields such as machine learning, artificial intelligence, statistics, and predictive analytics. The course typically begins with an introduction to basic concepts in probability theory, including sample spaces, events, and different approaches to defining probability. Emphasis is placed on developing a solid grasp of fundamental probability rules, such as the addition and multiplication rules, as well as understanding conditional probability, independence, and Bayes theorem. As the course progresses, students delve deeper into more advanced topics, such as random variables, probability distributions, and expectation. They explore common probability distributions, including binomial, Poisson, uniform, exponential, and normal distributions, and they learn how to calculate probabilities and expected values associated with these distributions. Additionally, students gain insight into concepts like moments, variance, and covariance, which are crucial for understanding data variability in real-world scenarios. Following this, students delve into the study of multiple random variables, exploring concepts such as the transformation of random variables, moment-generating functions, and key theorems pertaining to the convergence of random variables. "Probability Theory for Data Science" equips students with a strong theoretical foundation in probability and the analytical skills necessary to tackle complex data-driven problems. By mastering the principles of probability theory and its applications in data science, students are better prepared to excel in diverse roles within the rapidly growing field of data analytics and machine learning.INTENDED AUDIENCE: Graduate students and researchers from Academics and Industry who are interested in Data Science.PREREQUISITES: 10+2 Mathematics
Syllabus
Week 1: Phenomena; Definitions: sample space, events; Set operations; Definitions of probability: classical approach, frequency approach, axiomatic approach; Important theorems; Examples; Conditional probability; Examples; Independence; Mutually exclusive vs independent events; Bayes' theorem; Examples.
Week 2: Random variable; Events defined by random variables; Examples; Distribution function; Properties of distribution function; Examples; Discrete random variable; Continuous random variable; Mean; Moments; Variance; Examples.
Week 3: Bernoulli distribution; Binomial distribution; Poisson distribution; Uniform distribution; Exponential distribution; Gamma distribution; Normal distribution; Conditional distribution; Examples.
Week 4: Bivariate random variables; Examples; Joint distribution function; Properties; Independence; Marginal distribution function; Examples.
Week 5: Joint probability mass function; Marginal probability mass function; Examples; Joint probability density function; Marginal probability density function; Examples.
Week 6: Conditional probability mass functions; Examples; Conditional probability density function; Examples; Moments; Covariance and correlation coefficient; Examples.
Week 7: Conditional mean; Examples; Conditional variance; Examples; Multivariate random variable; Multivariate probability mass function; Multivariate probability density function; Independence; Moments; Examples.
Week 8: Multinomial distribution; Multivariate normal distribution; Applications; Transformation of random variables: theorems, examples.
Week 9: Moment generating function: theorems, examples; Characteristic function; Chebyshev's inequality; Examples; The weak law of large numbers; The strong law of large numbers; The central limit theorem (see the simulation sketch below).
Week 10: Concepts of statistical inference; Random sample; Examples; Statistic; Point estimate; Examples; Sampling distributions; Maximum likelihood estimation method; Examples.
Week 11: Confidence interval: concept, definition; Examples; Testing of hypotheses; Definitions: simple hypotheses, composite hypotheses, null hypotheses, alternative hypotheses, critical region; Two types of errors; Size of the test; Power of the test; Examples.
Week 12: Neyman–Pearson lemma; Examples; Likelihood ratio test; Examples; Guidelines; Examples.
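As a hedged illustration of the Week 9 material (a sketch, not drawn from the course itself), the central limit theorem can be checked empirically with a short simulation; numpy is assumed to be available, and the choice of the exponential distribution and of the sample size is arbitrary.

```python
# Illustrative sketch only (assumes numpy): empirical check of the central limit theorem.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000                 # sample size and number of replications

# Each row is one sample of size n from Exponential(1), which has mean 1 and variance 1.
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardize the sample means: (X_bar - mu) / (sigma / sqrt(n)).
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

# By the CLT, z should behave approximately like a standard normal variable.
print("mean of z (expected ~0):", z.mean())
print("std of z  (expected ~1):", z.std())
print("P(z <= 1.96) (expected ~0.975):", (z <= 1.96).mean())
```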
Taught by
Prof. Ishapathik Das