DSCI 602: Statistical Methods for Data Science (2024)

Ball State University via Coursera

Overview

Welcome to the Ball State University course “Statistical Methods for Data Science.” This course covers statistical methods for data scientists. To make good sense of data, you need the right tools and analytic methods, and we will take a systematic approach to learning them. As data scientists, we need to be able to connect data to how the world around us works. To accomplish this challenging task, we will learn how to connect data through probability theory and statistical models in order to make actionable decisions, confirm a hypothesis, or make predictions.

After completing the course, you will be able to:

  1) Apply probability and distribution theory to address real-world problems in the data science field;
  2) Classify the types of random variables and the probability distributions used to model various types of data in practice;
  3) Outline the properties of discrete and continuous random variables;
  4) Explain the sampling distributions of sample statistics such as the sample mean and the sample proportion;
  5) Explain the laws of large numbers for the sample mean and the sample proportion;
  6) Choose and use appropriate inference strategies, such as the right estimation method or hypothesis test, to make inferences about unknown population parameters;
  7) Illustrate estimation and hypothesis testing as modes of statistical inference;
  8) Outline multivariate discrete and continuous distributions to understand the joint behavior of several correlated discrete and continuous variables, respectively;
  9) Relate multivariate analysis techniques to dimension reduction problems;
  10) Use the R computational environment for probability simulation and other statistical computing in this course.

Syllabus

  • Probability Theory: A Review
    • Welcome! In part 1 of this module, you will complete a recommended reading about the course and post a discussion board entry to introduce yourself to your classmates. In part 2, we will review probability theory and its applications to real-world problem-solving. Probability is a measure of the chance that a future event occurs. For example, what is the probability that you will see two heads when you toss two fair coins? It is ¼, right? Why should you care about learning probability? Here is a quote attributed to the ancient Greek philosopher Democritus: “Everything existing in the universe is the fruit of chance.” Thus, it is important to have basic probability knowledge. In data science, probability helps us understand how data is generated and plays a major role in inference and prediction. In this module, we will review three definitions of probability, probability laws, conditional probability, and Bayes' rule. Knowledge of conditional probability is essential in most practical problems. Bayes' rule provides a mechanism for determining conditional probabilities when prior probabilities are given.
  • Random Variables and Their Properties
    • In this module, we will talk about random variables, which are mappings from the sample space of a random experiment to the real numbers.
  • Discrete Parametric Family of Distributions, Part 1
    • In this module, we will learn about discrete probability distributions based on Bernoulli trials. You will learn about the Bernoulli, binomial, geometric, and negative binomial distributions. These distributions are widely used in applications including health and biomedical sciences, social sciences, environmental sciences, finance and business, and education, among others.
  • Continuous Probability Distributions - Part I
    • This module covers continuous probability distributions. In the real world, not all random variables are discrete. For example, daily rainfall amount, the lifetime of a piece of equipment, biological measures such as body mass index (BMI) and cholesterol levels, and various test scores take values in intervals and are called continuous random variables.
  • Role of Normal Distribution in Statistical Inference
    • In this module, we will revisit the normal distribution and its attractive properties. You will see how the central limit theorem can be used to approximate the distribution of the sum or average of sample data.
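
To make a few of the syllabus ideas concrete, here is a minimal sketch of the kind of probability simulation the course describes. The course itself uses R; Python is used here only for illustration, and nothing below is taken from the course materials. The function names are hypothetical, as are the numbers in the Bayes' rule example (a prior of 0.01 and conditional probabilities of 0.95 and 0.05). The sketch computes the two-heads probability exactly and by simulation, applies Bayes' rule, and checks the normal approximation for a sample average (the central limit theorem) on an Exponential(1) population, which is itself an assumed example.

```python
import itertools
import random
import statistics
from fractions import Fraction
from typing import List


def exact_two_heads() -> Fraction:
    """Classical definition: favorable outcomes over equally likely outcomes."""
    outcomes = list(itertools.product("HT", repeat=2))  # HH, HT, TH, TT
    favorable = [o for o in outcomes if o == ("H", "H")]
    return Fraction(len(favorable), len(outcomes))


def simulated_two_heads(trials: int, seed: int = 0) -> float:
    """Relative-frequency estimate from simulated tosses of two fair coins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first, second = rng.choice("HT"), rng.choice("HT")
        if first == "H" and second == "H":
            hits += 1
    return hits / trials


def bayes_posterior(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Bayes' rule: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]."""
    evidence = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    return p_b_given_a * prior / evidence


def sample_means(n: int, replications: int, seed: int = 0) -> List[float]:
    """Means of `replications` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]


if __name__ == "__main__":
    print(exact_two_heads())             # exact probability of two heads
    print(simulated_two_heads(100_000))  # should be close to 0.25
    # Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
    print(bayes_posterior(0.01, 0.95, 0.05))
    means = sample_means(n=50, replications=2_000)
    # Exponential(1) has mean 1 and standard deviation 1, so the sample mean
    # should be centered near 1 with spread near 1 / sqrt(50).
    print(statistics.fmean(means), statistics.stdev(means))
```

With the hypothetical inputs above, the posterior is roughly 0.16 even though P(B|A) is 0.95, because the prior is small. The simulated values vary with the seed and the number of trials, so they should be read as approximations to the exact results.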

Taught by

Vinayak Tanksale and Munni Begum
