The Best Statistics & Probability Courses for Data Science — Class Central Career Guides
Part two of our six-piece series that recommends the best MOOCs for launching yourself into the data science industry
Editor’s note: Drop us a note at [email protected] if you have any feedback or requests for particular career guides. We are also looking for contributors!
Here are the parts of the series that have been published so far:
- The Best Intro to Programming Courses for Data Science
- The Best Statistics & Probability Courses for Data Science (this one)
- The Best Intro to Data Science Courses
- The Best Data Visualization Courses
- The Best Machine Learning Courses
Our picks
The best online introductory statistics and probability courses for people looking to learn data science are the University of Texas at Austin’s “Foundations of Data Analysis” two-part series (“Statistics Using R” and “Inferential Statistics”). The series includes two of the top reviewed courses available with a weighted average rating of 4.48 out of 5 stars over 20 reviews. It is one of the few courses/series in the upper echelon of ratings that teach statistics with a focus on coding up examples.
Foundations of Data Analysis – Part 1: Statistics Using R by the University of Texas at Austin on edX
Foundations of Data Analysis – Part 2: Inferential Statistics by the University of Texas at Austin on edX
A stellar specialization
Update (December 5, 2016): Our original second recommendation, UC Berkeley’s “Stat2x: Introduction to Statistics” series, closed its enrollment a few weeks after the release of this article. We promoted our top recommendation from “The Competition” section accordingly.
Based on a course that had a 4.82-star weighted average rating over 55 reviews, Duke University’s Statistics with R Specialization is another great option. The five-course specialization, which is relatively new, has a comprehensive syllabus with full sections dedicated to probability. There are only five reviews for the new individual courses, so their 3.6-star weighted average rating should be taken with a grain of salt.
Statistics with R Specialization by Duke University on Coursera
…which contains the following five courses:
- Introduction to Probability and Data
- Inferential Statistics
- Linear Regression and Modeling
- Bayesian Statistics
- Statistics with R Capstone
Want more probability?
If you want to dive deeper into probability, opt for MIT’s “6.041x: Introduction to Probability – The Science of Uncertainty”. MIT’s offering has by far the highest weighted average rating (4.91 out of 5 stars over 34 reviews) of any course/series considered in this guide. 6.041x is almost identical to MIT’s on-campus version. It covers more probability than a standard introduction to probability and statistics, and it is longer (16 weeks) and more challenging than most MOOCs.
Introduction to Probability – The Science of Uncertainty by the Massachusetts Institute of Technology (MIT) on edX
Table of Contents
- Why You Should Trust Us
- About the Data Science Career Guide
- Statistics AND Probability
- How We Picked Courses to Consider
- How We Tested
- Why Target Coding?
- Our Picks
- A Stellar Specialization
- Want More Probability?
- The Competition
- About Class Central Career Guides
- Author Bio
Why You Should Trust Us
I started creating my own data science master’s degree using online courses almost a year ago. Since then, I have scoured the statistics landscape, taken a few courses, and audited portions of many. I know the options and what content is needed for those targeting a data analyst or data scientist role.
For this guide, I spent 15+ hours trying to identify every single online introduction to statistics course offered as of November 2016, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
About the Data Science Career Guide
Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
Here are the parts of the series that have been published so far:
- The Best Intro to Programming Courses for Data Science
- The Best Statistics & Probability Courses for Data Science (this one)
- The Best Intro to Data Science Courses
- The Best Data Visualization Courses
P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
Statistics AND Probability
Probability is not statistics and vice versa. My favorite explanation of their differences is from Stony Brook University:
“Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.”
They explain that “probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions,” while “statistics is primarily an applied branch of mathematics, which tries to make sense of observations in the real world.”
Statistics is generally regarded as one of the pillars of data science. Probability, though it generates less attention, is also an important part of a data science curriculum. Joe Blitzstein, a Professor in the Harvard Statistics Department, stated in this popular Quora answer that aspiring data scientists should have a good foundation in probability theory as well. Justin Rising, a data scientist with a Ph.D. in statistics from Wharton, clarified that this “good foundation” means being comfortable with undergraduate level probability.
How We Picked Courses to Consider
Each course must fit four criteria:
- It must be an introductory course with little to no statistics or probability experience required.
- It must be on-demand or offered every few months.
- It must be of decent length: at least ten hours in total for estimated completion.
- It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn statistics and probability, this guide focuses on courses.
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy, we chose to consider the most reviewed and highest rated ones only. There is a chance we missed something, however. Please let us know if you think that is the case.
How We Tested
We compiled the average rating and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. If a series had multiple courses (like the University of Texas at Austin’s two-part “Foundations of Data Analysis” series), the weighted average rating across all courses was calculated. We read text reviews and used this feedback to supplement the numerical ratings.
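The guide does not spell out its formula, so the following is only a minimal sketch, assuming each course’s star rating is weighted by its number of reviews. The function name `weighted_average_rating` is hypothetical. The two input pairs are the Udacity descriptive and inferential statistics figures listed in “The Competition”; whether the guide pooled that series in exactly this way is an assumption.

```python
from typing import Iterable, Tuple


def weighted_average_rating(ratings: Iterable[Tuple[float, int]]) -> float:
    """Average star ratings, weighting each one by its review count.

    Each item is a (stars, number_of_reviews) pair. This is an assumed
    formula for illustration, not one confirmed by the guide.
    """
    pairs = list(ratings)
    total_reviews = sum(count for _, count in pairs)
    if total_reviews == 0:
        raise ValueError("at least one review is required")
    return sum(stars * count for stars, count in pairs) / total_reviews


# (stars, reviews) for the two courses of one series, as listed in
# "The Competition" section of this guide.
series = [(3.88, 8), (4.4, 5)]
combined = weighted_average_rating(series)
print(f"{combined:.2f} stars over {sum(n for _, n in series)} reviews")
```

Weighting by review count keeps a course with a handful of reviews from swaying a series rating as much as a course with many reviews.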
We made subjective syllabus judgment calls based on three factors:
- Whether the course teaches statistics with a focus on coding up examples, preferably in R or Python.
- How well it covers the fundamentals of probability and statistics. Covering descriptive statistics, inferential statistics, and probability theory is ideal.
- How much of the syllabus is relevant to data science. Does the syllabus have specialized content like genomics, as several biostatistics courses do? Does it cover advanced concepts not often used in data science?
Why Target Coding?
William Chen, a data scientist at Quora who has a master’s in Applied Mathematics from Harvard, wrote the following in this popular Quora answer to the question “How do I learn statistics for data science?”
“For any aspiring data scientist, I would highly recommend learning statistics with a heavy focus on coding up examples, preferably in Python or R.”
Since a lot of a data scientist’s statistical work is carried out programmatically, getting familiar with the most popular tools is beneficial.
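As a rough illustration of what “coding up examples” can look like, here is a minimal sketch in Python that computes a mean, a standard deviation, and z-scores using only the standard library. The data set is hypothetical and was chosen so the numbers come out round; the recommended courses themselves use R.

```python
import statistics

# Hypothetical data set, chosen so the results are round numbers.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)        # measure of center
spread = statistics.pstdev(scores)    # population standard deviation

# A z-score says how many standard deviations a value lies from the mean.
z_scores = [(x - mean) / spread for x in scores]

print("mean:", mean)
print("standard deviation:", spread)
print("z-scores:", z_scores)
```

Here the mean is 5 and the population standard deviation is 2, so the value 9 sits two standard deviations above the mean.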
Our Picks
Foundations of Data Analysis – Part 1: Statistics Using R by the University of Texas at Austin on edX
Foundations of Data Analysis – Part 2: Inferential Statistics by the University of Texas at Austin on edX
“Foundations of Data Analysis” includes two of the top-reviewed statistics courses available, with a weighted average rating of 4.48 out of 5 stars over 20 reviews. Among courses/series in the upper echelon of ratings, it is one of the few that teach statistics with a focus on coding up examples. Though probability is not mentioned in either course title, the syllabi contain sufficient probability content to satisfy our testing criteria. Together, these courses offer a great mix of fundamentals coverage and scope for the beginner data scientist.
Listed below are the details for each course, including their description, syllabus, and prominent reviews.
Foundations of Data Analysis – Part 1: Statistics Using R
Basic Information
University: University of Texas at Austin
Instructor: Michael J. Mahometa, Lecturer and Senior Statistical Consultant at the University of Texas at Austin
Platform: edX
Pace: Self-paced
Cost: Free
Estimated timeline: 6 weeks at 3-6 hours per week. The total estimated timeline is 18-36 hours, which could feasibly be completed in two weeks if you prefer to binge-study your MOOCs.
Description
In this first part of a two-part course, we’ll walk through the basics of statistical thinking – starting with an interesting question. Then, we’ll learn the correct statistical tool to help answer our question of interest – using R and hands-on Labs. Finally, we’ll learn how to interpret our findings and develop a meaningful conclusion.
We will cover basic Descriptive Statistics – learning about visualizing and summarizing data, followed by a “Modeling” investigation where we’ll learn about linear, exponential, and logistic functions. We will learn how to interpret and use those functions with basic Pre-Calculus. These two “units” will set the learner up nicely for the second part of the course: Inferential Statistics with a multiple regression cap.
Both parts of the course are intended to cover the same material as a typical introductory undergraduate statistics course, with an added twist of modeling. This course is also intentionally devised to be sequential, with each new piece building on the previous topics. Once completed, students should feel comfortable using basic statistical techniques to answer their own questions about their own data, using a widely available statistical software package (R).
With these new skills, learners will leave the course with the ability to use basic statistical techniques to answer their own questions about their own data, using a widely available statistical software package (R). Learners from all walks of life can use this course to better understand their data, to make valuable informed decisions.
Syllabus
Week One: Introduction to Data
- Why study statistics?
- Variables and data
- Getting to know R and RStudio
Week Two: Univariate Descriptive Statistics
- Graphs and distribution shapes
- Measures of center and spread
- The Normal distribution
- Z-scores
Week Three: Bivariate Distributions
- The scatterplot
- Correlation
Week Four: Bivariate Distributions (Categorical Data)
- Contingency tables
- Conditional probability
- Examining independence
Week Five: Linear Functions
- What is a function?
- Least squares
- The Linear function – regression
Week Six: Exponential and Logistic Function Models
- Exponential data
- Logs
- The Logistic function model
- Picking a good model
Reviews
“The best introductory course for statistical use of R!!! The videos are very didactic and it teaches step by step each lesson, as well as the R language. The way the exercises and tests are proposed is very stimulating. I’m waiting for the next course!!!” Link to reviews.
“One of the best online classes I have ever taken, out of about 20. Excellent material, clearly presented and good level of challenge for a novice data analyst. I strongly recommend [Dr. Mahometa] and this class!” Link to reviews.
“I am working as a Biochemist in a large R&D structure. I registered to learn how to use R and to refresh/learn basic statistics or at the least when and why use which approach. So far this course has fully met my expectations, it is very well done, very interesting and tutorials are terrific. The reading part is also well done and contains numerous examples to train oneself. The Pre-Lab, Lab and Problem Sets are also really good into evaluating how we perform. It’s also possible to go a bit more into depth using optional readings. I’m glad I registered for the second course.” Link to reviews.
Foundations of Data Analysis – Part 2: Inferential Statistics
Basic Information
University: University of Texas at Austin
Instructor: Michael J. Mahometa, Lecturer and Senior Statistical Consultant at the University of Texas at Austin
Platform: edX
Pace: Self-paced
Cost: Free
Estimated timeline: 6 weeks at 3-6 hours per week. The total estimated timeline is 18-36 hours, which could feasibly be completed in two weeks if you prefer to binge-study your MOOCs.
Description
In the second part of a two-part statistics course, we’ll learn how to take data and use it to make reasonable and useful conclusions. You’ll learn the basics of statistical thinking – starting with an interesting question and some data. Then, we’ll apply the correct statistical tool to help answer our question of interest – using R and hands-on Labs. Finally, we’ll learn how to interpret our findings and develop a meaningful conclusion.
We will cover basic Inferential Statistics – integrating ideas of Part 1. If you have a basic knowledge of Descriptive Statistics, this course is for you. We will learn how to sample data, examine both quantitative and categorical data with statistical techniques such as t-tests, chi-square, ANOVA, and Regression.
Syllabus
Week One: Introduction to Data
- Why study statistics?
- Variables and data
- Getting to know R and RStudio
Week Two: Sampling
- Why study statistics?
- The sampling distribution
- Central limit theorem
- Confidence intervals
Week Three: Hypothesis Testing (One and Two Group Means)
- What makes a hypothesis test?
- Errors in testing
- Alpha and critical values
- Single sample test
- Independent t-test and Dependent t-test
Week Four: Hypothesis Testing (Categorical Data)
- The chi-square test
- Goodness-of-Fit
- Test-of-Independence
Week Five: Hypothesis Testing (More Than Two Group Means)
- The ANOVA
- One-way ANOVA
- Two-way ANOVA
Week Six: Hypothesis Testing (Quantitative data)
- Correlation
- Simple (single variable) regression
- Multiple regression
Reviews
“Excellent course! I took Prof. Mahometa’s part 1 of the course and fell in love with R (with no prior knowledge). This I think can be taken individually but might have a steeper learning curve. The course is designed beautifully with pre-labs, labs and assignments that cement the concepts learned through text and videos. I have been around on edX since it started and I must say it is hard to find such well-designed course and that too [are offered] for free. I hope Prof. Mahometa design more courses on advanced topics. It will be a treat to learn.” Link to reviews.
“Excellent course! I took part 1 and enjoyed it a lot, so it was very easy to decide to go on with part 2. Dr. Mahometa and team are very good teachers and their material is of a very high quality. The exercises are interesting and the materials (videos, labs and problems) are appropriate and well chosen. I recommend this course to anyone interested in statistical analysis (as an introduction to machine learning, big data, data science, etc.). On a scale from 1 to 10, I give 50!” Link to reviews.
A stellar specialization
Update (December 5, 2016): Our original second recommendation, UC Berkeley’s “Stat2x: Introduction to Statistics” series, closed its enrollment a few weeks after the release of this article. We promoted our top recommendation from “The Competition” section accordingly.
Statistics with R Specialization by Duke University on Coursera
…which contains the following five courses:
- Introduction to Probability and Data
- Inferential Statistics
- Linear Regression and Modeling
- Bayesian Statistics
- Statistics with R Capstone
This five-course specialization is based on Duke’s excellent Data Analysis and Statistical Inference course, which had a 4.82-star weighted average rating over 55 reviews. The specialization is taught by the same professor, plus a few additional faculty members. The early reviews on the new individual courses, which have a 3.6-star weighted average rating over 5 reviews, should be taken with a grain of salt due to the small sample size. The syllabi are comprehensive and have full sections dedicated to probability.
Listed below are the details for the specialization, including each course’s description and syllabus. Prominent reviews follow.
Statistics with R Specialization
Basic Information
University: Duke University
Instructors: Mine Çetinkaya-Rundel, David Banks, Colin Rundel, and Merlise A Clyde
Platform: Coursera
Pace: Self-paced
Number of courses: 5 (including capstone)
Cost: Free and paid options available, though grading requires payment
Estimated timeline: Each course has an estimated timeline of 4-5 weeks at 5-7 hours per week
Description
In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform frequentist and Bayesian statistical inference and modeling to understand natural phenomena and make data-based decisions, communicate statistical results correctly, effectively, and in context without relying on statistical jargon, critique data-based claims and evaluate data-based decisions, and wrangle and visualize data with R packages for data analysis.
You will produce a portfolio of data analysis projects from the Specialization that demonstrates mastery of statistical data analysis from exploratory analysis to inference to modeling, suitable for applying for statistical analysis or data scientist positions.
Syllabus
Course #1: Introduction to Probability and Data
This course introduces you to sampling and exploring data, as well as basic probability theory and Bayes’ rule. You will examine various types of sampling methods, and discuss how such methods can impact the scope of inference. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The concepts and techniques in this course will serve as building blocks for the inference and modeling courses in the Specialization.
Topics:
- Introduction to Data
- Exploratory Data Analysis and Introduction to Inference
- Introduction to Probability
- Probability Distributions
Course #2: Inferential Statistics
This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that is interpretable for clients or the public. Using numerous data examples, you will learn to report estimates of quantities in a way that expresses the uncertainty of the quantity of interest. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The course introduces practical tools for performing data analysis and explores the fundamental concepts necessary to interpret and report results for both categorical and numerical data.
Topics:
- Central Limit Theorem and Confidence Interval
- Inference and Significance
- Inference for Comparing Means
- Inference for Proportions
Course #3: Linear Regression and Modeling
This course introduces simple and multiple linear regression models. These models allow you to assess the relationship between variables in a data set and a continuous response variable. Is there a relationship between the physical attractiveness of a professor and their student evaluation scores? Can we predict the test score for a child based on certain characteristics of his or her mother? In this course, you will learn the fundamental theory behind linear regression and, through data examples, learn to fit, examine, and utilize regression models to examine relationships between multiple variables, using the free statistical software R and RStudio.
Topics:
- Linear Regression
- More about Linear Regression
- Multiple Regression
Course #4: Bayesian Statistics
This course describes Bayesian statistics, in which one’s inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes’ rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing in R (free statistical software) the final posterior distribution. Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction.
Topics:
- The Basics of Bayesian Statistics
- Bayesian Inference
- Decision Making
- Bayesian Regression
- Perspectives on Bayesian Applications
Course #5: Statistics with R Capstone
The capstone project will be an analysis using R that answers a specific scientific/business question provided by the course team. A large and complex dataset will be provided to learners and the analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling as well as interpretations of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss in context of the data how these two approaches are similar and different, and what these differences mean for conclusions that can be drawn from the data. A sampling of the best final projects will be featured on the Duke Statistical Science department website. Note: Only learners who have passed the four previous courses in the specialization are eligible to take the Capstone.
Reviews
On the Inferential Statistics course: “This course is awesome on so many levels. This is the best inferential statistics course I’ve come across. Here’s why. The slides are beautiful and visually appealing, making following the rigorous content easier to digest. Instructors are captivating and articulate, the explanations are clear and concise. The assignments are very very tough, making the course incredibly challenging, but worth it. This is a huge plus. Without challenge, good statistics understanding won’t come…Again, this is an amazing course! This is rare stuff! It is without a doubt, a lot of passion and effort has been put into this course and this series.” Link to review.
“One of the greatest courses I’ve taken so far. [Dr. Mine Çetinkaya-Rundel is] a great teacher, very much involved in exchanges with her students. A large variety of teaching approaches and tools. Lots of practice through short tests, R-programming labs, and an in-depth project. A very lively forum with lots of help to cope with difficulties. The course is not too difficult, but the variety of the proposed material requires that students get involved quite substantially. A very nice book available for free with plenty of practice exercises.” Link to review.
Want more probability?
Introduction to Probability – The Science of Uncertainty by the Massachusetts Institute of Technology (MIT) on edX
Consider the above MIT course if you want a deeper dive into the world of probability. It is a masterpiece with a weighted average rating of 4.91 out of 5 stars over 34 reviews. Be warned: it is a challenge and much longer (16 weeks total at 12 hours per week) than most MOOCs. The level at which the course covers probability is also not necessary for the data science beginner.
Listed below are the details for the course, including its description, syllabus, and prominent reviews.
Introduction to Probability – The Science of Uncertainty
Basic Information
University: Massachusetts Institute of Technology (MIT)
Instructors: John Tsitsiklis and Patrick Jaillet, both of whom are professors in the Department of Electrical Engineering and Computer Science at MIT
Platform: edX
Pace: Self-paced
Cost: Free
Estimated timeline: 16 weeks at 12 hours per week. The total estimated timeline is 192 hours, which could feasibly be completed in less than two months if you prefer to binge-study your MOOCs.
Description
The world is full of uncertainty: accidents, storms, unruly financial markets, noisy communications. The world is also full of data. Probabilistic modeling and the related field of statistical inference are the keys to analyzing data and making scientifically sound predictions.
Probabilistic models use the language of mathematics. But instead of relying on the traditional “theorem – proof” format, we develop the material in an intuitive — but still rigorous and mathematically precise — manner. Furthermore, while the applications are multiple and evident, we emphasize the basic concepts and methodologies that are universally applicable.
The course covers all of the basic probability concepts, including:
- multiple discrete or continuous random variables, expectations, and conditional distributions
- laws of large numbers
- the main tools of Bayesian inference methods
- an introduction to random processes (Poisson processes and Markov chains)
The contents of this course are essentially the same as those of the corresponding MIT class (Probabilistic Systems Analysis and Applied Probability) — a course that has been offered and continuously refined over more than 50 years. It is a challenging class, but it will enable you to apply the tools of probability theory to real-world applications or your research.
The course material is organized along units that are aligned with the chapters of the textbook. Each unit contains between one and three lecture sequences. Each lecture sequence consists of short video clips, interleaved with short problems to test your understanding. Each unit also contains a wealth of supplementary material, including videos that go through the solution of various problems.
Syllabus
Unit 0: Overview
Unit 1: Probability models and axioms
- L1: Probability models and axioms
Unit 2: Conditioning and independence
- L2: Conditioning and Bayes’ rule
- L3: Independence
Unit 3: Counting
- L4: Counting
Unit 4: Discrete random variables
- L5: Probability mass functions and expectations
- L6: Variance; Conditioning on an event; Multiple r.v.’s
- L7: Conditioning on a random variable; Independence of r.v.’s
Unit 5: Continuous random variables
- L8: Probability density functions
- L9: Conditioning on an event; Multiple r.v.’s
- L10: Conditioning on a random variable; Independence; Bayes’ rule
Unit 6: Further topics on random variables
- L11: Derived distributions
- L12: Sums of r.v.’s; Covariance and correlation
- L13: Conditional expectation and variance revisited; Sum of a random number of r.v.’s
Unit 7: Bayesian inference
- L14: Introduction to Bayesian inference
- L15: Linear models with normal noise
- L16: Least mean squares (LMS) estimation
- L17: Linear least mean squares (LLMS) estimation
Unit 8: Limit theorems and classical statistics
- L18: Inequalities, convergence, and the Weak Law of Large Numbers
- L19: The Central Limit Theorem (CLT)
- L20: An introduction to classical statistics
Unit 9: Bernoulli and Poisson processes
- L21: The Bernoulli process
- L22: The Poisson process
- L23: More on the Poisson process
Unit 10: Markov chains
- L24: Finite-state Markov chains
- L25: Steady-state behavior of Markov chains
- L26: Absorption probabilities and expected time to absorption
Reviews
“Many online courses are watered down in some way, but this one feels like a proper rigorous exercise-driven course similar to what you’d get in-person at a top school like MIT. The professors present concepts in lectures that have obviously been honed to a laser focus through years of pedagogical experience – there is not a single wasted second in the presentations and they go exactly at the right pace and detail for you to understand the concepts. The exercises will make you work for your knowledge and are critical for really internalizing the concepts. This is the best online course I have taken in any subject.” Link to review.
Please visit Class Central’s page for this course to read the rest of the reviews.
The competition
Our #1 pick had a weighted average rating of 4.48 out of 5 stars over 20 reviews. Let’s look at the other alternatives.
- MedStats: Statistics in Medicine (Stanford University/Stanford OpenEdx): Great syllabus where the examples have a medical focus. Covers a bit of R programming at the end, though not as much as UT Austin’s series. A worthy option for anyone, even those not targeting medicine. It has a 4.58-star weighted average rating over 32 reviews.
- SOC120x: I “Heart” Stats: Learning to Love Statistics (University of Notre Dame/edX): Targets a non-technical audience, though likely would be good for anyone. No coding. Good production value. Course and instructors look really fun. It has a 4.54-star weighted average rating over 12 reviews.
- QM101x: Statistics for Business (Indian Institute of Management Bangalore/edX): Part of a 4-course series. Business focus. Good syllabus that uses coding. The last two courses in the series are unreleased as of November 2016, so we can’t make a judgment yet. It has a 4.43-star weighted average rating over 27 reviews.
- Workshop in Probability and Statistics (Udemy): Taught by Dr. George Ingersoll, Associate Dean of Executive MBA Programs at the UCLA Anderson School of Management. Costs money. Uses Excel. It has a 4.4-star weighted average rating over 452 reviews.
- Intro to Descriptive Statistics (San Jose State University/Udacity): Part of a 2-course series. Bite-sized videos. No coding. It has a 3.88-star weighted average rating over 8 reviews.
- Intro to Inferential Statistics (San Jose State University/Udacity): Part of a 2-course series. Bite-sized videos. No coding. It has a 4.4-star weighted average rating over 5 reviews.
- 6.008.1x: Computational Probability and Inference (Massachusetts Institute of Technology/edX): One of two courses/series to teach statistics with a focus on coding up examples in Python. Reviews suggest prior stats experience is needed and that the course is a bit unorganized. It has a 4-star weighted average rating over 12 reviews.
- Basic Statistics (University of Amsterdam/Coursera): One of two statistics courses in the University of Amsterdam’s Methods and Statistics in Social Sciences Specialization. One exceedingly positive review on the series and its instructors. No coding. It has a 4.06-star weighted average rating over 8 reviews.
- Inferential Statistics (University of Amsterdam/Coursera): One of two statistics courses in the University of Amsterdam’s Methods and Statistics in Social Sciences Specialization. One exceedingly positive review on the series and its instructors. No coding. It has a 4-star weighted average rating over 3 reviews.
- PH525.1x: Statistics and R (Harvard University/edX): Part of a 7-course series on edX. Life sciences focus. Uses R programming, but the reviews suggest UT Austin’s series is better. It has a 3.96-star weighted average rating over 26 reviews.
- PH525.3x: Statistical Inference and Modeling for High-throughput Experiments (Harvard University/edX): Part of a 7-course series on edX. Life sciences focus. Uses R programming, but the reviews suggest UT Austin’s series is better. It has a 4.63-star weighted average rating over 4 reviews.
- Intro to Statistics (Udacity): This is one of Udacity’s earliest courses and it has its shortcomings, as described in this memorable review by a college educator. No coding. It has a 3.93-star weighted average rating over 41 reviews.
- Mathematical Biostatistics Boot Camp 1 (Johns Hopkins University/Coursera): Part of a 2-course series. Biostatistics focus. It has a 3.13-star weighted average rating over 23 reviews.
- Mathematical Biostatistics Boot Camp 2 (Johns Hopkins University/Coursera): Part of a 2-course series. Biostatistics focus. It has a 3.83-star weighted average rating over 3 reviews.
- KIexploRx: Explore Statistics with R (Karolinska Institutet/edX): More of a data exploration course than a statistics course. Uses coding. It has a 3.77-star weighted average rating over 22 reviews.
- Statistical Inference (Johns Hopkins University/Coursera): One of two statistics courses in JHU’s data science specialization. Bad reviews. It has a 2.9-star weighted average rating over 29 reviews.
- Regression Models (Johns Hopkins University/Coursera): One of two statistics courses in JHU’s data science specialization. Bad reviews. It has a 2.73-star weighted average rating over 30 reviews.
- DS101X: Statistical Thinking for Data Science and Analytics (Columbia University/edX): Part of the Microsoft Professional Program Certificate in Data Science. Short syllabus. Bad reviews. It has a 2.77-star weighted average rating over 24 reviews.
- Understanding Clinical Research: Behind the Statistics (University of Cape Town/Coursera): “This isn’t a comprehensive statistics course, but it offers a practical orientation to the field of medical research and commonly used statistical analysis.” Health care focus. It has a 5-star weighted average rating over 15 reviews.
- MED101x: Introduction to Applied Biostatistics: Statistics for Medical Research (Osaka University/edX): Biostatistics focus. Uses coding. It has a 4.5-star weighted average rating over 3 reviews.
- Probability and Statistics (Stanford University/Stanford OpenEdx): Curriculum looks great. The one review is really positive. No coding. It has a 4.5-star weighted average rating over 1 review.
- Inferential and Predictive Statistics for Business (University of Illinois at Urbana-Champaign/Coursera): Part of a 7-course Managerial Economics and Business Analysis Specialization. Uses Excel. It has a 5-star weighted average rating over 1 review.
- Exploring and Producing Data for Business Decision Making (University of Illinois at Urbana-Champaign/Coursera): Part of a 7-course Managerial Economics and Business Analysis Specialization. Uses Excel. It has a 5-star weighted average rating over 1 review.
- Introduction to Probability, Statistics, and Random Processes (University of Massachusetts Amherst/Independent): Videos not available for the whole course. It has a 2.5-star weighted average rating over 2 reviews.
- 005x: Introduction to Statistical Methods for Gene Mapping (Kyoto University/edX): Genetics focus. Need prior statistics and R knowledge. It has a 2.5-star weighted average rating over 1 review.
- Statistics for Genomic Data Science (Johns Hopkins University/Coursera): Genomic focus. Not a good introductory course: “A fair class for someone with an interest in this field who also happens to have a decent background in R programming.” It has a 2-star weighted average rating over 2 reviews.
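For readers who want to compare a multi-course series against a single course, here is a minimal sketch of one way to combine per-course ratings into a single figure. It assumes each rating is weighted by its number of reviews, which may not be exactly how the ratings in this guide were compiled. The function name `weighted_average_rating` is hypothetical, and the numbers are the two San Jose State University/Udacity entries from the list above.

```python
def weighted_average_rating(ratings):
    """Combine (average_rating, review_count) pairs, weighting each by its review count.

    Assumption: review counts are the weights. This is an illustration,
    not necessarily the method used to produce the ratings in this guide.
    """
    total_reviews = sum(count for _, count in ratings)
    if total_reviews == 0:
        raise ValueError("at least one review is required")
    return sum(rating * count for rating, count in ratings) / total_reviews


# Intro to Descriptive Statistics: 3.88 stars over 8 reviews
# Intro to Inferential Statistics: 4.4 stars over 5 reviews
udacity_series = [(3.88, 8), (4.4, 5)]
combined = weighted_average_rating(udacity_series)
print(round(combined, 2))  # 4.08
```

Weighting by review count keeps a course with only a handful of reviews from swinging the combined figure as much as a course with many reviews would.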
The following courses had no reviews as of November 2016.
- Statistical Thinking in Python (Part 1) and Statistical Thinking in Python (Part 2) (DataCamp): Uses coding and Python specifically, making it one of few worthy courses or series that use that language. Seven hours of video and 120+ exercises. DataCamp is a popular option.
- A Hands-on Introduction to Statistics with R (DataCamp): Uses coding. 26 hours of video content and 45k+ participants. DataCamp is a popular option.
- Statistical Computing with R – a gentle introduction (University College London/Independent): Uses coding. No review data.
- Probability & Statistics (Carnegie Mellon): Uses R. Primarily text-based instruction. Designed to be equivalent to one semester of a college statistics course.
- Introduction to Probability and Statistics (Massachusetts Institute of Technology/MIT OCW): No review data. Traditional lecture format (video-taped).
- Fundamentals of Engineering Statistical Analysis (The University of Oklahoma/Janux): No review data.
- Elementary Business Statistics (The University of Oklahoma/Janux): Business focus. No review data.
- STAT101x: Biostatistics for Big Data Applications (The University of Texas Medical Branch/edX): Biostatistics focus. No review data.
- 416.1x: Probability: Basic Concepts & Discrete Random Variables (Purdue University/edX): Part of a 2-course series. No review data.
- 416.2x: Probability: Distribution Models & Continuous Random Variables (Purdue University/edX): Part of a 2-course series. No review data.
- Business Statistics and Analysis Specialization (Rice University/Coursera): Uses Excel. No review data.
- Statistics 110: Probability (Harvard University): No review data. Traditional lecture format (video-taped). Often recommended on Quora.
- Statistics (Dataquest): A multi-course series with about 12 hours of content. No review data. Subscription required. One of two courses/series to teach statistics with a focus on coding up examples in Python.
About Class Central Career Guides
Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.
These guides are updated frequently to always reflect the best in online education.
Drop us a note at [email protected] if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!
Author Bio
David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.
Teerth Brahmbhatt
Thank you so much for this great guide. However, the Berkeley Introduction to Probability course is closed for enrollment on edX. Is there any way to access it that you know of?
David Venturi
Thanks for pointing this out. I don’t know of any way to access the course while closed. I emailed the professor to see if this is permanent and updated the article noting the situation.
Nischal Shakya
You can take the archived course. I took a couple of lectures from Probability – The Science of Uncertainty and Data through the archived course. I am not sure how I found the archived course. However, in the archived course you will only be able to watch the lecture videos, not access any of the practice problems.
Sam
Any idea about the Springboard Data Science Career Track? It looks like they don’t curate their own courses but rather license courses from different online providers…
molto vivace
This is great. I find myself at a disadvantage however because I mainly work with Python, and I feel that many statistics courses focus more on R (which totally makes sense). I think it’s about time I started learning R too.
Manish
Hi @David Venturi,
I have one question about the course “Introduction to Probability – The Science of Uncertainty” by MIT on edX. Which language is used in this course, Python or R? Please let me know ASAP.
Nischal Shakya
No programming language is used in this course. I am not sure about probability courses with Python, but if you want to use R, you can study the book “Introduction to Probability” by Joe Blitzstein and Jessica Hwang, which contains R sections, or “Probability and Statistics for Data Science: Math + R + Data” by Norman Matloff.
The former book is based on a Harvard stats course named “Stat 110”, available on edX as well as on YouTube.