Data Science

Johns Hopkins University via Coursera Specialization

Overview

Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

Syllabus

Course 1: The Data Scientist’s Toolbox
- Offered by Johns Hopkins University. In this course you will get an introduction to the main tools and ideas in the data scientist's ... Enroll for free.

Course 2: R Programming
- Offered by Johns Hopkins University. In this course you will learn how to program in R and how to use R for effective data analysis. You ... Enroll for free.

Course 3: Getting and Cleaning Data
- Offered by Johns Hopkins University. Before you can work with data you have to get some. This course will cover the basic ways that data can ... Enroll for free.

Course 4: Exploratory Data Analysis
- Offered by Johns Hopkins University. This course covers the essential exploratory techniques for summarizing data. These techniques are ... Enroll for free.

Course 5: Reproducible Research
- Offered by Johns Hopkins University. This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible ... Enroll for free.

Course 6: Statistical Inference
- Offered by Johns Hopkins University. Statistical inference is the process of drawing conclusions about populations or scientific truths from ... Enroll for free.

Course 7: Regression Models
- Offered by Johns Hopkins University. Linear models, as their name implies, relate an outcome to a set of predictors of interest using ... Enroll for free.

Course 8: Practical Machine Learning
- Offered by Johns Hopkins University. One of the most common tasks performed by data scientists and data analysts is prediction and machine ... Enroll for free.

Course 9: Developing Data Products
- Offered by Johns Hopkins University. A data product is the production output from a statistical analysis. Data products automate complex ... Enroll for free.

Course 10: Data Science Capstone
- Offered by Johns Hopkins University. The capstone project class will allow students to create a usable/public data product that can be used ... Enroll for free.

Courses

27 reviews
4 weeks, 3-5 hours a week

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, and discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.
166 reviews
17 hours 59 minutes

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio.
245 reviews
57 hours

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
58 reviews
20 hours

Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
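
To illustrate what making data "tidy" can mean in practice, here is a minimal sketch in Python (the course itself uses R). It assumes the common reading of tidy data as one observation per row and one variable per column. The table, the column names, and the `to_tidy` helper are all hypothetical and do not come from the course materials.

```python
# Hypothetical "wide" table: one row per country, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: column, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, "country", "year", "cases")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Each tidy row now holds a single (country, year, cases) observation, so filtering, grouping, and plotting can treat every column the same way.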
38 reviews
1-2 hours

In this 1-hour long project-based course, you will learn exploratory data analysis techniques and create visual methods to analyze trends, patterns, and relationships in the data. By the end of this project, you will have applied EDA on a real-world dataset.

This class is for learners who want to use Python for data visualization and analysis, and for learners who are taking or have finished a basic machine learning course and want a practical data visualization and analysis project. The project also gives learners basic knowledge of exploratory analysis and improves their map-making skills, and they can add it to their portfolios in support of their career goals.
27 reviews
7-8 hours

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.
34 reviews
54 hours

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information for making informed choices in analyzing data.
33 reviews
54 hours

Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models including scatterplot smoothing.
27 reviews
8-9 hours

One of the most common tasks performed by data scientists and data analysts is prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
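
As a rough illustration of training and test sets and error rates, the sketch below holds out part of a toy dataset, fits a very simple prediction function on the rest, and compares the two error rates. It is written in Python (the course itself uses R). The data, the midpoint-threshold classifier, and every function name are hypothetical stand-ins, not methods taken from the course.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Use the midpoint between the two class means as the decision threshold."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def error_rate(data, threshold):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(1 for x, y in data if (x > threshold) != (y == 1))
    return wrong / len(data)


# Hypothetical toy data: label 1 whenever the single feature is large.
data = [(x, 0) for x in range(0, 20)] + [(x, 1) for x in range(30, 50)]

train, test = train_test_split(data)
threshold = fit_threshold(train)
print("training error:", error_rate(train, threshold))
print("test error:", error_rate(test, threshold))
```

The threshold is chosen using only the training set, so the test error is an estimate of performance on unseen data. A test error well above the training error is the usual sign of overfitting.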
17 reviews
10 hours

A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience.
4 reviews
5-6 hours

The capstone project class will allow students to create a usable/public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.

Taught by

Brian Caffo, PhD, Jeff Leek, PhD and Roger D. Peng, PhD

Reviews

4.6 rating, based on 9 Class Central reviews

Dave Hurst

Coursera Data Science Specialization (Johns Hopkins University)
Successfully completing the inaugural capstone for the JHU/Coursera data science track was a thrill for me. The timing for the first track couldn't have been better, as at the time I was using MOOCs to shore up what I was learning in parallel for a Master of Science in Predictive Analytics at Northwestern University. The NU degree was rigorous and heavy on statistics and analytical management theory, but I wanted more practical hands-on training to reinforce what I was learning. The JHU track was perfect for that. The first course (Data Science Toolbox) in the sequence is a little light, but sets the stage for combining data science learnings with practical state-of-the-art tools that I have come to use on a daily basis. Some of the more advanced courses, like Reproducible Research and Developing Data Products, weren't on my radar initially but ended up providing me with new skills that I have turned to again and again. I can't emphasize enough how fantastic both of those courses are. My main criticism (and the loss of the star) is for the Statistical Inference and Regression Models classes -- it's just not realistic to expect students to learn these topics in 4 weeks. I got by, since those were exactly the areas my NU coursework focused on, so Coursera acted as a certification tool rather than a teaching medium in that case. I expect students with less background in those areas to be really frustrated with the pace, and although they may pass, they would be dangerously lacking in those very important areas.
The Capstone though was a fantastic experience. The pace was frenetic and a HUGE step up from the other courses, but it was a lot of fun. Ours was based around text analytics and introduced me to a new area. By the time the students had whittled down to fewer than 500 qualified to take the capstone (from over 2 million!!) I found the cohort to be a really talented group of people that paralleled my NU degree (can't say that for any other MOOC enrollment I've been a part of). I'm not sure if they are still using the same capstone (they did for the second offering of the capstone), which would be a shame, because breaking new ground (versus trolling GitHub for existing code) was part of the glory of that project.
In short, I highly recommend this track, and commend all 3 of the instructors for putting together a brilliant program. I was able to power through from April to December because I was already studying those topics in parallel, but I'd recommend that others who are not in the same situation, or not already familiar with the material, take it slowly through the tougher classes to let the learnings sink in.
Anonymous

Great for folks new to computer science, data science, coding!
The courses are great. There are videos and resources for learning, quizzes to make sure you're retaining information, and at least one assignment demonstrating your learning per course. The courses are about a month long with deadlines for respective components throughout. That said, you can work ahead and finish the course as quickly as you like or are able (which is nice!).

I was also continuously impressed by the peer-to-peer support. So many people are willing to help one another and work collaboratively towards a solution when faced with an issue. I was surprised to have more discussion board activity, support, and productivity here than I've had in more formal, expensive, for-credit courses. Shout out to helpful peers and the TAs!

This was a great introduction to the topic and allowed me to quickly develop various skill sets. It was also enough information and resources to make me want to learn, play around, and spend more time on the various topics. That's on me to do, though. One of the other great things this program teaches you is to seek out information from various sources or places, as well as empowering you to make your own resource.

If you're already proficient with coding, this may be slow or extremely easy for you. More than half my battle was making concepts work with R code and/or using command prompt to push, pull, and post data to various places. Again, though, you can work ahead through the courses, so you can work through things you already are familiar with and/or get through the content even faster.
Anonymous

An entry course that opens up your data science career
I strongly recommend this course to anyone who has taken statistics/economics/data analysis courses at college and would like to get some training in big-data analysis. My training background is Economics, which includes substantial data-analysis training with economic data. I would say I benefited substantially from the machine learning course in this track. I realized that econometric models such as linear regression and binary-choice models are only part of the machine learning toolkit, and that when it comes to analyzing big datasets, other methods such as decision trees and SVM models are commonly used as well. This specialization track also helped me brush up my data analysis skills and apply them to broader, larger datasets. Eventually, it helped me transition my career from economics researcher to data scientist at a Bay Area tech company.
Anonymous

Very good certificate for starting a career in Data Science
I think this certificate is a very good way of starting a career in Data Science. It covers all the themes you need for developing in R with statistical knowledge. It also has more lectures for curious people. The Coursera staff is always helpful and the platform is amazing.
W.dijkhuis

Good intro to data science for the well prepared
Suppose you:
- have programming experience (preferably C, C++, C# or Java )
- have a solid knowledge of the basics of statistics (descriptive and inferential).
then this specialization is a great introduction to data science.

The specialization "teaches" statistics from scratch, but only those with an IQ over 130 and who are very determined have a remote chance of finishing these modules; for the rest of us these modules are impossible to pass unless you already know the stuff they are trying to teach.
The programming in R is imho very good, but fast paced and probably hard for those without programming experience.

The really good parts are:
1) the professors are experienced and practicing data scientists; they let you look over their shoulder and show you how the pros do it.
2) the course covers the complete data science pipeline: from analyzing the problem, getting and cleaning the data to presenting the results.
3) there are engaging assignments and projects.
4) an emphasis on "how to do it" skills (not on theory)

You might want to do one or two modules instead of the whole specialization.
1) programming in R is very good (for the well prepared)
2) reproducible research is a valuable and unique module
3) the same for developing data products
Anonymous

Great introductory Data Science course
This course covers all the major aspects of the Data Science field. The instructors are reasonably good. However, some of the statistics aspects will be easier if you have some basic knowledge. (I took a basic stats course before this series to refresh my memory.) The exercises and projects are both pretty easy. The grading system makes it easy to achieve high scores. If you really want to learn more, then I would recommend going beyond what is listed as the minimum requirements for the projects. The Capstone project was also very interesting and relevant to current industry trends.

I'm not sure if this course alone will get you ready for a job in Data Science. You may need some prior background or some further work (in terms of projects) to land a job.
Anonymous

Informative and Valuable course with great resources
This course is a great introduction to data science and related disciplines like analytics, statistics, machine learning etc.
Stats and R programming can be challenging for those who do not have a stats or programming background. All in all, this was a very informative learning experience.
Anonymous

4 stars all around
I learned what I needed to know to manage and hire data scientists and coordinate well with statisticians. I might have completed the series (only skipped the capstone) if the capstone had been a project more related to my work (mapping).
Anonymous

interesting capstone
The capstone project is good experience to apply the basics of data science and machine learning. The courses covered the major topics of statistics along with R programming.
