Overview
Before you can work with data, you have to get some. This course covers the basic ways that data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. Together, these are the basics needed for collecting, cleaning, and sharing data.
Syllabus
- Week 1
- In this first week of the course, we look at finding data and reading different file types.
- Week 2
- Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.
- Week 3
- Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging and managing the data you have collected using the lectures from Weeks 1 and 2.
- Week 4
- Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.
Taught by
Jeff Leek
Reviews
3.4 rating, based on 58 Class Central reviews
4.5 rating at Coursera based on 8064 ratings
-
Getting and cleaning data is the third course in the first wave of Johns Hopkins’s data science specialization track on Coursera. It is recommended that you take this course after the data scientist's toolkit and R programming courses. The title of…
-
I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization. The lectures aren't so bad... they're a little bit boring and not engaging since they rarely are more than just a voiceover and slides.…
-
Class information is very sparse. There's a huge gap between the (minimal) content provided in the lectures and the class project required for completion of the course. This is the worst constructed college course and worst MOOC I have ever encountered. I've completed 12 MOOCs, 2 bachelor's degrees, and several graduate courses at Stanford, so that is a distinction earned by Johns Hopkins U from among a very wide field. A complete overhaul of this course and series is desperately needed.
-
Dropping this course because there is such a disconnect between what is taught and what is expected to complete the project and quizzes. I found myself using external sources to learn all of the material necessary. Many of the questions are vague, leaving you spending hours trying to complete tasks only to realize that the objective is different and just not communicated effectively. There is no coherent order to how they deliver the material, teaching basic concepts in week 3 which should have been covered in week 1 or the prior course in R programming. So, I will just use others' tutorials to learn data science in R. Ridiculous that I wasted so much time on this!
-
Extremely frustrating class. I spent tons of time wondering what it is that I am actually supposed to do...
I am considering dropping the specialization.
-
Course is lacking any kind of logic or structure. It's simply methods/functions thrown one after another. Complete lack of perspective.
-
A rather poor and confusing course. The lectures are not so great. I'm rather disappointed with it. Normally these courses are rather good, but not this one.
-
I didn't learn much from the course lectures or materials; rather, I learned most from Stack Overflow. Really a big disappointment.
-
This is the third course in the series, and it's taken me this long to realize that everything I learn comes from external sources and not the course itself. If you do this, you'll learn something. If you don't, you'll lose your mind and waste a ton…
-
This is the third course in the Data Science specialization. The course is all about how to read data of different formats into R and how to create tidy datasets (one variable per column, one observation per row, one observational unit type per tabl…
-
This course just provides an outline of the subject. It's up to you to figure out how to get the assignment done... Google and StackOverflow are your instructors... Really! To make things worse, the course assignment instructions are very ambiguous, and you spend tons of time trying to understand the problem rather than solving it. If that's the intent of this course, they have succeeded in it, but when you have a course deadline (and a full-time job, as many of you do), it's extremely frustrating.
-
What is taught in the video materials is nothing compared to the quizzes and final project. At this point I'm still re-reading my final project assignment data, and although I can sense some things that need to be done to finish this project, it has taken me hours on StackOverflow and other R blogs (just to make sure the command/formula I type is right). Very frustrating compared to other Coursera modules I finished. After this I may drop the Data Scientist specialisation altogether.
-
There is a complete disconnect between what is taught and what is expected in the project and tests. The course is pretty bad. I was considering doing the specialization in Data Science and this course is making me re-think this goal.
I understand that you need to be good at 'hacking' to be a good data scientist, but if that's the case, then what's the point of paying money to have to Google everything?
-
It's not free at all.
Providing a demo doesn't mean free.
I tried enrolling in the so-called free course and I couldn't do it without providing a credit card.
It provides a free demo, but the course itself is not free at all.
-
There is a significant gap between the video lectures and the assignments/quizzes.
Very horrible... I paid for the course for certification, and I can't retake it for free.
-
The course is part of a very good 'data science with R' program (I don't know the current name because it changes) available at Coursera.
The program is quite massive; it contains about 8 courses but is really thorough and well presented. It is designed with even complete beginners in mind, so you may start it without any prior knowledge.
-
This course teaches a lot of extremely important skills in data science. No matter what you end up doing, dealing with data quality is going to be a part of it. This is a challenging class, and rightly so, as the work is tedious, but oh-so-important! The lectures do get a bit bland, but are informative.
-
Getting and Cleaning Data promises to teach students how to extract data from common data storage formats (including databases, specifically SQL, XML, JSON, and HDF5), and from the web using APIs and web scraping. The syllabus also includes tips on using R to clean and recode data, and, in the last lecture, a long list of links to sources of data. It's also worth noting that the style of the video lectures is a bit different from those of other classes I've taken: there's never any video of the instructor, just the instructor's voice over the lecture notes.
-
OK, this course is really helpful!
Nothing in it is wasted; this course is a must for a data scientist!