Course Review: Mining Massive Datasets, offered by Stanford on Coursera
If you’re interested in Machine Learning and Data Mining and want to learn what kinds of challenges huge datasets pose when applying standard algorithms, then you’ll find this course extremely valuable.
Review by Prakhar Srivastav.
I first stumbled onto MMDS, or CS246 (as it’s called at Stanford), a graduate-level course on (you guessed it) data mining, in early 2012, when I had recently finished Andrew Ng’s course on Machine Learning. With professors like Anand Rajaraman (of Amazon) and Jeff Ullman teaching the course and making their book freely available, I got quite interested and wished that it would be offered on Coursera some day. Fast forward 2 years and I saw a mail from Coursera informing me that the course was up for grabs. Without hesitating, I signed up and waited eagerly for the course to start.
The course lasted around 8 weeks and comprised long lectures, quizzes and a final exam.
FACULTY
Like most MOOCs, MMDS is taught by some of the best faculty in the field. I had been an avid follower of Anand Rajaraman’s blog before I joined this course, and I have to say the enthusiasm of the faculty is infectious and their expertise with the material is markedly evident.
DIFFICULTY
MMDS is a graduate-level CS course (CS246) from Stanford. That means the topics are not trivial, the lectures are dense, and you as a student are expected to invest significant time in understanding the material. On average I spent around 6-8 hours per week on the lectures and quizzes. Because the material is hard, grasping the concepts and getting the quizzes right is quite gratifying. There’s also an advanced section for students who want to challenge themselves more. As an incentive, a certificate of achievement with distinction is awarded to these students.
MATERIAL
The syllabus and the topics covered in this course are extremely relevant for anyone aspiring to work in the data mining / machine learning field. For someone who has done Andrew Ng’s ML course, this course acts as a perfect supplement and covers a lot of the practical aspects of implementing the algorithms on massive data sets. For example, a recent lecture talked about how the BFR algorithm for finding clusters works better than k-means on a very large dataset.
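As a rough illustration of why BFR suits very large datasets: as described in the MMDS book, it does not keep every point of a cluster in memory but only a small summary per cluster (the count N, the per-dimension sum SUM, and the per-dimension sum of squares SUMSQ), from which the centroid and variance can be recovered. The sketch below shows only that summary idea, not the full algorithm; the class and method names are hypothetical and chosen for this example.

```python
class ClusterSummary:
    """Hypothetical helper: BFR-style summary of one cluster (N, SUM, SUMSQ)."""

    def __init__(self, dims):
        self.n = 0                      # N: number of points absorbed so far
        self.total = [0.0] * dims       # SUM: per-dimension sum of coordinates
        self.total_sq = [0.0] * dims    # SUMSQ: per-dimension sum of squares

    def add(self, point):
        """Absorb one point; the point itself can then be discarded."""
        self.n += 1
        for i, x in enumerate(point):
            self.total[i] += x
            self.total_sq[i] += x * x

    def centroid(self):
        """Per-dimension mean, computed as SUM / N."""
        return [s / self.n for s in self.total]

    def variance(self):
        """Per-dimension variance, computed as SUMSQ / N - (SUM / N)^2."""
        return [
            sq / self.n - (s / self.n) ** 2
            for s, sq in zip(self.total, self.total_sq)
        ]


summary = ClusterSummary(dims=2)
for p in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]:
    summary.add(p)

print(summary.n)           # 3
print(summary.centroid())  # [3.0, 4.0]
```

The memory needed per cluster depends only on the number of dimensions, not on the number of points, which is what lets the data be processed one chunk at a time.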
BOOK
The accompanying MMDS book is just awesome, and the lectures build upon the content and examples from it. If you find the book a bit too challenging (perhaps because your math is a bit rusty), the lectures make the material quite approachable.
FINAL EXAM
This was my first course with a final exam, and in my opinion it made the experience more rewarding. Two exams of 3 hours and 2 hours did take a toll, but revising the content at the end helped me build a mental model of the concepts and grasp the big picture better, which made the learning more fruitful.
THEORETICAL
The course is primarily theoretical in both its presentation and exercises. This is not to say that algorithms are presented without examples, but that the examples (and the quizzes even more so) are trivial and do not do a great job of illustrating the issues with implementing or applying various algorithms to real-life datasets.
PROGRAMMING ASSIGNMENTS
In sharp contrast to Andrew Ng’s course, there are no compulsory programming assignments. The exercises are all quizzes that check how well you have understood the concepts. There is just one programming assignment, and it is optional.
CONCLUSION
Overall, I’m really glad I did this course. The professors make a point of citing industry examples wherever relevant (the PageRank algorithm and Google’s accompanying implementation were covered over 3 lectures), which is a welcome change from other CS courses. Along with the book, I believe the course is a wonderful primer to the field of Data Mining.