CS190.1x: Scalable Machine Learning

University of California, Berkeley via edX

This course may be unavailable.

Details

Provider: edX
Pricing: Free Online Course (Audit)
Languages: English
Certificate: Certificate Available
Duration & workload: 5 weeks
Sessions: Finished

Overview

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘Big Data,’ with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Apache Spark, a cluster computing system well-suited for large-scale machine learning tasks. You will implement scalable algorithms for fundamental statistical models (linear regression, logistic regression, matrix factorization, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

This self-assessment document provides a short quiz, as well as online resources that review the relevant background material.

Taught by

Ameet Talwalkar

Reviews

4.5 rating, based on 31 Class Central reviews

Gregory J Hamel (Life Is Study) @greg

Scalable Machine Learning is a 5-week distributed machine learning course offered by UC Berkeley through the edX platform. It is a follow-up to another UC Berkeley course: Introduction to Big Data with Apache Spark. Although the first course is not a strict prerequisite, Scalable Machine Learning uses the same virtual machine and even has some overlap with the homework labs, so it is beneficial to take Introduction to Big Data first. Scalable Machine Learning teaches distributed machine learning basics using PySpark, Apache Spark’s Python API. Basic proficiency with Python is necessary to pass the course, and some exposure to algorithms and machine learning concepts is helpful. Course evaluation is based primarily on 5 labs distributed as IPython notebooks.

The first two weeks of the course cover machine learning basics and introduce Apache Spark. For students already familiar with machine learning basics who took Introduction to Big Data, there’s not much new to learn during the first two weeks. Week 2 is essentially an exact clone of week 2 of the Introduction to Big Data course, including the lab assignment. The final 3 weeks have meatier lecture content and longer labs, each covering a different machine learning technique: linear regression, logistic regression, and principal component analysis.

The lecture content is clean and the lecturer speaks clearly. His delivery isn’t perfect, but the only real purpose of the lectures is to serve as background information for the meat of the course: the labs. Each lab is a lengthy IPython notebook with several sections leading you through the process of creating a pipeline for running a machine learning algorithm with PySpark. Much of the code you need is provided for you, but writing the key functions and data transformations necessary to complete the labs can still be time-consuming. Little things like an ambiguous instruction or an uncaught error you made earlier in the assignment can result in bugs that take a while to squash. Despite occasional frustrations, the labs do a good job of interspersing instruction with practical, hands-on learning.

Scalable Machine Learning is a quality introduction to machine learning with PySpark that focuses on labs over lectures. The lectures could be better and some of the instructions and error checks in the labs could be more comprehensive, but this is a great course for those looking to learn by doing.

I give Scalable Machine Learning 4 out of 5 stars: Very Good.

Martin Strandbygaard

Overall a good course that is worth spending the time on if you want a basic introduction to solving machine learning problems using Apache Spark.

As with the precursor, CS100.1x, the lecture videos and quizzes are pretty light on actual content and nothing spectacular. However, as with the precursor, I found the assignments really well structured, interesting, and informative. They use IPython notebooks, which I found to be a really awesome format for this kind of course and its assignments.

The course is not heavy on the mathematics of machine learning algorithms, and its introductions to the algorithms used are very basic. For this, something like Machine Learning on Coursera is a much better course.

What this course does is give you a good introduction to solving some actual problems using a selection of machine learning algorithms with Apache Spark.

I found some of the assignments for this course to be easier than some of the later assignments for the introductory course, CS100.1x.

I had a hard time deciding whether this course should get 3 or 4 stars, but ended up with 3 stars. The assignments definitely rate 4 stars, and I think that is the most important aspect of the course. I think the lecture videos only rate 3 stars. For comparison, watch the lectures from Machine Learning on Coursera, which I believe rate 5 stars.

Anonymous

The machine learning algorithms are explained at a reasonably granular level and are easy to follow. The labs are the highlight. I learnt a lot by doing. Thanks for putting this course together.

Gaurabh

Machine learning with Spark is explained very well from scratch, which makes this a good introductory course. Not too many details are covered, probably due to time limitations. Hope they make a sequel.