Machine Learning: Classification

University of Washington via Coursera

Go to class Write review

Details

Go to class

Provider

Coursera
Pricing

Free Online Course (Audit)
Languages

English
Certificate

Paid Certificate Available
Duration & workload

21 hours 25 minutes
Sessions

On-Demand
Subtitles

Arabic, French, Portuguese, Italian, German, Russian, English, Spanish, Korean, Thai, Indonesian, Kazakh, Hindi, Swedish, Greek, Chinese, Ukrainian, Japanese, Polish, Dutch, Turkish, Hungarian, Bengali, Pashto, Urdu, Azerbaijani, Farsi

Found in

Part of

Machine Learning

1.0

Overview

Case Studies: Analyzing Sentiment & Loan Default Prediction In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information,...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank. These tasks are an examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification. In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful techniques, which are most widely used in practice, including logistic regression, decision trees and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will implement these technique on real-world, large-scale machine learning tasks. You will also address significant tasks you will face in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques will behave on real data. We've also included optional content in every module, covering advanced topics for those who want to go even deeper! Learning Objectives: By the end of this course, you will be able to: -Describe the input and output of a classification model. -Tackle both binary and multiclass classification problems. -Implement a logistic regression model for large-scale classification. -Create a non-linear model using decision trees. -Improve the performance of any model using boosting. -Scale your methods with stochastic gradient ascent. -Describe the underlying decision boundaries. -Build a classification model to predict sentiment in a product review dataset. -Analyze financial data to predict loan defaults. -Use techniques for handling missing data. -Evaluate your models using precision-recall metrics. -Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).

Syllabus

Welcome!

Classification is one of the most widely used techniques in machine learning, with a broad array of applications, including sentiment analysis, ad targeting, spam detection, risk assessment, medical diagnosis and image classification. The core goal of classification is to predict a category or class y from some inputs x. Through this course, you will become familiar with the fundamental models and algorithms used in classification, as well as a number of core machine learning concepts. Rather than covering all aspects of classification, you will focus on a few core techniques, which are widely used in the real-world to get state-of-the-art performance. By following our hands-on approach, you will implement your own algorithms on multiple real-world tasks, and deeply grasp the core techniques needed to be successful with these approaches in practice. This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.

Linear Classifiers & Logistic Regression

Linear classifiers are amongst the most practical classification methods. For example, in our sentiment analysis case-study, a linear classifier associates a coefficient with the counts of each word in the sentence. In this module, you will become proficient in this type of representation. You will focus on a particularly useful type of linear classifier called logistic regression, which, in addition to allowing you to predict a class, provides a probability associated with the prediction. These probabilities are extremely useful, since they provide a degree of confidence in the predictions. In this module, you will also be able to construct features from categorical inputs, and to tackle classification problems with more than two class (multiclass problems). You will examine the results of these techniques on a real-world product sentiment analysis task.

Learning Linear Classifiers

Once familiar with linear classifiers and logistic regression, you can now dive in and write your first learning algorithm for classification. In particular, you will use gradient ascent to learn the coefficients of your classifier from data. You first will need to define the quality metric for these tasks using an approach called maximum likelihood estimation (MLE). You will also become familiar with a simple technique for selecting the step size for gradient ascent. An optional, advanced part of this module will cover the derivation of the gradient for logistic regression. You will implement your own learning algorithm for logistic regression from scratch, and use it to learn a sentiment analysis classifier.

Overfitting & Regularization in Logistic Regression

As we saw in the regression course, overfitting is perhaps the most significant challenge you will face as you apply machine learning approaches in practice. This challenge can be particularly significant for logistic regression, as you will discover in this module, since we not only risk getting an overly complex decision boundary, but your classifier can also become overly confident about the probabilities it predicts. In this module, you will investigate overfitting in classification in significant detail, and obtain broad practical insights from some interesting visualizations of the classifiers' outputs. You will then add a regularization term to your optimization to mitigate overfitting. You will investigate both L2 regularization to penalize large coefficient values, and L1 regularization to obtain additional sparsity in the coefficients. Finally, you will modify your gradient ascent algorithm to learn regularized logistic regression classifiers. You will implement your own regularized logistic regression classifier from scratch, and investigate the impact of the L2 penalty on real-world sentiment analysis data.

Decision Trees

Along with linear classifiers, decision trees are amongst the most widely used classification techniques in the real world. This method is extremely intuitive, simple to implement and provides interpretable predictions. In this module, you will become familiar with the core decision trees representation. You will then design a simple, recursive greedy algorithm to learn decision trees from data. Finally, you will extend this approach to deal with continuous inputs, a fundamental requirement for practical problems. In this module, you will investigate a brand new case-study in the financial sector: predicting the risk associated with a bank loan. You will implement your own decision tree learning algorithm on real loan data.

Preventing Overfitting in Decision Trees

Out of all machine learning techniques, decision trees are amongst the most prone to overfitting. No practical implementation is possible without including approaches that mitigate this challenge. In this module, through various visualizations and investigations, you will investigate why decision trees suffer from significant overfitting problems. Using the principle of Occam's razor, you will mitigate overfitting by learning simpler trees. At first, you will design algorithms that stop the learning process before the decision trees become overly complex. In an optional segment, you will design a very practical approach that learns an overly-complex tree, and then simplifies it with pruning. Your implementation will investigate the effect of these techniques on mitigating overfitting on our real-world loan data set.

Handling Missing Data

Real-world machine learning problems are fraught with missing data. That is, very often, some of the inputs are not observed for all data points. This challenge is very significant, happens in most cases, and needs to be addressed carefully to obtain great performance. And, this issue is rarely discussed in machine learning courses. In this module, you will tackle the missing data challenge head on. You will start with the two most basic techniques to convert a dataset with missing data into a clean dataset, namely skipping missing values and inputing missing values. In an advanced section, you will also design a modification of the decision tree learning algorithm that builds decisions about missing data right into the model. You will also explore these techniques in your real-data implementation.

Boosting

One of the most exciting theoretical questions that have been asked about machine learning is whether simple classifiers can be combined into a highly accurate ensemble. This question lead to the developing of boosting, one of the most important and practical techniques in machine learning today. This simple approach can boost the accuracy of any classifier, and is widely used in practice, e.g., it's used by more than half of the teams who win the Kaggle machine learning competitions. In this module, you will first define the ensemble classifier, where multiple models vote on the best prediction. You will then explore a boosting algorithm called AdaBoost, which provides a great approach for boosting classifiers. Through visualizations, you will become familiar with many of the practical aspects of this techniques. You will create your very own implementation of AdaBoost, from scratch, and use it to boost the performance of your loan risk predictor on real data.

Precision-Recall

In many real-world settings, accuracy or error are not the best quality metrics for classification. You will explore a case-study that significantly highlights this issue: using sentiment analysis to display positive reviews on a restaurant website. Instead of accuracy, you will define two metrics: precision and recall, which are widely used in real-world applications to measure the quality of classifiers. You will explore how the probabilities output by your classifier can be used to trade-off precision with recall, and dive into this spectrum, using precision-recall curves. In your hands-on implementation, you will compute these metrics with your learned classifier on real-world sentiment analysis data.

Scaling to Huge Datasets & Online Learning

With the advent of the internet, the growth of social media, and the embedding of sensors in the world, the magnitudes of data that our machine learning algorithms must handle have grown tremendously over the last decade. This effect is sometimes called "Big Data". Thus, our learning algorithms must scale to bigger and bigger datasets. In this module, you will develop a small modification of gradient ascent called stochastic gradient, which provides significant speedups in the running time of our algorithms. This simple change can drastically improve scaling, but makes the algorithm less stable and harder to use in practice. In this module, you will investigate the practical techniques needed to make stochastic gradient viable, and to thus to obtain learning algorithms that scale to huge datasets. You will also address a new kind of machine learning problem, online learning, where the data streams in over time, and we must learn the coefficients as the data arrives. This task can also be solved with stochastic gradient. You will implement your very own stochastic gradient ascent algorithm for logistic regression from scratch, and evaluate it on sentiment analysis data.

Taught by

Carlos Guestrin and Emily Fox

Reviews

4.8 rating, based on 9 Class Central reviews

4.7 rating at Coursera based on 3732 ratings

Start your review of Machine Learning: Classification

Gregory J Hamel ( Life Is Study) @greg

Machine Learning: Classification is the third course in the 6-part machine learning specialization offered by the University of Washington on the Coursera MOOC platform. The first two weeks of the 7-week course discuss classification in general, lo…

Machine Learning: Classification is the third course in the 6-part machine learning specialization offered by the University of Washington on the Coursera MOOC platform. The first two weeks of the 7-week course discuss classification in general, logistic regression and controlling overfitting with regularization. Weeks 3 and 4 cover decision trees, methods to control overfitting in tree models and handling missing data. Week 5 discusses boosting as an ensemble learning method in the context of decision trees. Weeks 6 and 7 cover precision and recall as alternatives to accuracy for assessing model performance and stochastic gradient ascent to make models scalable.

The course builds on the concepts covered in Machine Learning: Regression, so it is highly recommended that you take it first. Assignments use GraphLab, a Python package that requires the 64-bit version of Python 2.7. You can technically complete the course with whatever language and tools you like, but using Python and GraphLab will make your life much easier because the assignments are designed around it. Like the previous course, basic knowledge of Python, derivatives and matrices is recommended, but course doesn't get too deep into math. Grading is based on weekly quizzes and programming assignments.

Machine Learning: Classification follows in the footsteps of the regression course, offering a good mix of high quality instructional videos and illustrative programming assignments. Carlos Guestrin takes the reigns in the course (Emily Fox, the professor for the regression course, does not make an appearance) but the presentation format and style are mostly unchanged: videos break topics down into well-organized and digestible 1 to 7 minute chunks. The slides are crisp and generally uncluttered. Some of the most complicated sections are optional, so you can skip them without it affecting your performance on the programming assignments and quizzes.

The programming assignments are provided in Jupyter notebooks--interactive text and code documents that run in your browser. They do a good job illustrating the concepts and walking you through the process of implementing machine learning algorithms. Although the course claims that you'll be implementing algorithms yourself from scratch, they provide a ton setup, support and skeleton code: you don't need to define a single function yourself. Instead, you follow along with instructions and fill in key pieces of code in the bodies of certain pre-defined functions to get things working. Essentially every line of code you need to write has a comment giving you the gist of what you are supposed to do. Some may not appreciate this degree of hand-holding, but it keeps the assignments moving along steadily and puts the focus on learning and understanding concepts rather than coding details and debugging.

My only major gripe with this course is with some of the decisions concerning which topics to cover. The course mentions random forest models briefly at the end of the section on boosting, but the topic warrants a little more detail. A single 5-8 minute video would have been enough. The course does not mention support vector machines at all. The professor stated in the forums that he may release some videos on SVMs in the future but they were not included at launch since they are more complicated than other models and do not scale well to large datasets. The section on decision trees only discusses missclassification error as a metric for splitting, failing to mention information gain or gini impurity, which are often preferred in practice. Similarly, the boosting section focuses on AdaBoost, while stochastic gradient boosting and xgboost in particular are often more successful in practice. The final week's title "scaling to huge data sets and online learning" is a little misleading because it only really covers stochastic gradient ascent and mini-batch gradient ascent.

Machine Learning: Classification is a great first course for learning about classification that benefits from good organization and illustrative programming assignments. The course, does, however eschew some important topics in favor of simplicity; including a few more optional videos covering these topics would give the course the breadth and depth advanced learners desire without harming its accessibility.

I give Machine Learning: Classification 4.5 out of 5 stars: Great.
Zelphir Kaltstahl

This course can really teach you about classification approaches and problems you might encounter. The quiz felt like challenges, but they were all doable and made me feel good about completing them. Not too easy and not too difficult. I spent a lot…

This course can really teach you about classification approaches and problems you might encounter. The quiz felt like challenges, but they were all doable and made me feel good about completing them. Not too easy and not too difficult. I spent a lot of time each week of the course, but then again I am a perfectionist and the amount of time I spend on a course also contains taking my own notes, making them digitally online available, doing revision to be sure I understood everything, looking up and learning about formulas and the why behind them, etc.
If you take this course seriously, don't take any shortcuts and take good notes, you will learn solid basics of classification. The instructors are pros, so they know what they are talking about too. It uses Jupyter Notebooks, which are a great technology in themselves, but also they're prepared well for you to solve the homework / assignments. One thing that might be possible to improve upon is, that they could offer alternative notebooks for people who don't wish to use GraphLab Create. However, it is unlikely to happen, as one of the instructors is the founder of Turi, which sells GraphLab Create. However they're at least fair and give alternative instructions for you when you don't want to use GLC and you can put those into your notebooks to get things running.
Another thing they could improve upon is their usage of Python 2.x. It's 2017 people. There is no library except GraphLab Create in this course, which is dependent on Python 2 rather than Python 3. They should have created the course for Python 3.4 or later in the first place. So I changed all the notebooks to Python 3 and got through the course anyway.
I offered them the Python 3 notebooks, but didn't get response, so that's that.

This is one of the few courses I actually completed, so in my book it is one of the best I've seen so far. The course design and functionality is not super interactive, but the lectures are well thought through and the content good.
Anonymous

I am a programmer of user interfaces for web projects in the front end, and I love the field of computer machine learning and i loved python and java
Jason Michael Cherry

This continues UWash's outstanding Machine Learning series of classes, and is equally as impressive, if not moreso, then the Regression class it follows. I'm super-excited for the next class!
Daniel Rosquete

The course is great, as the others in this specializations, they really make it simple, but they are not going deeper in the algorithms, however it is really great!
Pk32
Colin Khein
Farmy Setiawan Radjatadoe
Abhilash Vj