Classification Trees in Python, From Start To Finish

Overview

In this 1-hour, project-based course, you will learn how to build Classification Trees in Python using a real-world dataset that has missing data and categorical data that must be transformed with One-Hot Encoding. You will then use Cost Complexity Pruning and Cross Validation to build a tree that is not overfit to the Training Dataset.

This course runs on Rhyme, Coursera's hands-on project platform, where you complete projects directly in your browser. You will get instant access to a pre-configured cloud desktop containing all of the software and data you need for the project, with tools such as Python, Jupyter, and TensorFlow pre-installed, so you can focus on learning.

Prerequisites: To be successful in this project, you should be familiar with Python and the theory behind Decision Trees, Cost Complexity Pruning, Cross Validation and Confusion Matrices.

Note: This course works best for learners who are based in the North America region. We’re currently working on providing the same experience in other regions.

Syllabus

Classification Trees in Python, From Start To Finish

In this lesson, we will use scikit-learn and Cost Complexity Pruning to build a Classification Tree that uses continuous and categorical data from the UCI Machine Learning Repository to predict whether or not a patient has heart disease.
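
The workflow described in this lesson can be sketched roughly as follows. This is a minimal illustration, not the course's actual notebook: it assumes synthetic data from scikit-learn's make_classification in place of the UCI heart disease dataset, a hypothetical three-level categorical column, and dropping rows as the way of handling missing values. All variable names are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic continuous features and a binary label
# (assumption: stand-in for the real heart disease data).
X_cont, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hypothetical categorical column with three levels, coded 0, 1, 2.
category = rng.integers(0, 3, size=300)

# Simulate missing values, then drop the affected rows
# (one simple strategy; imputation would be another).
X_cont[rng.choice(300, size=10, replace=False), 0] = np.nan
keep = ~np.isnan(X_cont).any(axis=1)
X_cont, category, y = X_cont[keep], category[keep], y[keep]

# One-Hot Encoding: one 0/1 column per category level.
one_hot = np.eye(3)[category]
X = np.hstack([X_cont, one_hot])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Unpruned tree, which will typically overfit the training data.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate alpha values for Cost Complexity Pruning. The last alpha prunes
# the tree down to its root, so it is dropped. Clipping guards against tiny
# negative values caused by floating-point error.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Cross Validation: score each alpha with 5-fold CV on the training data.
mean_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha),
        X_train, y_train, cv=5,
    ).mean()
    for alpha in ccp_alphas
]
best_alpha = float(ccp_alphas[int(np.argmax(mean_scores))])

# Final pruned tree, evaluated on held-out data with a confusion matrix.
pruned_tree = DecisionTreeClassifier(
    random_state=0, ccp_alpha=best_alpha
).fit(X_train, y_train)
cm = confusion_matrix(y_test, pruned_tree.predict(X_test))

print("best alpha:", best_alpha)
print("nodes (full vs pruned):",
      full_tree.tree_.node_count, pruned_tree.tree_.node_count)
print(cm)
```

Choosing alpha by cross validation on the training data alone keeps the test set untouched until the final confusion matrix, which is why the split happens before the pruning path is computed.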