Exploratory Data Analysis for Machine Learning

Overview

This first course in the IBM Machine Learning Professional Certificate introduces you to Machine Learning and to the content of the professional certificate. In this course you will see why good-quality data matters. You will learn common techniques to retrieve your data, clean it, apply feature engineering, and prepare it for preliminary analysis and hypothesis testing.

By the end of this course you should be able to:

- Retrieve data from multiple data sources: SQL, NoSQL databases, APIs, Cloud
- Describe and use common feature selection and feature engineering techniques
- Handle categorical and ordinal features, as well as missing values
- Use a variety of techniques for detecting and dealing with outliers
- Articulate why feature scaling is important and use a variety of scaling techniques

Who should take this course?

This course targets aspiring data scientists interested in acquiring hands-on experience with Machine Learning and Artificial Intelligence in a business setting.

What skills should you have?

To make the most of this course, you should be familiar with programming in a Python development environment and have a fundamental understanding of Calculus, Linear Algebra, Probability, and Statistics.

Syllabus

A Brief History of Modern AI and its Applications

Artificial Intelligence is not new, but it is new in the sense that it is easier than ever to get started using Machine Learning in business settings. This module gives a quick introduction to AI and Machine Learning and a brief history of modern AI. We will also explore some current applications of AI and Machine Learning, so that you can think about how to leverage them in your day-to-day business practice or personal projects.

Retrieving and Cleaning Data

Good data is the fuel that powers Machine Learning and Artificial Intelligence. In this module, you will learn how to retrieve data from different sources and how to clean it to ensure its quality.
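As a rough illustration of what retrieving and cleaning can look like, the sketch below pulls rows from a SQL source and applies three common cleaning steps: dropping exact duplicates, normalizing text, and filling a missing value. It is not taken from the course materials. The `customers` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a real data source.

```python
import sqlite3
import statistics

# Hypothetical in-memory table standing in for a real SQL data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, 34.0, "Austin"),
        (2, None, "Boston"),     # missing age
        (2, None, "Boston"),     # exact duplicate of the previous row
        (3, 51.0, " chicago "),  # stray whitespace and inconsistent casing
    ],
)


def retrieve(connection):
    """Retrieve every row of the table as a list of dicts."""
    cursor = connection.execute("SELECT id, age, city FROM customers ORDER BY id")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def clean(rows):
    """Drop exact duplicates, tidy city names, fill missing ages with the median."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = statistics.median(known_ages)

    for r in unique:
        r["city"] = r["city"].strip().title()
        if r["age"] is None:
            r["age"] = fill_value
    return unique


cleaned = clean(retrieve(conn))
conn.close()

for row in cleaned:
    print(row)
```

Filling with the median is only one possible choice. Depending on the data, dropping the row or flagging the value as missing may be more appropriate.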

Exploratory Data Analysis and Feature Engineering

In this module you will learn how to conduct exploratory analysis to visually confirm that your data is ready for machine learning modeling, and how to prepare it through feature engineering and transformations.
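To make the idea of transformations concrete, here is a minimal sketch of three common ones: a log transform for a skewed feature, min-max scaling, and one-hot encoding of a categorical feature. It is an illustration under stated assumptions, not the course's own code. The income values and color labels are made up, the helper names are hypothetical, and a numeric skewness score is used in place of the histogram you would normally inspect visually.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a dict of 0/1 indicator features."""
    levels = sorted(set(categories))
    return [{f"is_{level}": int(c == level) for level in levels} for c in categories]


# Made-up income values with one very large observation.
incomes = [28_000, 31_000, 35_000, 40_000, 42_000, 55_000, 250_000]

# log1p compresses the long right tail of the distribution.
log_incomes = [math.log1p(v) for v in incomes]

# Scaling puts the transformed feature on a 0-to-1 range.
scaled = min_max_scale(log_incomes)

# Made-up categorical feature.
encoded = one_hot(["red", "blue", "red"])

print("skewness before log:", round(skewness(incomes), 2))
print("skewness after log: ", round(skewness(log_incomes), 2))
print("scaled range:", min(scaled), "to", max(scaled))
print("encoded:", encoded)
```

Here the log transform reduces the skew but does not remove it, which is typical when a single observation dominates the tail. That is the kind of finding that would prompt a closer look at outliers.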

Inferential Statistics and Hypothesis Testing

Inferential statistics and hypothesis testing are two types of data analysis that are often overlooked in the early stages of analyzing your data. They can give you quick insights into the quality of your data. They also help you confirm business intuition and decide what to analyze next using Machine Learning. This module looks at useful definitions and simple examples that will help you get started creating hypotheses around your business problem and testing them.
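One simple way to test a business hypothesis is a permutation test, sketched below. This is an assumption chosen for illustration, not necessarily the method the module teaches. The null hypothesis is that two groups share the same mean, and the p-value is estimated by repeatedly shuffling the group labels. The order values and the `permutation_test` helper are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up average order values under two hypothetical page layouts.
old_layout = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.5, 21.7]
new_layout = [24.9, 26.3, 25.1, 27.0, 24.2, 26.8, 25.5, 25.9]

p_value = permutation_test(old_layout, new_layout)
print("estimated p-value:", p_value)
```

A small p-value suggests the observed difference would be unlikely if the layout had no effect. The significance threshold, such as 0.05, should be chosen before looking at the data.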

(Optional) HONORS Project

In this optional HONORS project you will apply the skills and knowledge you have learned throughout the course. You can select a dataset from the ones used in this course, or any other dataset of interest, and apply all of the demonstrated techniques, including Data Cleaning, Feature Engineering, Exploratory Data Visualization, and Hypothesis Testing.