Overview

This is the third course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order: they are not independent courses, but parts of a workflow in which each course builds on the previous ones.

Course 3 introduces you to the next stage of the workflow for our hypothetical media company. In this stage of work you will learn best practices for feature engineering, handling class imbalances, and detecting bias in the data. Class imbalances can seriously affect the validity of your machine learning models, and mitigating bias in data is essential to reducing the risk associated with biased models. These topics are followed by sections on best practices for dimension reduction, outlier detection, and unsupervised learning techniques for finding patterns in your data. The case studies focus on topic modeling and data visualization.

By the end of this course you will be able to:

1. Employ the tools that help address class imbalance issues
2. Explain the ethical considerations regarding bias in data
3. Employ the AI Fairness 360 open source libraries to detect bias in models
4. Employ dimension reduction techniques for both the EDA and transformation stages
5. Describe topic modeling techniques in natural language processing
6. Use topic modeling and visualization to explore text data
7. Employ outlier handling best practices in high-dimensional data
8. Employ outlier detection algorithms as a quality assurance tool and a modeling tool
9. Employ unsupervised learning techniques using pipelines as part of the AI workflow
10. Employ basic clustering algorithms

Who should take this course?

This course targets existing data science practitioners who have expertise building machine learning models and who want to deepen their skills in building and deploying AI in large enterprises. If you are an aspiring data scientist, this course is NOT for you, as you need real-world expertise to benefit from the content of these courses.

What skills should you have?

It is assumed that you have completed Courses 1 and 2 of the IBM AI Enterprise Workflow specialization and that you have a solid understanding of the following topics before starting this course: a fundamental understanding of linear algebra; an understanding of sampling, probability theory, and probability distributions; knowledge of descriptive and inferential statistical concepts; a general understanding of machine learning techniques and best practices; a practiced understanding of Python and the packages commonly used in data science (NumPy, pandas, matplotlib, scikit-learn); familiarity with IBM Watson Studio; and familiarity with the design thinking process.

Syllabus

Data transforms and feature engineering

This module introduces you to the skills required for effective feature engineering in today's business enterprises. The skills are presented as a series of best practices representing years of practical experience.

Pattern recognition and data mining best practices

This module continues the discussion of skills related to feature engineering for practicing data scientists, with a focus on outliers and the use of unsupervised learning techniques for finding patterns.