This second course of the AI Product Management Specialization by Duke University's Pratt School of Engineering focuses on the practical aspects of managing machine learning projects. The course walks through the key steps of an ML project, from identifying good opportunities for ML through data collection, model building, deployment, and the monitoring and maintenance of production systems. Participants will learn about the data science process and how to apply it to organize ML efforts, as well as the key considerations and decisions in designing ML systems.
At the conclusion of this course, you should be able to:
1) Identify opportunities to apply ML to solve problems for users
2) Apply the data science process to organize ML projects
3) Evaluate the key technology decisions to make in ML system design
4) Lead ML projects from ideation through production using best practices
Syllabus
- Identifying Opportunities for Machine Learning
- In this module we will discuss how to identify problems worth solving, how to determine whether ML is a good fit as part of the solution, and how to validate solution concepts. We will also learn why heuristics are useful in modeling projects and the advantages and disadvantages of ML relative to heuristics.
- Organizing ML Projects
- In this module we will focus on the CRISP-DM data science process and how it can be used to organize ML projects. We will begin by understanding what is unique about ML projects relative to traditional software projects, and then discuss approaches to managing the inherent risks of ML projects. We will also walk through the key roles on an ML project team and how to organize work.
- Data Considerations
- In this module we will explore the key data-related issues that arise in ML projects. Data is the foundation of successful machine learning, and gathering data of sufficient quantity and quality with the right set of attributes is the key to a successful project. We will discuss the key considerations in sourcing data, cleaning data, and developing and selecting a feature set to use in modeling. The module will conclude with a discussion on best practices to ensure reproducibility of your data pipeline.
- ML System Design & Technology Selection
- In this module we will discuss the key decisions to make in designing ML systems, such as cloud vs. edge and online vs. batch, and compare the benefits of each type of system. We will then discuss the primary technology decisions to make in an ML project and introduce the common tools and technologies used to build ML models.
- Model Lifecycle Management
- The final module in the course focuses on identifying and mitigating the key issues that ML models encounter once they are in production. We will discuss how to set up a robust ML system monitoring capability and define a maintenance plan that sustains the high performance of a production model. We will conclude with a discussion of the importance of versioning in ML systems to support continued rapid iteration even after deployment.
Taught by
Jon Reifschneider