In this course, you will explore the importance of monitoring and optimizing ML infrastructure, including which key performance indicators (KPIs) to focus on. The course discusses the tools available for infrastructure monitoring and optimization, including Amazon CloudWatch, AWS X-Ray, Amazon QuickSight, AWS CloudTrail, Amazon EventBridge, AWS Compute Optimizer, Amazon SageMaker Inference Recommender, and more. In the context of ML solutions, it also explores AWS cost analysis tools such as AWS Billing and Cost Management, AWS Budgets, AWS Cost Explorer, and AWS Trusted Advisor.
- Course level: Advanced
- Duration: 2 hours and 30 minutes
Activities
- Online materials
- Exercises
- Knowledge check questions
Course objectives
- Describe the importance of monitoring ML infrastructure and key performance metrics.
- Configure and use CloudWatch Logs and alarms to troubleshoot and analyze resources.
- Identify monitoring and observability tools used to troubleshoot latency and performance issues.
- Set up dashboards to monitor performance metrics for your machine learning infrastructure.
- Describe how to use CloudTrail to log, monitor, and retain activities related to API calls.
- Demonstrate how to rightsize instance families with SageMaker Inference Recommender.
- Demonstrate how to rightsize instance families with Compute Optimizer.
- Identify and troubleshoot capacity concerns for cost and performance.
- Identify and describe capabilities of AWS cost analysis tools.
- Describe the benefits and options for Machine Learning Savings Plans for Amazon SageMaker.
- Identify additional resources and best practices for optimizing costs.
Intended audience
- Cloud architects
- Machine learning engineers
Recommended skills
- At least 1 year of experience using SageMaker and other AWS services for ML engineering
- At least 1 year of experience in a related role, such as backend software developer, DevOps developer, data engineer, or data scientist
- A fundamental understanding of programming languages, such as Python
- Completion of the preceding courses in the AWS ML Engineer Associate Learning Plan
Course outline
- Section 1: Introduction
  - Lesson 1: How to Use This Course
  - Lesson 2: Course Overview
- Section 2: Monitor Infrastructure
  - Lesson 3: Importance of Monitoring ML Infrastructure
  - Lesson 4: Monitoring Performance Metrics
  - Lesson 5: Monitoring and Observability
  - Lesson 6: Monitoring Tools for Performance and Latency
  - Lesson 7: Observability and Auditing Your ML Solution
  - Lesson 8: Setting Up Dashboards
- Section 3: Optimize Infrastructure
  - Lesson 9: Rightsizing Compute Infrastructure for ML Solutions
  - Lesson 10: Demo: Amazon SageMaker Inference Recommender
- Section 4: Optimize Costs
  - Lesson 11: Reducing Monitoring Costs
  - Lesson 12: Balancing Capacity, Cost, and Performance
  - Lesson 13: Using AWS Cost Management Tools with ML Solutions
  - Lesson 14: Purchasing Option to Optimize ML Infrastructure Costs
- Section 5: Conclusion
  - Lesson 15: Course Summary
  - Lesson 16: Assessment
  - Lesson 17: Contact Us
Keywords
- Gen AI
- Generative AI