This course covers data transformation, which is part of the data preparation phase of the machine learning (ML) lifecycle. You will learn transformation concepts and techniques such as data cleaning, encoding, and feature engineering. You will also discover how to transform your data by using services from Amazon Web Services (AWS), such as Amazon SageMaker Feature Store, Amazon SageMaker Data Wrangler, and AWS Glue.
- Course level: 300
- Duration: 60 minutes
Activities
- Online materials
- A demonstration
- Knowledge check questions
- A course assessment
Course objectives
- Explain the value of data cleaning and transformation.
- Describe how to process incorrect or duplicated data.
- Describe how to detect and treat outliers.
- Describe how to process missing values.
- Describe fundamental encoding techniques.
- Identify feature engineering use cases.
- Describe fundamental concepts, benefits, and techniques of feature engineering.
- Describe fundamental feature selection techniques.
- Describe AWS services for validating and labeling data.
- Identify AWS tools and services for visualizing and transforming data.
- Describe how to ingest data and manage features by using SageMaker Feature Store.
- Describe how to ingest and transform data by using Amazon SageMaker Data Wrangler.
- Describe how to transform data by using AWS Glue.
- Identify AWS tools and services for transforming streaming data.
- Describe how to transform streaming data by using AWS Lambda and Apache Spark on Amazon EMR.
Intended audience
- Cloud architects
- Machine learning engineers
Recommended skills
- At least 1 year of experience using Amazon SageMaker and other AWS services for ML engineering.
- At least 1 year of experience in a related role such as backend software developer, DevOps developer, data engineer, or data scientist.
- A fundamental understanding of programming languages such as Python.
- Completion of the preceding courses in the AWS ML Engineer Associate Learning Plan.
Course outline
- Section 1: Introduction
  - Lesson 1: How to Use This Course
  - Lesson 2: Course Overview
  - Lesson 3: Fundamentals of Data Transformation
- Section 2: Data Cleaning Techniques
  - Lesson 4: Incorrect and Duplicated Data
  - Lesson 5: Data Outliers
  - Lesson 6: Incomplete or Missing Data
- Section 3: Categorical Encoding Techniques
  - Lesson 7: Categorical Encoding
  - Lesson 8: Encoding Techniques
- Section 4: Feature Engineering
  - Lesson 9: Feature Engineering Concepts
  - Lesson 10: Numeric Feature Engineering
  - Lesson 11: Text Feature Engineering
  - Lesson 12: Feature Selection Techniques
- Section 5: AWS Tools and Services for Data Transformation
  - Lesson 13: Data Labeling with AWS
  - Lesson 14: Data Ingestion with AWS
  - Lesson 15: Data Transformation with AWS
  - Lesson 16: Transforming Data by Using AWS Glue
- Section 6: Conclusion
  - Lesson 17: Course Summary
  - Lesson 18: Assessment
  - Lesson 19: Contact Us