In this course, you will learn to build streaming data analytics solutions using AWS services, including Amazon Kinesis, Amazon Data Firehose, and Amazon Managed Streaming for Apache Kafka (Amazon MSK). Kinesis is a massively scalable and durable real-time data streaming service. Amazon MSK offers a secure, fully managed, and highly available Apache Kafka service.
You will learn how Kinesis and Amazon MSK integrate with AWS services such as AWS Glue and AWS Lambda. The course addresses the streaming data ingestion, stream storage, and stream processing components of the data analytics pipeline. You will also learn to apply security, performance, and cost management best practices to the operation of Kinesis and Amazon MSK.
The course is divided into learning modules and lab modules. Learning modules introduce new concepts and the AWS services you can use to build your solution. Lab modules are in-depth, hands-on activities with step-by-step instructions for applying what you've learned.
Activities
Interactive content, videos, knowledge checks, assessments, and hands-on labs
Course objectives
- Recognize an analytics customer challenge and describe an appropriate AWS solution, built on a streaming data architecture, to solve it.
- Describe data sources suitable for streaming applications and how that data is ingested.
- Identify short-term and long-term storage services for streaming data.
- Describe how to design and implement real-time data processing solutions.
- Recognize how to serve streaming data for consumption by end users.
- Describe how to optimize a streaming data pipeline using Amazon Kinesis, Amazon MSK, and Amazon Redshift.
- Identify best practices for securing a streaming data pipeline.
Intended audience
- Data engineer
- Data analyst
- Data architect
- Business intelligence engineer
Recommended skills
- 2–3 years of experience in data engineering
- 1–2 years of hands-on experience with AWS services
- Completed AWS Cloud Practitioner Essentials or equivalent
- Completed Fundamentals of Analytics on AWS Part 1 and 2
- Completed Data Engineering on AWS – Foundations
Course outline
Module 1: Building a Streaming Data Pipeline Solution
This module shows how to identify, select, and configure the appropriate AWS services for building a streaming data pipeline solution to meet a fictitious customer's business goals.
- Introduction
- Ingesting Data from Stream Sources
- Storing Streaming Data
- Processing Data
- Analyzing Data
- Final Assessment
- Conclusion
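The ingestion stage covered in this module can be sketched with the Kinesis Data Streams API. The snippet below builds a clickstream record and shows the `put_record` call from the AWS SDK for Python (boto3); the stream name `clickstream` and the event fields are hypothetical examples, and the actual send assumes configured AWS credentials, so it is shown but left commented out.

```python
import json

# Hypothetical clickstream event; field names are illustrative only.
event = {"user_id": "u-123", "item_id": "sku-42", "action": "click"}

def build_record(event):
    """Serialize an event for Kinesis. Partitioning by user ID keeps
    one user's clicks in order within a single shard."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": event["user_id"],
    }

record = build_record(event)

# Sending requires AWS credentials and an existing stream:
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.put_record(StreamName="clickstream", **record)
```

The choice of partition key matters: records with the same key always land on the same shard, which preserves per-user ordering but can create hot shards if one key dominates the traffic.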
Module 2: Streaming Analytics with Amazon Managed Service for Apache Flink (Lab)
This lab is a step-by-step, hands-on activity in which you build a stream processing pipeline that ingests clickstream data and enriches it with catalog data stored in Amazon Simple Storage Service (Amazon S3). You then analyze the enriched data to identify sales per category in real time and visualize the output.
- Lab overview
- Task 1: Setting up the Zeppelin notebook environment
- Task 2: Connecting to the Amazon EC2 producer and starting the clickstream generator
- Task 3: Importing the Zeppelin notebook
- Task 4: Developing analytics in Managed Service for Apache Flink Studio with the Zeppelin notebook
- Task 5: Understanding in-memory table creation in the AWS Glue Data Catalog
- Conclusion
Module 3: Optimizing and Securing a Streaming Data Pipeline Solution
This module covers how to configure a fictitious customer's streaming data pipeline solution to increase efficiency, control costs, secure and protect the data, and govern the infrastructure.
- Optimization
- Security and Governance
- Final Assessment
- Conclusion
Module 4: Introduction to Access Control with Amazon Managed Streaming for Apache Kafka (Lab)
This lab is a step-by-step, hands-on activity in which you use AWS Identity and Access Management (IAM) to authenticate and authorize clients of an Amazon MSK cluster.
- Lab overview
- Task 1: Inspecting the MSK cluster
- Task 2: Publishing to and consuming from an IAM-authenticated MSK cluster
- Conclusion
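For orientation before the lab, IAM authentication for Kafka clients against an MSK cluster is typically enabled through a client properties file like the sketch below, which follows the settings documented for the Amazon MSK IAM auth library (the exact file name and how you supply it are lab-specific):

```properties
# Client properties for IAM-authenticated access to an MSK cluster
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
```

With the standard Kafka CLI tools, a file like this is passed via the `--command-config` option to `kafka-console-producer.sh` and `kafka-console-consumer.sh`, and the client then signs requests with the AWS credentials available in its environment.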