This course takes you through the complete process of data handling, starting with AWS data processing services. You’ll begin with AWS Lambda, learning how to integrate serverless functions and manage scalable data pipelines. Through practical exercises, you’ll explore how AWS Glue automates data preparation and manages complex ETL jobs, including partitioning a data lake and modifying the Glue Data Catalog. Hands-on experience with Glue Studio and DataBrew will deepen your knowledge of preparing data for analysis.
The course also delves into processing large datasets using Amazon EMR, where you’ll work with Apache Spark, Hive, and other tools in the Hadoop ecosystem. You’ll learn to optimize data processing with EMR, partition and store data efficiently, and integrate it with AWS services like Kinesis and Redshift. Exercises in Apache Spark will show you how to analyze data streams and deliver actionable insights in real time.
Lastly, you’ll focus on analysis using services like Kinesis Analytics, OpenSearch, and Athena. The course will guide you through setting up advanced analytics with Kinesis, creating real-time monitoring applications, and visualizing data using OpenSearch and QuickSight. By the end of this course, you’ll be well-equipped to build data pipelines and to process and analyze data at scale using AWS’s powerful tools.
This course is ideal for data engineers, IT professionals, and data analysts aiming to leverage AWS for data processing and analysis. Some familiarity with AWS services is recommended.