Mastering AWS Elastic Map Reduce (EMR) for Data Engineers

via Udemy

Overview

Build PySpark and Spark SQL Applications on AWS EMR, Orchestrate them using Step Functions, Manage EMR using Boto3, and more

What you'll learn:
  • Creating Clusters using AWS Elastic Map Reduce Web Console
  • Set up Remote Application Development using AWS Elastic Map Reduce (EMR) and Visual Studio Code
  • Develop and Validate Simple Spark Application using Visual Studio Code and AWS Elastic Map Reduce (EMR)
  • Deploy Spark Application as Step to AWS Elastic Map Reduce (EMR)
  • Manage AWS Elastic Map Reduce (EMR) based Pipelines using Boto3 and Python
  • Build End to End AWS Elastic Map Reduce (EMR) based Pipelines using AWS Step Functions
  • Develop Applications using Spark SQL on AWS EMR Cluster
  • Build State Machine or Pipeline using AWS Step Functions to run Spark SQL Script on AWS EMR Cluster
  • Understand how to pass parameters to Spark SQL Scripts deployed on EMR
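
As an illustration of the last point, one common way to pass parameters to a Spark SQL script is the `spark-sql` CLI, which accepts a script file with `-f` and variable definitions with `-d name=value`; the script can then refer to each variable as `${name}`. The sketch below only builds such a command line. The script path and the variable names are hypothetical, and the course may well use a different approach.

```python
import shlex


def build_spark_sql_command(script_path, variables):
    """Build a spark-sql command that runs a script with substitution variables."""
    command = ["spark-sql", "-f", script_path]
    for name, value in variables.items():
        # Each -d name=value pair can be referenced in the script as ${name}.
        command += ["-d", f"{name}={value}"]
    return command


# Hypothetical script location and parameter names, for illustration only.
command = build_spark_sql_command(
    "/home/hadoop/scripts/daily_counts.sql",
    {"bucket_name": "s3://example-bucket", "run_date": "2022-01-01"},
)

# Print the command as a single shell-safe string.
print(shlex.join(command))
```

Keeping the command as a list of arguments makes it straightforward to reuse the same values as the `Args` of an EMR step later on.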

AWS Elastic Map Reduce (EMR) is one of the key AWS Services used to build large-scale data processing solutions with Big Data Technologies such as Apache Hadoop, Apache Spark, Hive, etc. As part of this course, you will learn AWS Elastic Map Reduce (EMR) by building end-to-end data pipelines using Apache Spark and AWS Step Functions.
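
To give a rough idea of what such a pipeline can look like, here is a minimal sketch of a Step Functions state machine definition, written as a Python dict, that creates an EMR cluster, runs one Spark application as a step, and then terminates the cluster. It relies on the Step Functions service integrations for EMR (`createCluster.sync`, `addStep.sync`, `terminateCluster`). The cluster name, release label, instance types, role names, and S3 locations are assumptions chosen for illustration, not values taken from the course.

```python
import json


def build_emr_pipeline_definition(cluster_name, log_uri, app_uri):
    """Return an Amazon States Language definition as a Python dict."""
    return {
        "Comment": "Create an EMR cluster, run one Spark step, terminate the cluster",
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                # The .sync suffix makes the state wait until the cluster is ready.
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": cluster_name,
                    "ReleaseLabel": "emr-6.10.0",  # assumed release label
                    "Applications": [{"Name": "Spark"}],
                    "LogUri": log_uri,
                    "ServiceRole": "EMR_DefaultRole",  # assumed role names
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"Name": "Primary", "InstanceRole": "MASTER",
                             "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"Name": "Core", "InstanceRole": "CORE",
                             "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                # Keep the task output (including ClusterId) under $.cluster.
                "ResultPath": "$.cluster",
                "Next": "RunSparkStep",
            },
            "RunSparkStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "Run Spark application",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": ["spark-submit", "--deploy-mode", "cluster", app_uri],
                        },
                    },
                },
                "ResultPath": "$.step",
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


# Hypothetical names and S3 locations, for illustration only.
definition = build_emr_pipeline_definition(
    "dev-pipeline-cluster",
    "s3://example-bucket/emr-logs/",
    "s3://example-bucket/apps/app.py",
)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document that can be pasted into the Step Functions console as a state machine definition. In this sketch a failed Spark step fails the whole execution before the cluster is terminated, so a real pipeline would probably add error handling around that state.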

Here is the detailed outline of the course.

  • First, you will learn how to Get Started with AWS Elastic Map Reduce (EMR) by understanding how to use the AWS Web Console to create and manage EMR Clusters. You will also learn about the key features of the Web Console, how to connect to the master node of the cluster, and how to validate the important command-line interfaces such as spark-shell, pyspark, and hive, as well as the hdfs and aws CLI commands.

  • Once you understand how to get started with AWS EMR, you will go through the details related to Setting up Development Cluster using AWS EMR. There are quite a few advantages to using AWS EMR Clusters for development purposes, and many enterprises do so.

  • After setting up a development cluster using AWS EMR, you will go through the Development Life Cycle of Spark Applications using AWS EMR Development Cluster. You will be using Visual Studio Code Remote Development on top of the AWS EMR Development Cluster to go through the details.

  • Once the development is done, you will go through the details related to Deploying Spark Application on AWS EMR Cluster. You will build the zip file and understand how to run it using the CLI in both client and cluster deployment modes. You will also understand how to deploy the Spark application as a step on AWS EMR Clusters, and how to troubleshoot issues with Spark Applications by going through the relevant logs.

  • Typically we run Spark Applications programmatically. After going through the details related to deploying Spark applications on AWS EMR Clusters, you will learn how to Manage AWS EMR Clusters using Python Boto3. You will learn not only how to create clusters programmatically but also how to deploy Spark Applications as Steps programmatically using Python Boto3.

  • End to End Data Pipelines using AWS EMR are built using AWS Step Functions. Once you understand how to manage EMR Clusters using Python Boto3 and also deploy Spark Applications on EMR Clusters using the same, it is important to learn how to Build EMR-based Workflows or Pipelines using AWS Step Functions. You will learn how to create the cluster, deploy a Spark Application as a Step on the cluster, and then terminate the cluster as part of a basic pipeline or State Machine using AWS Step Functions.

  • You will also learn how to perform validations as part of State Machines by Enhancing AWS EMR-based State Machine or Pipeline. You will check if the files specified already exist as part of the validations.

  • We can also build Data Processing Applications or Pipelines using Spark SQL on AWS EMR. First, you will learn how to design and develop solutions using a Spark SQL Script and how to validate them by running the appropriate commands with the relevant runtime arguments.

  • Once you understand the development process of implementing solutions using Spark SQL on AWS EMR, you will learn how to build a Data Pipeline using AWS Step Functions that deploys the Spark SQL Script on an EMR Cluster. You will also learn the concept of Boto3 Waiters to make sure the steps are executed in a linear fashion.
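
The Boto3 pieces of the outline above (deploying Spark Applications as Steps and using Waiters) can be sketched roughly as follows. The example uses the EMR client's `add_job_flow_steps` call and its `step_complete` waiter so that each step finishes before the next one is submitted. The helper names `spark_submit_step` and `run_steps_in_order`, the cluster ID, and the S3 paths are hypothetical, and this is only one possible arrangement, not necessarily the one taught in the course.

```python
def spark_submit_step(name, app_uri, app_args=()):
    """Describe a Spark application as an EMR step that calls spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs the given command on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", app_uri, *app_args],
        },
    }


def run_steps_in_order(emr_client, cluster_id, steps, delay_seconds=30, max_attempts=60):
    """Submit steps one at a time, waiting for each to complete before the next."""
    waiter = emr_client.get_waiter("step_complete")
    step_ids = []
    for step in steps:
        response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
        step_id = response["StepIds"][0]
        # The waiter polls the step status and raises an error if the step
        # fails or does not complete within delay_seconds * max_attempts.
        waiter.wait(
            ClusterId=cluster_id,
            StepId=step_id,
            WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
        )
        step_ids.append(step_id)
    return step_ids


# Usage against a real cluster (needs boto3, AWS credentials, and a cluster ID):
#   import boto3
#   emr = boto3.client("emr")
#   run_steps_in_order(
#       emr,
#       "j-XXXXXXXXXXXXX",  # placeholder cluster ID
#       [spark_submit_step("Load", "s3://example-bucket/apps/load.py"),
#        spark_submit_step("Report", "s3://example-bucket/apps/report.py")],
#   )
```

Passing the client in as an argument keeps the functions easy to try out with a stand-in object before pointing them at a real cluster.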

Taught by

Durga Viswanatha Raju Gadiraju, Pratik Kumar, Sathvika Dandu, Madhuri Gadiraju, Sai Varma and Phani Bhushan Bozzam

Reviews

4.4 rating at Udemy based on 318 ratings
