
Udemy

Data Engineering using AWS Data Analytics

via Udemy

Overview

Build Data Engineering Pipelines on AWS using Data Analytics Services - Glue, EMR, Athena, Kinesis, Lambda, Redshift

What you'll learn:
  • Data Engineering leveraging Services under AWS Data Analytics
  • AWS essentials such as S3, IAM, EC2, etc.
  • Understanding AWS S3 for cloud-based storage
  • Understanding virtual machines on AWS, known as EC2
  • Managing AWS IAM users, groups, roles, and policies for RBAC (Role-Based Access Control)
  • Managing Tables using AWS Glue Catalog
  • Engineering Batch Data Pipelines using AWS Glue Jobs
  • Orchestrating Batch Data Pipelines using AWS Glue Workflows
  • Running Queries using AWS Athena - a serverless query engine service
  • Using AWS Elastic Map Reduce (EMR) Clusters for building Data Pipelines
  • Using AWS Elastic Map Reduce (EMR) Clusters for reports and dashboards
  • Data Ingestion using AWS Lambda Functions
  • Scheduling using AWS EventBridge
  • Engineering Streaming Pipelines using AWS Kinesis
  • Streaming Web Server logs using AWS Kinesis Firehose
  • Overview of data processing using AWS Athena
  • Running AWS Athena queries or commands using CLI
  • Running AWS Athena queries using Python boto3
  • Creating an AWS Redshift Cluster, creating tables, and performing CRUD operations
  • Copying data from S3 to AWS Redshift tables
  • Understanding Distribution Styles and creating tables using Distkeys
  • Running queries on external RDBMS Tables using AWS Redshift Federated Queries
  • Running queries on Glue or Athena Catalog tables using AWS Redshift Spectrum

Data Engineering is all about building Data Pipelines to get data from multiple sources into Data Lakes or Data Warehouses, and then from Data Lakes or Data Warehouses to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the AWS Data Analytics stack, using services such as Glue, Elastic MapReduce (EMR), Lambda Functions, Athena, Kinesis, Redshift, and many more.

Here are the high-level steps which you will follow as part of the course.

  • Setup Development Environment

  • Getting Started with AWS

  • Storage - All about AWS S3 (Simple Storage Service)

  • User Level Security - Managing Users, Roles, and Policies using IAM

  • Infrastructure - AWS EC2 (Elastic Compute Cloud)

  • Data Ingestion using AWS Lambda Functions

  • Overview of AWS Glue Components

  • Setup Spark History Server for AWS Glue Jobs

  • Deep Dive into AWS Glue Catalog

  • Exploring AWS Glue Job APIs

  • AWS Glue Job Bookmarks

  • Development Lifecycle of PySpark

  • Getting Started with AWS EMR

  • Deploying Spark Applications using AWS EMR

  • Streaming Pipeline using AWS Kinesis

  • Consuming Data ingested via AWS Kinesis from AWS S3 using boto3

  • Populating GitHub Data to AWS DynamoDB

  • Overview of AWS Athena

  • AWS Athena using the AWS CLI

  • AWS Athena using Python boto3

  • Getting Started with AWS Redshift

  • Copy Data from AWS S3 into AWS Redshift Tables

  • Develop Applications using AWS Redshift Cluster

  • AWS Redshift Tables with Distkeys and Sortkeys

  • AWS Redshift Federated Queries and Spectrum

Here are the details about what you will be learning as part of this course. We will cover, with hands-on practice, most of the commonly used services available under AWS Data Analytics.

Getting Started with AWS

As part of this section, you will be going through the details related to getting started with AWS.

  • Introduction - AWS Getting Started

  • Create an S3 Bucket

  • Create an AWS IAM Group and AWS IAM User with the required access to the S3 Bucket and other services

  • Overview of AWS IAM Roles

  • Create and Attach a Custom AWS IAM Policy to both AWS IAM Groups and Users

  • Configure and Validate the AWS CLI to access AWS Services using AWS CLI Commands

Storage - All about AWS S3 (Simple Storage Service)

AWS S3 is one of the most prominent fully managed AWS services. All IT professionals who would like to work on AWS should be familiar with it. In this section, we will get into quite a few common features of AWS S3.

  • Getting Started with AWS S3

  • Set up a Data Set locally to upload to AWS S3

  • Adding AWS S3 Buckets and Managing Objects (files and folders) in AWS S3 buckets

  • Version Control for AWS S3 Buckets

  • Cross-Region Replication for AWS S3 Buckets

  • Overview of AWS S3 Storage Classes

  • Overview of AWS S3 Glacier

  • Managing AWS S3 using AWS CLI Commands

  • Managing Objects in AWS S3 using CLI - Lab
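
For those who prefer to script these operations, here is a minimal boto3 sketch (the bucket name, local file, and prefix are hypothetical placeholders, not the course's exact setup) that uploads a file and then lists the objects under a prefix:

```python
import boto3

# Hypothetical bucket and key names used purely for illustration.
BUCKET = "itversity-demo-bucket"

s3 = boto3.client("s3")  # credentials come from the AWS CLI profile or environment

# Upload a local file to the bucket under a folder-like prefix.
s3.upload_file("data/ghactivity.json", BUCKET, "landing/ghactivity/ghactivity.json")

# List the objects under that prefix and print their keys and sizes.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="landing/ghactivity/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```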

User Level Security - Managing Users, Roles, and Policies using IAM

Once you start working on AWS, you need to understand the permissions you have as a non-admin user. As part of this section, you will understand the details related to AWS IAM users, groups, roles, and policies.

  • Creating AWS IAM Users

  • Logging into AWS Management Console using AWS IAM User

  • Validate Programmatic Access to AWS IAM User

  • AWS IAM Identity-based Policies

  • Managing AWS IAM Groups

  • Managing AWS IAM Roles

  • Overview of Custom AWS IAM Policies

  • Managing AWS IAM users, groups, roles as well as policies using AWS CLI Commands

Infrastructure - AWS EC2 (Elastic Compute Cloud) Basics

AWS EC2 Instances are virtual machines on AWS. As part of this section, we will go through some of the basics of AWS EC2.

  • Getting Started with AWS EC2

  • Create AWS EC2 Key Pair

  • Launch AWS EC2 Instance

  • Connecting to AWS EC2 Instance

  • AWS EC2 Security Groups Basics

  • AWS EC2 Public and Private IP Addresses

  • AWS EC2 Life Cycle

  • Allocating and Assigning AWS Elastic IP Address

  • Managing AWS EC2 Using AWS CLI

  • Upgrade or Downgrade AWS EC2 Instances

Infrastructure - AWS EC2 Advanced

In this section, we will continue with AWS EC2 to understand how we can manage EC2 instances using AWS CLI commands and how to install additional OS modules using bootstrap scripts.

  • Getting Started with AWS EC2

  • Understanding AWS EC2 Metadata

  • Querying on AWS EC2 Metadata

  • Filtering on AWS EC2 Metadata

  • Using Bootstrap Scripts with AWS EC2 Instances to install additional software on AWS EC2 instances

  • Create an AWS AMI using AWS EC2 Instances

  • Validate AWS AMI - Lab
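
To make the metadata queries concrete: on the instance itself, metadata is served from the link-local endpoint 169.254.169.254. A minimal sketch using the requests library and the IMDSv2 token flow (run on an EC2 instance; this is an illustration, not the course's exact commands):

```python
import requests

# IMDSv2 first requires a short-lived session token.
token = requests.put(
    "http://169.254.169.254/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text

headers = {"X-aws-ec2-metadata-token": token}

# Query individual metadata attributes such as instance id and instance type.
for path in ("instance-id", "instance-type", "placement/availability-zone"):
    value = requests.get(
        f"http://169.254.169.254/latest/meta-data/{path}", headers=headers, timeout=2
    ).text
    print(path, "=", value)
```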

Data Ingestion using Lambda Functions

AWS Lambda functions are serverless functions. In this section, we will understand how we can develop and deploy Lambda functions using Python. We will also see how to maintain a bookmark or checkpoint using S3.

  • Hello World using AWS Lambda

  • Setup Project for local development of AWS Lambda Functions

  • Deploy Project to AWS Lambda console

  • Develop download functionality using the requests library for AWS Lambda Functions

  • Using 3rd party libraries in AWS Lambda Functions

  • Validating AWS S3 access for local development of AWS Lambda Functions

  • Develop upload functionality to S3 using AWS Lambda Functions

  • Validating AWS Lambda Functions using the AWS Lambda Console

  • Run AWS Lambda Functions using the AWS Lambda Console

  • Validating files incrementally downloaded using AWS Lambda Functions

  • Reading and Writing the Bookmark to S3 using AWS Lambda Functions

  • Maintaining the Bookmark on S3 using AWS Lambda Functions

  • Review the incremental upload logic developed using AWS Lambda Functions

  • Deploying AWS Lambda Functions

  • Schedule AWS Lambda Functions using AWS EventBridge
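
To make the bookmark or checkpoint idea concrete, here is a minimal, hypothetical sketch of such a Lambda handler, not the course's actual code: the bucket, key names, and the hourly file naming are assumptions for illustration. It reads the bookmark from S3, downloads the next hourly file, uploads it to the landing zone, and advances the bookmark.

```python
import json
from datetime import datetime, timedelta

import boto3
import requests

s3 = boto3.client("s3")

BUCKET = "itversity-demo-bucket"            # hypothetical bucket
BOOKMARK_KEY = "bookmarks/ghactivity.json"  # where the checkpoint lives
BASE_URL = "https://data.gharchive.org"     # hourly GH activity files (assumed source)


def next_file_name(last_file: str) -> str:
    # File names look like 2021-01-13-0.json.gz (year-month-day-hour).
    stamp = datetime.strptime(last_file.replace(".json.gz", ""), "%Y-%m-%d-%H")
    stamp += timedelta(hours=1)
    return f"{stamp.strftime('%Y-%m-%d')}-{stamp.hour}.json.gz"


def lambda_handler(event, context):
    # Read the last processed file name from the bookmark object;
    # fall back to a starting point if the bookmark does not exist yet.
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=BOOKMARK_KEY)
        bookmark = json.loads(obj["Body"].read())
    except s3.exceptions.NoSuchKey:
        bookmark = {"last_file": "2021-01-13-0.json.gz"}

    next_file = next_file_name(bookmark["last_file"])

    # Download the next hourly file and upload it to the landing zone.
    payload = requests.get(f"{BASE_URL}/{next_file}").content
    s3.put_object(Bucket=BUCKET, Key=f"landing/ghactivity/{next_file}", Body=payload)

    # Advance the bookmark so the next invocation picks up the following file.
    s3.put_object(Bucket=BUCKET, Key=BOOKMARK_KEY, Body=json.dumps({"last_file": next_file}))
    return {"processed": next_file}
```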

Overview of AWS Glue Components

In this section, we will get a broad overview of all important Glue Components such as Glue Crawler, Glue Databases, Glue Tables, etc. We will also understand how to validate Glue tables using AWS Athena. AWS Glue (especially Glue Catalog) is one of the key components in the realm of AWS Data Analytics Services.

  • Introduction - Overview of AWS Glue Components

  • Create AWS Glue Crawler and AWS Glue Catalog Database as well as Table

  • Analyze Data using AWS Athena

  • Creating an AWS S3 Bucket and Role to create AWS Glue Catalog Tables using a Crawler on the S3 location

  • Create and Run the AWS Glue Job to process data in AWS Glue Catalog Tables

  • Validate using AWS Glue Catalog Table and by running queries using AWS Athena

  • Create and Run AWS Glue Trigger

  • Create AWS Glue Workflow

  • Run AWS Glue Workflow and Validate

Setup Spark History Server for AWS Glue Jobs

AWS Glue uses Apache Spark under the hood to process data. It is important that we set up the Spark History Server for AWS Glue Jobs to troubleshoot any issues.

  • Introduction - Spark History Server for AWS Glue

  • Setup Spark History Server on AWS

  • Clone AWS Glue Samples repository

  • Build AWS Glue Spark UI Container

  • Update AWS IAM Policy Permissions

  • Start AWS Glue Spark UI Container

Deep Dive into AWS Glue Catalog

AWS Glue has several components, but the most important ones are AWS Glue Crawlers, Databases, and Catalog Tables. In this section, we will go through some of the most important and commonly used features of the AWS Glue Catalog.

  • Prerequisites for AWS Glue Catalog Tables

  • Steps for Creating AWS Glue Catalog Tables

  • Download the Data Set used to create AWS Glue Catalog Tables

  • Upload data to S3 and crawl it using an AWS Glue Crawler to create the required AWS Glue Catalog Tables

  • Create AWS Glue Catalog Database - itvghlandingdb

  • Create AWS Glue Catalog Table - ghactivity

  • Running Queries using AWS Athena - ghactivity

  • Crawling Multiple Folders using AWS Glue Crawlers

  • Managing AWS Glue Catalog using AWS CLI

  • Managing AWS Glue Catalog using Python Boto3
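
As an illustration of managing the Glue Catalog with boto3 (the database name itvghlandingdb comes from the topics above; everything else is generic), listing databases and tables looks roughly like this:

```python
import boto3

glue = boto3.client("glue")

# List all Glue Catalog databases in the account/region.
for database in glue.get_databases()["DatabaseList"]:
    print("database:", database["Name"])

# List the tables in a specific database along with their storage location.
for table in glue.get_tables(DatabaseName="itvghlandingdb")["TableList"]:
    print("table:", table["Name"], "->", table["StorageDescriptor"]["Location"])
```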

Exploring AWS Glue Job APIs

Once we deploy AWS Glue jobs, we can manage them using the AWS Glue Job APIs. In this section, we will get an overview of the AWS Glue Job APIs to run and manage jobs.

  • Update AWS IAM Role for the AWS Glue Job

  • Generate baseline AWS Glue Job

  • Running baseline AWS Glue Job

  • AWS Glue Script for Partitioning Data

  • Validating using AWS Athena
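
A minimal sketch of driving a Glue job through the boto3 Job APIs, assuming a hypothetical job name and job argument:

```python
import time

import boto3

glue = boto3.client("glue")

JOB_NAME = "ghactivity-partitioning-job"  # hypothetical job name

# Kick off a job run, optionally overriding job arguments.
run = glue.start_job_run(JobName=JOB_NAME, Arguments={"--target_format": "parquet"})
run_id = run["JobRunId"]

# Poll until the run reaches a terminal state.
while True:
    status = glue.get_job_run(JobName=JOB_NAME, RunId=run_id)["JobRun"]["JobRunState"]
    print("state:", status)
    if status in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)
```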

Understanding AWS Glue Job Bookmarks

AWS Glue Job Bookmarks can be leveraged to maintain bookmarks or checkpoints for incremental loads. In this section, we will go through the details related to AWS Glue Job Bookmarks.

  • Introduction to AWS Glue Job Bookmarks

  • Cleaning up the data to run AWS Glue Jobs

  • Overview of AWS Glue CLI and Commands

  • Run AWS Glue Job using an AWS Glue Bookmark

  • Validate AWS Glue Bookmark using AWS CLI

  • Add new data to the landing zone to run AWS Glue Jobs using Bookmarks

  • Rerun AWS Glue Job using Bookmark

  • Validate AWS Glue Job Bookmark and Files for Incremental run

  • Recrawl the AWS Glue Catalog Table using AWS CLI Commands

  • Run AWS Athena Queries for Data Validation

Development Lifecycle for PySpark

In this section, we will focus on the development of Spark applications using PySpark. We will use this application later while exploring EMR in detail.

  • Set up a Virtual Environment and Install PySpark

  • Getting Started with PyCharm

  • Passing Run Time Arguments

  • Accessing OS Environment Variables

  • Getting Started with Spark

  • Create Function for Spark Session

  • Setup Sample Data

  • Read data from files

  • Process data using Spark APIs

  • Write data to files

  • Validating Writing Data to Files

  • Productionizing the Code
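
The overall shape of such a PySpark application might look like the sketch below; the paths, environment variable, and column names are illustrative assumptions, not the course's exact code.

```python
import os
import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date


def get_spark_session(env: str, app_name: str) -> SparkSession:
    # Run locally during development; on a cluster the master is supplied by YARN.
    builder = SparkSession.builder.appName(app_name)
    if env == "DEV":
        builder = builder.master("local[*]")
    return builder.getOrCreate()


def main():
    env = os.environ.get("ENVIRON", "DEV")   # OS environment variable (assumed name)
    src_dir = sys.argv[1]                    # runtime argument, e.g. data/ghactivity
    tgt_dir = sys.argv[2]                    # runtime argument, e.g. output/ghactivity

    spark = get_spark_session(env, "GHActivity Ingest")

    # Read the raw JSON files, derive a date column (field name assumed),
    # and write the output as Parquet partitioned by that date.
    df = spark.read.json(src_dir)
    df = df.withColumn("created_dt", to_date(col("created_at")))
    df.write.mode("overwrite").partitionBy("created_dt").parquet(tgt_dir)

    spark.stop()


if __name__ == "__main__":
    main()
```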

Getting Started with AWS EMR (Elastic MapReduce)

As part of this section, we will understand how to get started with an AWS EMR Cluster, primarily using the AWS EMR Web Console. Elastic MapReduce is one of the key services under AWS Data Analytics; it provides the ability to run applications that process large-scale data using distributed computing frameworks such as Spark.

  • Planning for AWS EMR Cluster

  • Create AWS EC2 Key Pair for the AWS EMR Cluster

  • Setup AWS EMR Cluster with Apache Spark

  • Understanding Summary of AWS EMR Cluster

  • Review AWS EMR Cluster Application User Interfaces

  • Review AWS EMR Cluster Monitoring

  • Review AWS EMR Cluster Hardware and Cluster Scaling Policy

  • Review AWS EMR Cluster Configurations

  • Review AWS EMR Cluster Events

  • Review AWS EMR Cluster Steps

  • Review AWS EMR Cluster Bootstrap Actions

  • Connecting to AWS EMR Master Node using SSH

  • Disabling Termination Protection for AWS EMR Cluster and Terminating the AWS EMR Cluster

  • Clone and Create a New AWS EMR Cluster

  • Listing AWS S3 Buckets and Objects using AWS CLI on AWS EMR Cluster

  • Listing AWS S3 Buckets and Objects using HDFS CLI on AWS EMR Cluster

  • Managing Files in AWS S3 using HDFS CLI on AWS EMR Cluster

  • Review AWS Glue Catalog Databases and Tables

  • Accessing AWS Glue Catalog Databases and Tables using AWS EMR Cluster

  • Accessing spark-sql CLI of AWS EMR Cluster

  • Accessing pyspark CLI of AWS EMR Cluster

  • Accessing spark-shell CLI of AWS EMR Cluster

  • Create AWS EMR Cluster for Notebooks

Deploying Spark Applications using AWS EMR

As part of this section, we will understand how we typically deploy Spark Applications using AWS EMR. We will be using the Spark Application we developed earlier.

  • Deploying Applications using AWS EMR - Introduction

  • Setup AWS EMR Cluster to deploy applications

  • Validate SSH Connectivity to Master node of AWS EMR Cluster

  • Setup Jupyter Notebook Environment on AWS EMR Cluster

  • Create the required AWS S3 Bucket for the AWS EMR Cluster

  • Upload GHActivity Data to S3 so that we can process it using the Spark Application deployed on the AWS EMR Cluster

  • Validate Application using AWS EMR Compatible Versions of Python and Spark

  • Deploy Spark Application to AWS EMR Master Node

  • Create user space for ec2-user on AWS EMR Cluster

  • Run Spark Application using spark-submit on AWS EMR Master Node

  • Validate Data using Jupyter Notebooks on AWS EMR Cluster

  • Clone and Start Auto Terminated AWS EMR Cluster

  • Delete Data Populated by the GHActivity Application using the AWS EMR Cluster

  • Differences between Spark Client and Cluster Deployment Modes on AWS EMR Cluster

  • Running Spark Application using Cluster Mode on AWS EMR Cluster

  • Overview of Adding Pyspark Application as Step to AWS EMR Cluster

  • Deploy Spark Application to AWS S3 to run using AWS EMR Steps

  • Running Spark Applications as AWS EMR Steps in client mode

  • Running Spark Applications as AWS EMR Steps in cluster mode

  • Validate AWS EMR Step Execution of Spark Application
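
As a hedged illustration of submitting the application as an EMR Step with boto3 (the cluster id and S3 paths are placeholders), using command-runner.jar to invoke spark-submit:

```python
import boto3

emr = boto3.client("emr")

CLUSTER_ID = "j-XXXXXXXXXXXXX"  # hypothetical cluster id

# Submit a spark-submit command as an EMR Step using command-runner.jar.
response = emr.add_job_flow_steps(
    JobFlowId=CLUSTER_ID,
    Steps=[
        {
            "Name": "GHActivity Spark Application",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://itversity-demo-bucket/apps/ghactivity/app.py",   # hypothetical paths
                    "s3://itversity-demo-bucket/landing/ghactivity/",
                    "s3://itversity-demo-bucket/raw/ghactivity/",
                ],
            },
        }
    ],
)
print("step id:", response["StepIds"][0])
```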

Streaming Data Ingestion Pipeline using AWS Kinesis

As part of this section, we will go through the details of a streaming data ingestion pipeline using AWS Kinesis, the streaming service under AWS Data Analytics. We will use the AWS Kinesis Firehose Agent and an AWS Kinesis Firehose Delivery Stream to read data from log files and ingest it into AWS S3.

  • Building a Streaming Pipeline using the AWS Kinesis Firehose Agent and Delivery Stream

  • Rotating Logs so that files are created frequently, to be eventually ingested using the AWS Kinesis Firehose Agent and AWS Kinesis Firehose Delivery Stream

  • Set up the AWS Kinesis Firehose Agent to get data from logs into an AWS Kinesis Delivery Stream

  • Create AWS Kinesis Firehose Delivery Stream

  • Planning the Pipeline to ingest data into S3 using an AWS Kinesis Delivery Stream

  • Create AWS IAM Group and User for Streaming Pipelines using AWS Kinesis Components

  • Granting Permissions to the AWS IAM User using a Policy for Streaming Pipelines using AWS Kinesis Components

  • Configure the AWS Kinesis Firehose Agent to read data from log files and ingest it into the AWS Kinesis Firehose Delivery Stream

  • Start and Validate the AWS Kinesis Firehose Agent

  • Conclusion - Building a Simple Streaming Pipeline using AWS Kinesis Firehose
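
The course drives Firehose with the Kinesis Firehose Agent configured on the web server. Purely as an alternative illustration of the delivery-stream side, a record can also be put programmatically with boto3 (the stream name and log line are hypothetical):

```python
import boto3

firehose = boto3.client("firehose")

STREAM_NAME = "web-logs-delivery-stream"  # hypothetical delivery stream name

# Put a single log line onto the delivery stream; Firehose buffers and
# delivers batches of records to the configured S3 destination.
log_line = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612\n'
response = firehose.put_record(
    DeliveryStreamName=STREAM_NAME,
    Record={"Data": log_line.encode("utf-8")},
)
print("record id:", response["RecordId"])
```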

Consuming Data ingested via AWS Kinesis from AWS S3 using Python boto3

As data is ingested into AWS S3, we will understand how it can be processed using boto3.

  • Customizing the AWS S3 folder used by the AWS Kinesis Delivery Stream

  • Create an AWS IAM Policy to read from the AWS S3 Bucket

  • Validate AWS S3 access using the AWS CLI

  • Set up a Python Virtual Environment to explore boto3

  • Validating access to AWS S3 using Python boto3

  • Read Content from an AWS S3 object

  • Read multiple AWS S3 Objects

  • Get the number of AWS S3 Objects using Marker

  • Get the size of AWS S3 Objects using Marker
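
A minimal boto3 sketch of the Marker-based listing described above (bucket and prefix are placeholders): it pages through the objects under a prefix and accumulates their count and total size.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "itversity-demo-bucket"   # hypothetical bucket
PREFIX = "landing/ghactivity/"     # folder written by the delivery stream (assumed)

count, total_size = 0, 0
marker = ""

# Page through the objects using Marker, accumulating count and size.
while True:
    kwargs = {"Bucket": BUCKET, "Prefix": PREFIX, "MaxKeys": 1000}
    if marker:
        kwargs["Marker"] = marker
    response = s3.list_objects(**kwargs)
    contents = response.get("Contents", [])
    count += len(contents)
    total_size += sum(obj["Size"] for obj in contents)
    if not response.get("IsTruncated"):
        break
    marker = contents[-1]["Key"]  # continue listing after the last key returned

print(f"{count} objects, {total_size} bytes under s3://{BUCKET}/{PREFIX}")
```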

Populating GitHub Data to AWS DynamoDB

As part of this section, we will understand how we can populate AWS DynamoDB tables with data using Python.

  • Install the required libraries to get GitHub Data into AWS DynamoDB tables

  • Understanding GitHub APIs

  • Setting up GitHub API Token

  • Understanding GitHub Rate Limit

  • Create New Repository for since

  • Extracting Required Information using Python

  • Processing Data using Python

  • Grant Permissions to create AWS DynamoDB tables using boto3

  • Create AWS DynamoDB Tables

  • AWS DynamoDB CRUD Operations

  • Populate AWS DynamoDB Table

  • AWS DynamoDB Batch Operations
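
As an illustrative boto3 sketch of DynamoDB CRUD and batch operations, assuming a hypothetical table named github_repos already exists with repo_id as its partition key:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical table keyed on the repository id.
table = dynamodb.Table("github_repos")

# Single-item CRUD: put, get, then delete.
table.put_item(Item={"repo_id": 12345, "name": "demo-repo", "language": "Python"})
item = table.get_item(Key={"repo_id": 12345}).get("Item")
print("fetched:", item)
table.delete_item(Key={"repo_id": 12345})

# Batch writes: the batch_writer buffers items and retries unprocessed ones.
repos = [{"repo_id": i, "name": f"repo-{i}"} for i in range(100, 125)]
with table.batch_writer() as batch:
    for repo in repos:
        batch.put_item(Item=repo)
```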

Overview of AWS Athena

As part of this section, we will understand how to get started with AWS Athena using the AWS Web Console. We will also focus on basic DDL and DML, or CRUD, operations using the AWS Athena Query Editor.

  • Getting Started with AWS Athena

  • Quick Recap of AWS Glue Catalog Databases and Tables

  • Access AWS Glue Catalog Databases and Tables using AWS Athena Query Editor

  • Create a Database and Table using AWS Athena

  • Populate Data into Table using AWS Athena

  • Using CTAS to create tables using AWS Athena

  • Overview of AWS Athena Architecture

  • AWS Athena Resources and their relationship with Hive

  • Create a Partitioned Table using AWS Athena

  • Develop Query for Partitioned Column

  • Insert into Partitioned Tables using AWS Athena

  • Validate Data Partitioning using AWS Athena

  • Drop AWS Athena Tables and Delete Data Files

  • Drop Partitioned Table using AWS Athena

  • Data Partitioning in AWS Athena using CTAS

AWS Athena using the AWS CLI

As part of this section, we will understand how to interact with AWS Athena using AWS CLI Commands.

  • AWS Athena using the AWS CLI - Introduction

  • Get help and list AWS Athena databases using AWS CLI

  • Managing AWS Athena Workgroups using AWS CLI

  • Run AWS Athena Queries using AWS CLI

  • Get AWS Athena Table Metadata using AWS CLI

  • Run AWS Athena Queries with a custom location using AWS CLI

  • Drop AWS Athena table using AWS CLI

  • Run CTAS under AWS Athena using AWS CLI

AWS Athena using Python boto3

As part of this section, we will understand how to interact with AWS Athena using Python boto3.

  • AWS Athena using Python boto3 - Introduction

  • Getting Started with Managing AWS Athena using Python boto3

  • List AWS Athena Databases using Python boto3

  • List AWS Athena Tables using Python boto3

  • Run AWS Athena Queries with boto3

  • Review AWS Athena Query Results using boto3

  • Persist AWS Athena Query Results in a Custom Location using boto3

  • Processing AWS Athena Query Results using Pandas

  • Run CTAS against AWS Athena using Python boto3
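
A minimal boto3 sketch of running an Athena query and reading the results; the query, database (itvghlandingdb from earlier sections), and output location are illustrative assumptions.

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical query, database, and result location.
QUERY = "SELECT repo.name, count(*) AS event_count FROM ghactivity GROUP BY repo.name LIMIT 10"
OUTPUT = "s3://itversity-demo-bucket/athena-results/"

# Start the query and wait for it to reach a terminal state.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "itvghlandingdb"},
    ResultConfiguration={"OutputLocation": OUTPUT},
)
query_id = execution["QueryExecutionId"]

while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

# Fetch the result rows (the first row contains the column headers).
rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
for row in rows:
    print([col.get("VarCharValue") for col in row["Data"]])
```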

Getting Started with AWS Redshift

As part of this section, we will understand how to get started with AWS Redshift using the AWS Web Console. We will also focus on basic DDL and DML, or CRUD, operations using the AWS Redshift Query Editor.

  • Getting Started with AWS Redshift - Introduction

  • Create AWS Redshift Cluster using the Free Trial

  • Connecting to the Database using the AWS Redshift Query Editor

  • Get a list of tables by querying the information schema

  • Run Queries against AWS Redshift Tables using the Query Editor

  • Create AWS Redshift Table with a Primary Key

  • Insert Data into AWS Redshift Tables

  • Update Data in AWS Redshift Tables

  • Delete data from AWS Redshift tables

  • Redshift Saved Queries using the Query Editor

  • Deleting AWS Redshift Cluster

  • Restore AWS Redshift Cluster from Snapshot

Copy Data from S3 into AWS Redshift Tables

As part of this section, we will go through the details of copying data from S3 into AWS Redshift tables using the AWS Redshift COPY command.

  • Copy Data from S3 to AWS Redshift - Introduction

  • Set up Data in S3 for AWS Redshift Copy

  • Copy Database and Table for the AWS Redshift Copy Command

  • Create IAM User with full access on S3 for AWS Redshift Copy

  • Run Copy Command to copy data from S3 to an AWS Redshift Table

  • Troubleshoot Errors related to the AWS Redshift Copy Command

  • Run Copy Command to copy from S3 to an AWS Redshift table

  • Validate using queries against the AWS Redshift Table

  • Overview of the AWS Redshift Copy Command

  • Create IAM Role for AWS Redshift to access S3

  • Copy Data from S3 to an AWS Redshift table using the IAM Role

  • Set up a JSON Dataset in S3 for the AWS Redshift Copy Command

  • Copy JSON Data from S3 to an AWS Redshift table using the IAM Role
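
To make the COPY command concrete, here is a hedged sketch using psycopg2 (the connection details, table, S3 path, and IAM role ARN are all placeholders): it runs COPY against the cluster and then counts the loaded rows.

```python
import psycopg2

# Hypothetical connection details for the Redshift cluster.
conn = psycopg2.connect(
    host="redshift-cluster-1.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)
conn.autocommit = True

# COPY pulls the files under the S3 prefix into the target table,
# using an IAM role attached to the cluster for S3 access.
copy_sql = """
    COPY public.orders
    FROM 's3://itversity-demo-bucket/retail/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftS3ReadRole'
    CSV
    IGNOREHEADER 1
"""

with conn.cursor() as cur:
    cur.execute(copy_sql)
    cur.execute("SELECT count(*) FROM public.orders")
    print("rows loaded:", cur.fetchone()[0])

conn.close()
```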

Develop Applications using the AWS Redshift Cluster

As part of this section, we will understand how to develop applications against databases and tables created as part of the AWS Redshift Cluster.

  • Develop applications using the AWS Redshift Cluster - Introduction

  • Allocate an Elastic IP for the AWS Redshift Cluster

  • Enable Public Accessibility for the AWS Redshift Cluster

  • Update Inbound Rules in the Security Group to access the AWS Redshift Cluster

  • Create Database and User in the AWS Redshift Cluster

  • Connect to the database in AWS Redshift using psql

  • Change Owner on AWS Redshift Tables

  • Download the AWS Redshift JDBC Jar file

  • Connect to AWS Redshift Databases using IDEs such as SQL Workbench

  • Set up a Python Virtual Environment for AWS Redshift

  • Run a Simple Query against an AWS Redshift Database Table using Python

  • Truncate an AWS Redshift Table using Python

  • Create IAM User to copy from S3 to AWS Redshift Tables

  • Validate Access of the IAM User using boto3

  • Run the AWS Redshift Copy Command using Python

AWS Redshift Tables with Distkeys and Sortkeys

As part of this section, we will go through AWS Redshift-specific features such as distribution keys and sort keys used to create AWS Redshift tables.

  • AWS Redshift Tables with Distkeys and Sortkeys - Introduction

  • Quick Review of AWS Redshift Architecture

  • Create a multi-node AWS Redshift Cluster

  • Connect to the AWS Redshift Cluster using the Query Editor

  • Create AWS Redshift Database

  • Create AWS Redshift Database User

  • Create AWS Redshift Database Schema

  • Default Distribution Style of an AWS Redshift Table

  • Grant Select Permissions on the Catalog to the AWS Redshift Database User

  • Update the Search Path to query AWS Redshift system tables

  • Validate an AWS Redshift table with DISTSTYLE AUTO

  • Create AWS Redshift Cluster from Snapshot to restore the original state

  • Overview of Node Slices in an AWS Redshift Cluster

  • Overview of Distribution Styles related to AWS Redshift tables

  • Distribution Strategies for retail tables in AWS Redshift Databases

  • Create AWS Redshift tables with Distribution Style All

  • Troubleshoot and Fix Load or Copy Errors

  • Create AWS Redshift Table with Distribution Style Auto

  • Create AWS Redshift Tables using Distribution Style Key

  • Delete the AWS Redshift Cluster with a manual snapshot
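
As a hedged illustration of the DDL this section builds toward (the schema, table, and column names are assumptions based on the retail examples mentioned above), created here through psycopg2:

```python
import psycopg2

# Hypothetical connection details for the Redshift cluster.
conn = psycopg2.connect(
    host="redshift-cluster-1.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="********",
)
conn.autocommit = True

# Distribute and sort the table on the order id so that items belonging to the
# same order land on the same slice and joins on order id avoid redistribution.
ddl = """
    CREATE TABLE public.order_items (
        order_item_id INT,
        order_item_order_id INT,
        order_item_product_id INT,
        order_item_subtotal NUMERIC(10, 2)
    )
    DISTSTYLE KEY
    DISTKEY (order_item_order_id)
    SORTKEY (order_item_order_id);
"""

with conn.cursor() as cur:
    cur.execute(ddl)
    # Inspect the dist/sort key flags per column; pg_table_def only covers
    # schemas on the current search path (public by default).
    cur.execute("""SELECT "column", distkey, sortkey FROM pg_table_def WHERE tablename = 'order_items'""")
    for row in cur.fetchall():
        print(row)

conn.close()
```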

AWS Redshift Federated Queries and Spectrum

As part of this section, we will go through some of the advanced features of Redshift such as AWS Redshift Federated Queries and AWS Redshift Spectrum.

  • AWS Redshift Federated Queries and Spectrum - Introduction

  • Overview of integrating AWS RDS and AWS Redshift for Federated Queries

  • Create IAM Role for the AWS Redshift Cluster

  • Set up a Postgres Database Server for AWS Redshift Federated Queries

  • Create tables in the Postgres Database for AWS Redshift Federated Queries

  • Creating a Secret using Secrets Manager for the Postgres Database

  • Accessing Secret Details using Python boto3

  • Reading JSON Data into a DataFrame using Pandas

  • Write JSON Data to AWS Redshift Database Tables using Pandas

  • Create AWS IAM Policy for the Secret and associate it with the Redshift Role

  • Create AWS Redshift Cluster using the AWS IAM Role with permissions on the secret

  • Create AWS Redshift External Schema to the Postgres Database

  • Update AWS Redshift Cluster Network Settings for Federated Queries

  • Performing ETL using AWS Redshift Federated Queries

  • Clean up resources added for AWS Redshift Federated Queries

  • Grant Access on the AWS Glue Data Catalog to the AWS Redshift Cluster for Spectrum

  • Set up AWS Redshift Clusters to run queries using Spectrum

  • Quick Recap of AWS Glue Catalog Database and Tables for AWS Redshift Spectrum

  • Create External Schema using AWS Redshift Spectrum

  • Run Queries using AWS Redshift Spectrum

  • Clean up the AWS Redshift Cluster
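
As an illustration of the Spectrum piece (the Glue database name comes from earlier sections; the connection details and role ARN are placeholders), the external schema that exposes a Glue Catalog database inside Redshift is created roughly like this:

```python
import psycopg2

# Hypothetical connection details for the Redshift cluster.
conn = psycopg2.connect(
    host="redshift-cluster-1.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="********",
)
conn.autocommit = True

with conn.cursor() as cur:
    # Map the Glue Catalog database into Redshift as an external schema,
    # so its tables can be queried in place on S3 via Spectrum.
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS ghactivity_spectrum
        FROM DATA CATALOG
        DATABASE 'itvghlandingdb'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    """)

    # Query the external (Glue/S3-backed) table directly from Redshift.
    cur.execute("SELECT count(*) FROM ghactivity_spectrum.ghactivity")
    print("rows visible via Spectrum:", cur.fetchone()[0])

conn.close()
```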

Taught by

Durga Viswanatha Raju Gadiraju, Ravindra Nandam and Perraju Vegiraju

Reviews

4.5 rating at Udemy based on 2273 ratings
