What you'll learn:
- Getting Started with Amazon Redshift using AWS Web Console
- Copy Data from S3 into AWS Redshift Tables using Redshift Queries or Commands
- Develop Applications against the Redshift Cluster using Python as the Programming Language
- Copy Data from S3 into AWS Redshift Tables using Python as the Programming Language
- Create Tables in Databases on the AWS Redshift Database Server using Distribution Keys and Sort Keys
- Run AWS Redshift Federated Queries connecting to traditional RDBMS Databases such as Postgres
- Perform ETL with Redshift Capacity using AWS Redshift Federated Queries
- Integrate AWS Redshift with the AWS Glue Catalog to run queries using Redshift Spectrum
- Run AWS Redshift Spectrum Queries using Glue Catalog Tables on a Data Lake built on AWS S3
- Getting Started with Amazon Redshift Serverless by creating Workgroup and Namespace
- Integration of AWS EMR Cluster with Amazon Redshift using Serverless Workgroup
- Develop and Deploy a Spark Application on an AWS EMR Cluster that loads the processed data into an Amazon Redshift Serverless Workgroup
AWS or Amazon Redshift is one of the key AWS services used in building Data Warehouses or Data Marts to serve reports and dashboards for business users. As part of this course, you will learn Amazon Redshift by going through all of its important features for building Data Warehouses or Data Marts.
We have covered features such as Federated Queries, Redshift Spectrum, integration with Python, AWS Lambda Functions, integration of Redshift with EMR, and an end-to-end pipeline using AWS Step Functions.
Here is the detailed outline of the course.
First, we will understand how to get started with Amazon Redshift using the AWS Web Console. We will see how to create a cluster, how to connect to it, and how to run queries using the web-based query editor. We will also create a database and tables in the Redshift Cluster, and then go through the details of CRUD operations against those tables.
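Besides the Web Console, a cluster can also be provisioned programmatically. The sketch below shows the shape of such a call with boto3; the cluster name, database name, and credentials are placeholders, not values from the course.

```python
# Sketch: provisioning a small Redshift cluster with boto3.
# All identifier values below are placeholders for illustration.
cluster_params = {
    "ClusterIdentifier": "demo-cluster",   # placeholder cluster name
    "NodeType": "dc2.large",               # small node type, suitable for learning
    "ClusterType": "single-node",          # single-node keeps costs low while practicing
    "DBName": "dev",                       # initial database created with the cluster
    "MasterUsername": "awsuser",
    "MasterUserPassword": "ChangeMe1234",  # placeholder; prefer Secrets Manager in practice
}

# The actual call needs AWS credentials configured, so it is left commented:
# import boto3
# redshift = boto3.client("redshift", region_name="us-east-1")
# redshift.create_cluster(**cluster_params)

print(cluster_params["ClusterIdentifier"])
```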
Once we have databases and tables in the Redshift Cluster, it is time to understand how to get data into them. One of the most common approaches is copying data from S3 into Redshift tables. We will go through the step-by-step process of copying data from S3 into Redshift tables using the COPY command.
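A COPY statement of the kind covered here can be assembled in Python before being submitted to the cluster. This is a minimal sketch; the table name, S3 path, and IAM role ARN are placeholders.

```python
def build_copy_command(table, s3_path, iam_role, fmt="CSV", ignore_header=1):
    """Build a Redshift COPY statement that loads a table from S3.

    All arguments are supplied by the caller; the defaults assume a
    CSV file with a single header row.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt} "
        f"IGNOREHEADER {ignore_header};"
    )

cmd = build_copy_command(
    "orders",
    "s3://my-bucket/data/orders/",                       # placeholder bucket/prefix
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",   # placeholder role ARN
)
print(cmd)
```

Running the resulting statement through the query editor (or a Python client, as in the next section) performs the actual load.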
Python is one of the prominent programming languages for building Data Engineering or ETL applications. It is extensively used to build ETL jobs that load data into tables in the Redshift Cluster. Once we understand how to get data from S3 to Redshift tables using the COPY command, we will learn how to develop Python-based Data Engineering or ETL applications against the Redshift Cluster. We will learn how to perform CRUD operations and how to run COPY commands using Python-based programs.
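The CRUD flow can be sketched as below. The statements use a made-up `users` table, and the connection is shown with the `redshift_connector` package (an assumption; a Postgres driver such as psycopg2 works the same way) with placeholder host and credentials, so that part is commented out.

```python
# Sketch of Python-based CRUD against Redshift; table and values are illustrative.
crud_statements = [
    "INSERT INTO users (user_id, user_name) VALUES (1, 'scott')",   # Create
    "SELECT user_id, user_name FROM users WHERE user_id = 1",       # Read
    "UPDATE users SET user_name = 'tiger' WHERE user_id = 1",       # Update
    "DELETE FROM users WHERE user_id = 1",                          # Delete
]

# Executing them requires a live cluster and credentials (placeholders here):
# import redshift_connector
# conn = redshift_connector.connect(
#     host="demo-cluster.xxxx.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
#     database="dev",
#     user="awsuser",
#     password="...",
# )
# with conn.cursor() as cur:
#     for stmt in crud_statements:
#         cur.execute(stmt)
# conn.commit()

for stmt in crud_statements:
    print(stmt)
```

A COPY command built as in the previous example can be executed through the same cursor, which is how Python-based loads are typically run.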
Once we understand how to build applications using the Redshift Cluster, we will go through some of the key concepts used while creating Redshift tables with distribution keys (Distkeys) and sort keys (Sortkeys).
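To make the idea concrete, here is an illustrative DDL statement showing where DISTKEY and SORTKEY appear; the table and column names are made up for the example.

```python
# Illustrative CREATE TABLE with a distribution key and a sort key.
create_orders = """
CREATE TABLE orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_date   DATE,
    order_status VARCHAR(30)
)
DISTSTYLE KEY
DISTKEY (customer_id)   -- rows with the same customer_id land on the same slice,
                        -- which helps joins on customer_id avoid data movement
SORTKEY (order_date);   -- range filters on order_date can skip unneeded blocks
"""
print(create_orders)
```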
We can connect to remote databases such as Postgres and run queries directly against their tables using Redshift Federated Queries, and we can run queries on top of the Glue or Athena Catalog using Redshift Spectrum. You will learn how to leverage Federated Queries and Spectrum to process data in remote database tables or S3 without copying it.
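Both features are exposed through external schemas. The sketch below shows the shape of the two CREATE EXTERNAL SCHEMA statements; every identifier, endpoint, and ARN is a placeholder.

```python
# Sketch: external schema for Federated Queries against a Postgres database.
# Endpoint, role ARN, and secret ARN are placeholders.
federated_schema = """
CREATE EXTERNAL SCHEMA pg_sales
FROM POSTGRES
DATABASE 'sales' SCHEMA 'public'
URI 'pgdemo.xxxx.us-east-1.rds.amazonaws.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:pgcreds';
"""

# Sketch: external schema for Redshift Spectrum backed by the Glue Data Catalog.
spectrum_schema = """
CREATE EXTERNAL SCHEMA spectrum_db
FROM DATA CATALOG
DATABASE 'datalake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
"""

print(federated_schema)
print(spectrum_schema)
```

Once the schemas exist, `pg_sales.some_table` queries the live Postgres table and `spectrum_db.some_table` scans files on S3 through the Glue Catalog, in both cases without loading the data into Redshift first.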
You will also get an overview of Amazon Redshift Serverless as part of Getting Started with Amazon Redshift Serverless, including the creation of a Workgroup and Namespace.
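As with provisioned clusters, the Serverless namespace and workgroup can also be created programmatically. This is a hedged sketch of the parameters involved; all names and the base capacity value are placeholders, and the boto3 calls are commented out since they need AWS credentials.

```python
# Sketch: Amazon Redshift Serverless namespace + workgroup parameters.
# Names and capacity are placeholders chosen for illustration.
namespace_params = {
    "namespaceName": "demo-namespace",   # holds databases, users, and permissions
    "dbName": "dev",                     # initial database in the namespace
}

workgroup_params = {
    "workgroupName": "demo-workgroup",   # holds compute settings
    "namespaceName": "demo-namespace",   # must reference the namespace above
    "baseCapacity": 8,                   # compute capacity in RPUs (placeholder value)
}

# import boto3
# rs = boto3.client("redshift-serverless", region_name="us-east-1")
# rs.create_namespace(**namespace_params)
# rs.create_workgroup(**workgroup_params)

print(workgroup_params["workgroupName"])
```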
Once you learn Amazon Redshift Serverless, you will deploy a pipeline in which a Spark application runs on an AWS EMR Cluster and loads the processed data into Redshift.
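A common way to wire that last step is the community Spark-Redshift connector, which stages data through S3. The sketch below shows the write options involved; the JDBC URL, table, bucket, and role ARN are placeholders, and the Spark write itself is commented out since it needs a live EMR cluster.

```python
# Sketch: options for writing a Spark DataFrame to Redshift via the
# spark-redshift community connector. All values are placeholders.
redshift_options = {
    "url": "jdbc:redshift://demo-wg.123456789012.us-east-1"
           ".redshift-serverless.amazonaws.com:5439/dev",  # placeholder endpoint
    "dbtable": "public.orders",                            # target table
    "tempdir": "s3a://my-bucket/redshift-temp/",           # staging area on S3
    "aws_iam_role": "arn:aws:iam::123456789012:role/RedshiftCopyRole",
}

# On EMR, with a DataFrame `df` already processed by the Spark job:
# df.write \
#   .format("io.github.spark_redshift_community.spark.redshift") \
#   .options(**redshift_options) \
#   .mode("append") \
#   .save()

print(redshift_options["dbtable"])
```

Under the hood the connector writes the DataFrame to `tempdir` on S3 and then issues a COPY into the target table, which ties this step back to the COPY workflow covered earlier.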