What you'll learn:
- Set up a data warehouse on the AWS Cloud using Redshift, from scratch
- Understand AWS Athena and when to use it
- Store data in S3 data lakes using the Parquet columnar file format and optimize Athena data scans (a minimal sketch follows this list)
- Automate ETL processes using serverless components such as AWS Glue, Data Pipeline, and Lambda functions
- Centralize data using Redshift Spectrum
- Trigger and automate Glue jobs using Lambda functions
- Pull data into QuickSight, the BI reporting and visualization offering from AWS
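As a small preview of the Athena piece, here is a minimal sketch of running an ad-hoc query against Parquet data in S3 with boto3, assuming the table is already registered in the Glue Data Catalog. The database, table, and S3 output location below are hypothetical placeholders, not the course's actual resources.

```python
import boto3

# Athena reads Parquet files directly from S3; because Parquet is columnar,
# only the columns referenced in the query are scanned.
athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString=(
        "SELECT order_date, SUM(amount) AS revenue "
        "FROM orders_parquet GROUP BY order_date"      # hypothetical table
    ),
    QueryExecutionContext={"Database": "sales_db"},     # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)

# Poll get_query_execution with this id to check the query's status.
print(response["QueryExecutionId"])
```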
The AWS Cloud can seem intimidating and overwhelming to a lot of people because of its vast ecosystem, but this course will make it easier for anyone who wants hands-on expertise in setting up a data warehouse in Redshift or building a BI infrastructure from scratch.
Data scientists, analysts, and business analysts will soon be expected (if they are not already) to become all-rounders and handle the technical aspects of data ingestion, engineering, and warehousing.
Anyone who has a basic understanding of how the cloud works can benefit from this course because:
- It is designed around the end-to-end life cycle of a typical data engineering project
- It provides practical solutions to real-world use cases
This Course covers:
Setting up a data warehouse in AWS Redshift from scratch
Basic data warehousing concepts
Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing (see the PySpark sketch after this list)
AWS Athena for ad-hoc analysis (and when to use Athena)
AWS Data Pipeline to sync incremental data
Lambda functions to trigger and automate ETL/data syncing processes (see the Lambda sketch after this list)
QuickSight setup, analyses, and dashboards
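To give a feel for what a Glue PySpark ETL job looks like, here is a minimal sketch of a job that reads a catalog table and writes it to the S3 data lake as Parquet. The database, table, and bucket names are hypothetical placeholders.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job boilerplate: resolve the job name and set up contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog (hypothetical names).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Write it out to S3 in the Parquet columnar format.
glue_context.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/orders_parquet/"},
    format="parquet",
)

job.commit()
```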
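And here is a minimal sketch of the kind of Lambda function used to trigger such a job; the Glue job name is a hypothetical placeholder, and in practice the function would be wired to a schedule or an S3 event.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Kick off the Glue ETL job; "daily-etl-job" is a hypothetical job name.
    response = glue.start_job_run(JobName="daily-etl-job")
    # Return the run id so the invocation can be traced in the Glue console.
    return {"JobRunId": response["JobRunId"]}
```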
Prerequisites for this course are:
Python / SQL (an absolute must)
PySpark (you should know how to write basic PySpark scripts)
Willingness to explore, learn, and put in the extra effort to succeed
An active AWS account
Important Note: this course makes use of the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free-tier usage, which should be more than enough to get plenty of practice from this course.
Also, this course uses the AWS UI in the browser for creating clusters and setting up jobs; there is no bash scripting involved. You can use any operating system to perform the lab sessions in this course.
This course is not code-intense or code-heavy; only about 35% of it involves coding, and the rest is execution, understanding, and chaining different components together. The whole purpose of this course is to make everyone aware of, and comfortable with, all the tools and features it uses.
Some Tips:
Try watching the videos at 1.2x speed
Every time you work on a new component or feature, do some research on other tools meant for the same purpose and see how and in what aspects they differ, e.g. Redshift/Athena vs Snowflake or BigQuery, QuickSight vs Power BI vs MicroStrategy