Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that can be hard to rectify later. In this course we will cover the foundations of what a data lake is and how to ingest and organize data into it, then dive into the data processing that can be done to optimize performance and costs when consuming the data at scale. This course is for professionals (architects, system administrators, and DevOps engineers) who need to design and build an architecture for secure and scalable data lake components. Students will learn about the use cases for a data lake and contrast them with a traditional infrastructure of servers and storage.
Syllabus
Week 1: Hello World, I mean, Hello Data Lakes!
- Video: Meet the Instructors
- Video: Introduction to Week 1
- Video: Why Data Lakes?
- Video: Characteristics of a Data Lake
- Video: Data Lake Components
- Reading: Data Lake Characteristics and Components
- Video: Comparison of a Data Lake to a Data Warehouse
- Reading: Data Lakes and Data Warehouses
- Video: Discussing sample Data Lake Architectures
- Quiz/Assessment: Week 1 quiz
Week 2: AWS data-related services
- Video: Introduction to Week 2
- Video: AWS Data Lake-related services
- Video: Amazon S3
- Video: AWS Glue Data Catalog
- Reading: S3 and Glue Data Catalog
- Video: AWS Services used for data movement
- Reading: Kinesis, API Gateway, etc.
- Video: AWS Services for Data processing
- Video: AWS Services for Analytics
- Video: AWS Services used for Predictive Analytics and Machine Learning
- Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
- Video: Introduction to AWS Lake Formation
- Reading: Lake Formation
- Lab: Get familiar with AWS Services and create your first simple data lake
Week 3: Ingesting the rivers
- Video: Introduction to Week 3
- Video: Use the right tool for the job
- Video: Understanding Data Structure and when to process data
- Video: Data Streaming ingestion with Amazon Kinesis Services
- Video: Diving Deep on Amazon Kinesis
- Demo: Batch Data Ingestion with AWS Transfer Family
- Reading: Batch Data Ingestion with AWS Services
- Video: Data Cataloging
- Demo: Using Glue Crawlers
- Reading: The importance of data cataloging
- Video: Reviewing the ingestion part of some Data Lake architectures
- Lab: Ingesting Web Logs
Week 4: Processing and Analyzing data that sits in the Data Lake
- Video: Introduction to Week 4
- Video: Data prep and AWS Glue jobs
- Video: File optimizations
- Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
- Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
- Video: Introduction to Data Lake security
- Reading: Security and compliance
- Video: The power of data visualization
- Video: Introduction to Amazon QuickSight
- Demo: Amazon QuickSight
- Reading: Data visualization, Amazon QuickSight
- Video: Registry of Open Data on AWS
- Lab: Create an end-to-end Data Lake with AWS Services
- Video: Course wrap-up!
Taught by
Rafael Lopes and Morgan Willis