The two key components of any data pipeline are data lakes and data warehouses. This course highlights use cases for each type of storage and dives into the data lake and data warehouse solutions available on Google Cloud in technical detail. It also describes the role of a data engineer, explains the benefits a successful data pipeline brings to business operations, and examines why data engineering should be done in a cloud environment. This is the first course in the Data Engineering on Google Cloud series. After completing it, enroll in the Building Batch Data Pipelines on Google Cloud course.
Syllabus
- Introduction
  - Course series introduction
  - Course introduction
- Introduction to Data Engineering
  - Module introduction
  - The role of a data engineer
  - Data engineering challenges
  - Introduction to BigQuery
  - Data lakes and data warehouses
  - Transactional databases versus data warehouses
  - Partner effectively with other data teams
  - Manage data access and governance
  - Demo: Finding PII in your dataset with the DLP API
  - Build production-ready pipelines
  - Google Cloud customer case study
  - Recap
  - Lab Intro: Using BigQuery to do Analysis
  - Using BigQuery to do Analysis
  - Quiz: Introduction to Data Engineering
- Building a Data Lake
  - Module introduction
  - Introduction to data lakes
  - Data storage and ETL options on Google Cloud
  - Build a data lake using Cloud Storage
  - Secure Cloud Storage
  - Store all sorts of data types
  - Cloud SQL as a relational data lake
  - Lab Intro: Loading Taxi Data into Google Cloud SQL
  - Loading Taxi Data into Google Cloud SQL 2.5
  - Quiz: Building a Data Lake
- Building a Data Warehouse
  - Module introduction
  - The modern data warehouse
  - Introduction to BigQuery
  - Demo: Querying terabytes of data in seconds
  - Get started with BigQuery
  - Load data into BigQuery
  - Lab Intro: Loading Data into BigQuery
  - Loading data into BigQuery
  - Explore schemas
  - Demo: Exploring Schemas
  - Schema design
  - Nested and repeated fields
  - Demo: Nested and repeated fields
  - Design the optimal schema for BigQuery
  - Lab Intro: Working with JSON and Array data in BigQuery
  - Working with JSON and Array data in BigQuery 2.5
  - Optimize with partitioning and clustering
  - Lab Intro: Partitioned Tables in BigQuery
  - Partitioned Tables in Google BigQuery
  - Review
  - Building a Data Warehouse
- Summary
  - Course Summary
  - Course Resources
  - Modernizing Data Lakes and Data Warehouses with Google Cloud
  - Your Next Steps
  - Course Badge