Learn about the big data ecosystem and how to use Spark to work with massive datasets. You will also store big data in a data lake and build a Lakehouse architecture on the Azure Databricks platform.
Syllabus
- Course Introduction
- In this lesson, you'll learn about the course, including the prerequisites, tools, environment, and course project.
- Big Data Ecosystem, Data Lakes, and Spark
- In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the greater Big Data ecosystem and how Spark fits into it.
- Data Wrangling with Spark
- In this lesson, you'll learn how to use Spark to clean and aggregate data.
- Spark Debugging and Optimization
- In this lesson, you will learn best practices for debugging and optimizing your Spark applications.
- Azure Databricks
- In this lesson, you'll create Spark clusters and write Spark code on the Azure Databricks platform.
- Data Lakes and Lakehouse with Azure Databricks
- In this lesson, you'll create data lakes and a Lakehouse architecture on the Azure Databricks platform.
- Building an Azure Data Lake for Bike Share Data Analytics
- In this project, you'll implement a Lakehouse architecture on the Azure Databricks platform.
Taught by
Matt Swaffer