- Module 1: Describe Azure Databricks
- Understand the Azure Databricks platform
- Create your own Azure Databricks workspace
- Create a notebook inside your home folder in Databricks
- Understand the fundamentals of Apache Spark notebooks
- Create, or attach to, a Spark cluster
- Identify the types of tasks well-suited to the unified analytics engine Apache Spark
- Module 2: Spark architecture fundamentals
- Understand the architecture of an Azure Databricks Spark Cluster
- Understand the architecture of a Spark Job
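A minimal PySpark sketch of the job model Module 2 describes: transformations build a plan on the driver, and an action turns that plan into a job whose stages run as parallel tasks on the executors. The partition count and bucket expression are illustrative assumptions, not values from the module.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(0, 1_000_000, numPartitions=8)             # 8 partitions -> 8 tasks in the first stage
buckets = df.groupBy((df.id % 10).alias("bucket")).count()  # wide transformation adds a shuffle stage

buckets.collect()  # the action: the driver schedules the job and the executors run its tasks
```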
- Module 3: Read and write data in Azure Databricks
- Use Azure Databricks to read multiple file types, both with and without a schema.
- Combine inputs from files and data stores, such as Azure SQL Database.
- Transform and store that data for advanced analytics.
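A hedged PySpark sketch of the Module 3 tasks. The file paths, JDBC connection details, and column names are placeholders assumed for illustration, not values from the module.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# Read a CSV with an inferred schema (extra pass over the data) ...
inferred = spark.read.option("header", True).option("inferSchema", True).csv("/mnt/raw/products.csv")

# ... and with an explicit schema, which is faster and safer for repeatable loads.
schema = StructType([
    StructField("product_id", IntegerType(), False),
    StructField("product_name", StringType(), True),
])
products = spark.read.option("header", True).schema(schema).csv("/mnt/raw/products.csv")

# Combine with a table from Azure SQL Database over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
          .option("dbtable", "dbo.Orders")
          .option("user", "<user>")
          .option("password", "<password>")
          .load())

# Transform and store the result for advanced analytics.
enriched = orders.join(products, "product_id")
enriched.write.mode("overwrite").parquet("/mnt/curated/orders_enriched")
```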
- Module 4: Work with DataFrames in Azure Databricks
- Use the count() method to count rows in a DataFrame
- Use the display() function to display a DataFrame in the Notebook
- Cache a DataFrame for quicker operations if the data is needed a second time
- Use the limit function to display a small set of rows from a larger DataFrame
- Use select() to select a subset of columns from a DataFrame
- Use distinct() and dropDuplicates() to remove duplicate data
- Use drop() to remove columns from a DataFrame
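The Module 4 methods in one short sketch, assuming the `spark` session and `display()` helper that a Databricks notebook provides, plus a hypothetical Parquet input.

```python
df = spark.read.parquet("/mnt/curated/orders_enriched")

print(df.count())       # count the rows
display(df.limit(10))   # limit() keeps the output small; display() renders it in the notebook

df.cache()              # keep the data in memory for the follow-up queries below

trimmed = df.select("product_id", "product_name", "quantity")  # subset of columns
deduped = trimmed.distinct()                                    # or trimmed.dropDuplicates(["product_id"])
final = deduped.drop("quantity")                                # remove a column
```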
- Module 5: Describe lazy evaluation and other performance features in Azure Databricks
- Describe the difference between eager and lazy execution
- Define and identify transformations
- Define and identify actions
- Describe the fundamentals of how the Catalyst Optimizer works
- Differentiate between wide and narrow transformations
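A small sketch of lazy evaluation, again assuming the notebook's `spark` session: the two transformations below only build a plan, `explain()` shows what the Catalyst optimizer produced, and nothing executes until the action runs.

```python
from pyspark.sql import functions as F

df = spark.range(0, 100_000)

narrow = df.withColumn("doubled", F.col("id") * 2)  # narrow transformation: per-partition, no shuffle, still lazy
wide = narrow.groupBy(F.col("id") % 10).count()     # wide transformation: needs a shuffle, still lazy

wide.explain()            # inspect the plan produced by the Catalyst optimizer
result = wide.collect()   # action: execution actually starts here
```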
- Module 6: Work with DataFrame columns in Azure Databricks
- Learn the syntax for specifying column values for filtering and aggregations
- Understand the use of the Column class
- Sort and filter a DataFrame based on column values
- Use collect() and take() to return records from a DataFrame to the driver of the cluster
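A sketch of the Column-based syntax from Module 6, assuming the notebook's `spark` session and a hypothetical orders DataFrame with `total`, `order_date`, and `product_id` columns.

```python
from pyspark.sql import functions as F

df = spark.read.parquet("/mnt/curated/orders_enriched")

high_value = (df
              .filter(F.col("total") > 100)           # filter on a Column expression
              .orderBy(F.col("order_date").desc()))   # sort on a Column

revenue = df.groupBy("product_id").agg(F.sum("total").alias("revenue"))

first_five = high_value.take(5)  # returns five Row objects to the driver
all_rows = revenue.collect()     # returns every Row to the driver; use with care on large results
```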
- Module 7: Work with advanced DataFrame methods in Azure Databricks
- Manipulate date and time values in Azure Databricks
- Rename columns in Azure Databricks
- Aggregate data in Azure Databricks DataFrames
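A sketch of the Module 7 methods under the same assumptions; the column names and timestamp format are placeholders.

```python
from pyspark.sql import functions as F

df = spark.read.parquet("/mnt/curated/orders_enriched")

with_dates = (df
              .withColumn("order_ts", F.to_timestamp("order_date", "yyyy-MM-dd HH:mm:ss"))
              .withColumn("order_month", F.date_trunc("month", F.col("order_ts")))
              .withColumnRenamed("total", "order_total"))   # rename a column

monthly = (with_dates
           .groupBy("order_month")                          # aggregate per month
           .agg(F.sum("order_total").alias("revenue"),
                F.avg("order_total").alias("avg_order_total")))
```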
- Module 8: Describe platform architecture, security, and data protection in Azure Databricks
- Learn the Azure Databricks platform architecture and how it is secured.
- Use Azure Key Vault to store secrets used by Azure Databricks and other services.
- Access Azure Storage with Key Vault-based secrets.
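A sketch of the secret-handling pattern in Module 8, assuming a Databricks secret scope named `key-vault-scope` backed by your Key Vault and a hypothetical storage account; `dbutils` is the helper object a Databricks notebook provides.

```python
storage_account = "mystorageaccount"   # hypothetical account name
access_key = dbutils.secrets.get(scope="key-vault-scope", key="storage-account-key")

# Hand the key to Spark so it can reach the storage account directly.
spark.conf.set(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", access_key)

df = spark.read.parquet(f"abfss://data@{storage_account}.dfs.core.windows.net/curated/orders")
```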
- Module 9: Build and query a Delta Lake
- Learn about the key features and use cases of Delta Lake.
- Use Delta Lake to create, append, and upsert tables.
- Perform optimizations in Delta Lake.
- Compare different versions of a Delta table using Time Travel.
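A sketch of the Delta Lake operations in Module 9, using throwaway DataFrames so the snippet is self-contained; the table path and key column are assumptions, and `OPTIMIZE`/`ZORDER` run on the Databricks runtime.

```python
from delta.tables import DeltaTable

df = spark.range(0, 5).withColumnRenamed("id", "order_id")
new_rows = spark.range(5, 10).withColumnRenamed("id", "order_id")
updates = spark.range(3, 8).withColumnRenamed("id", "order_id")

# Create, then append.
df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")
new_rows.write.format("delta").mode("append").save("/mnt/delta/orders")

# Upsert with MERGE.
target = DeltaTable.forPath(spark, "/mnt/delta/orders")
(target.alias("t")
 .merge(updates.alias("u"), "t.order_id = u.order_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())

# Optimize the file layout.
spark.sql("OPTIMIZE delta.`/mnt/delta/orders` ZORDER BY (order_id)")

# Time Travel: read an earlier version of the table.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/delta/orders")
```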
- Module 10: Process streaming data with Azure Databricks structured streaming
- Learn the key features and uses of Structured Streaming.
- Stream data from a file and write it out to a distributed file system.
- Use sliding windows to aggregate over chunks of data rather than all data.
- Apply watermarking to discard stale data that you do not have space to keep.
- Connect to Event Hubs to read and write streams.
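A sketch of the streaming pieces in Module 10, assuming the notebook's `spark` session and hypothetical paths and schema; reading from Event Hubs follows the same `readStream` pattern through the separately installed Event Hubs connector.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType, DoubleType

schema = StructType([
    StructField("device", StringType(), True),
    StructField("event_time", TimestampType(), True),
    StructField("reading", DoubleType(), True),
])

stream = spark.readStream.schema(schema).json("/mnt/streaming/input")   # stream data from files

windowed = (stream
            .withWatermark("event_time", "10 minutes")                  # drop state older than 10 minutes
            .groupBy(F.window("event_time", "5 minutes", "1 minute"),   # 5-minute window sliding every minute
                     "device")
            .agg(F.avg("reading").alias("avg_reading")))

query = (windowed.writeStream                                           # write out to a distributed file system
         .format("parquet")
         .option("path", "/mnt/streaming/output")
         .option("checkpointLocation", "/mnt/streaming/checkpoints")
         .outputMode("append")
         .start())
```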
- Module 11: Describe Azure Databricks Delta Lake architecture
- Process batch and streaming data with Delta Lake.
- Learn how Delta Lake architecture enables unified streaming and batch analytics with transactional guarantees within a data lake.
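A sketch of the unified pattern Module 11 describes: a stream appends to a Delta table while batch queries read the same table, relying on the transaction log for consistency. The rate source and table path are stand-ins assumed for illustration.

```python
from pyspark.sql import functions as F

raw_stream = (spark.readStream
              .format("rate").option("rowsPerSecond", 10).load()   # built-in test source
              .withColumn("device", F.concat(F.lit("device-"),
                                             (F.col("value") % 5).cast("string"))))

stream_writer = (raw_stream.writeStream
                 .format("delta")
                 .option("checkpointLocation", "/mnt/delta/_checkpoints/bronze")
                 .outputMode("append")
                 .start("/mnt/delta/bronze"))

# Batch analytics can read the very same table while the stream keeps writing.
bronze = spark.read.format("delta").load("/mnt/delta/bronze")
bronze.groupBy("device").count().show()
```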
- Module 12: Create production workloads on Azure Databricks with Azure Data Factory
- Create an Azure Data Factory pipeline with a Databricks activity.
- Execute a Databricks notebook with a parameter.
- Retrieve and log a parameter passed back from the notebook.
- Monitor your Data Factory pipeline.
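A sketch of the notebook side of Module 12: a Data Factory Databricks Notebook activity can supply a base parameter (here a hypothetical `input_path`) and read back whatever the notebook passes to `dbutils.notebook.exit()` from the activity output of the pipeline run.

```python
dbutils.widgets.text("input_path", "")          # declare the parameter with an empty default
input_path = dbutils.widgets.get("input_path")  # value supplied by the Data Factory activity

row_count = spark.read.parquet(input_path).count()

dbutils.notebook.exit(str(row_count))           # returned to Data Factory for logging downstream
```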
- Module 13: Implement CI/CD with Azure DevOps
- Learn about CI/CD and how it applies to data engineering.
- Use Azure DevOps as a source code repository for Azure Databricks notebooks.
- Create build and release pipelines in Azure DevOps to automatically deploy a notebook from a development to a production Azure Databricks workspace.
- Module 14: Integrate Azure Databricks with Azure Synapse
- Access Azure Synapse Analytics from Azure Databricks by using the SQL Data Warehouse connector.
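A sketch of reading from Azure Synapse Analytics with the SQL Data Warehouse connector on the Databricks runtime; the JDBC URL, staging location, and table name are placeholders.

```python
df = (spark.read
      .format("com.databricks.spark.sqldw")
      .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw>")
      .option("forwardSparkAzureStorageCredentials", "true")
      .option("tempDir", "abfss://staging@<storageaccount>.dfs.core.windows.net/tempdir")
      .option("dbTable", "dbo.DimCustomer")
      .load())
```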
- Module 15: Describe Azure Databricks best practices
- Workspace administration
- Security
- Tools & integration
- Databricks runtime
- HA/DR
- Clusters