Implement a data lakehouse analytics solution with Azure Databricks

Microsoft via Microsoft Learn

Overview

  • Module 1: Describe Azure Databricks
  • In this module, you will:

    • Understand the Azure Databricks platform
    • Create your own Azure Databricks workspace
    • Create a notebook inside your home folder in Databricks
    • Understand the fundamentals of Apache Spark notebooks
    • Create, or attach to, a Spark cluster
    • Identify the types of tasks well-suited to the unified analytics engine Apache Spark
  • Module 2: Spark architecture fundamentals
  • In this module, you will:

    • Understand the architecture of an Azure Databricks Spark Cluster
    • Understand the architecture of a Spark Job
  • Module 3: Read and write data in Azure Databricks
  • In this module, you will:

    • Use Azure Databricks to read multiple file types, both with and without a schema.
    • Combine inputs from files and data stores, such as Azure SQL Database.
    • Transform and store that data for advanced analytics.
  • Module 4: Work with DataFrames in Azure Databricks
  • In this module, you will:

    • Use the count() method to count rows in a DataFrame
    • Use the display() function to display a DataFrame in the Notebook
    • Cache a DataFrame for quicker operations if the data is needed a second time
    • Use limit() to display a small set of rows from a larger DataFrame
    • Use select() to select a subset of columns from a DataFrame
    • Use distinct() and dropDuplicates() to remove duplicate data
    • Use drop() to remove columns from a DataFrame
  • Module 5: Describe lazy evaluation and other performance features in Azure Databricks
  • In this module, you will:

    • Describe the difference between eager and lazy execution
    • Define and identify transformations
    • Define and identify actions
    • Describe the fundamentals of how the Catalyst Optimizer works
    • Differentiate between wide and narrow transformations
  • Module 6: Work with DataFrames columns in Azure Databricks
  • In this module, you will:

    • Learn the syntax for specifying column values for filtering and aggregations
    • Understand the use of the Column Class
    • Sort and filter a DataFrame based on Column Values
    • Use collect() and take() to return records from a DataFrame to the driver of the cluster
  • Module 7: Work with DataFrames advanced methods in Azure Databricks
  • In this module, you will:

    • Manipulate date and time values in Azure Databricks
    • Rename columns in Azure Databricks
    • Aggregate data in Azure Databricks DataFrames
  • Module 8: Describe platform architecture, security, and data protection in Azure Databricks
  • In this module, you will:

    • Learn the Azure Databricks platform architecture and how it is secured.
    • Use Azure Key Vault to store secrets used by Azure Databricks and other services.
    • Access Azure Storage with Key Vault-based secrets.
  • Module 9: Build and query a Delta Lake
  • In this module, you will:

    • Learn about the key features and use cases of Delta Lake.
    • Use Delta Lake to create, append, and upsert tables.
    • Perform optimizations in Delta Lake.
    • Compare different versions of a Delta table using Delta Lake time travel (the course calls this feature Time Machine).
  • Module 10: Process streaming data with Azure Databricks structured streaming
  • In this module, you will:

    • Learn the key features and uses of Structured Streaming.
    • Stream data from a file and write it out to a distributed file system.
    • Use sliding windows to aggregate over chunks of data rather than all data.
    • Apply watermarking to discard stale data that you do not have space to keep.
    • Connect to Event Hubs to read and write streams.
  • Module 11: Describe Azure Databricks Delta Lake architecture
  • In this module, you will:

    • Process batch and streaming data with Delta Lake.
    • Learn how Delta Lake architecture enables unified streaming and batch analytics with transactional guarantees within a data lake.
  • Module 12: Create production workloads on Azure Databricks with Azure Data Factory
  • In this module, you will:

    • Create an Azure Data Factory pipeline with a Databricks activity.
    • Execute a Databricks notebook with a parameter.
    • Retrieve and log a parameter passed back from the notebook.
    • Monitor your Data Factory pipeline.
  • Module 13: Implement CI/CD with Azure DevOps
  • In this module, you will:

    • Learn about CI/CD and how it applies to data engineering.
    • Use Azure DevOps as a source code repository for Azure Databricks notebooks.
    • Create build and release pipelines in Azure DevOps to automatically deploy a notebook from a development to a production Azure Databricks workspace.
  • Module 14: Integrate Azure Databricks with Azure Synapse
  • In this module, you will:

    • Access Azure Synapse Analytics from Azure Databricks by using the SQL Data Warehouse connector.
  • Module 15: Describe Azure Databricks best practices
  • In this module, you will learn best practices in the following categories:

    • Workspace administration
    • Security
    • Tools & integration
    • Databricks runtime
    • HA/DR
    • Clusters

Syllabus

  • Module 1: Describe Azure Databricks
    • Introduction
    • Explain Azure Databricks
    • Create an Azure Databricks workspace and cluster
    • Understand Azure Databricks Notebooks
    • Exercise: Work with Notebooks
    • Knowledge check
    • Summary
  • Module 2: Spark architecture fundamentals
    • Introduction
    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
    • Knowledge check
    • Summary
  • Module 3: Read and write data in Azure Databricks
    • Introduction
    • Read data in CSV format
    • Read data in JSON format
    • Read data in Parquet format
    • Read data stored in tables and views
    • Write data
    • Exercises: Read and write data
    • Knowledge check
    • Summary
  • Module 4: Work with DataFrames in Azure Databricks
    • Introduction
    • Describe a DataFrame
    • Use common DataFrame methods
    • Use the display function
    • Exercise: Distinct articles
    • Knowledge check
    • Summary
  • Module 5: Describe lazy evaluation and other performance features in Azure Databricks
    • Introduction
    • Describe the difference between eager and lazy execution
    • Describe the fundamentals of how the Catalyst Optimizer works
    • Define and identify actions and transformations
    • Describe performance enhancements enabled by shuffle operations and Tungsten
    • Knowledge check
    • Summary
  • Module 6: Work with DataFrames columns in Azure Databricks
    • Introduction
    • Describe the column class
    • Work with column expressions
    • Exercise: Washingtons and Marthas
    • Knowledge check
    • Summary
  • Module 7: Work with DataFrames advanced methods in Azure Databricks
    • Introduction
    • Perform date and time manipulation
    • Use aggregate functions
    • Exercise: Deduplication of data
    • Knowledge check
    • Summary
  • Module 8: Describe platform architecture, security, and data protection in Azure Databricks
    • Introduction
    • Describe the Azure Databricks platform architecture
    • Perform data protection
    • Describe Azure Key Vault and Databricks secret scopes
    • Secure access with Azure IAM and authentication
    • Describe security
    • Exercise: Access Azure Storage with Key Vault-backed secrets
    • Knowledge check
    • Summary
  • Module 9: Build and query a Delta Lake
    • Introduction
    • Describe the open source Delta Lake
    • Exercise: Work with basic Delta Lake functionality
    • Describe how Azure Databricks manages Delta Lake
    • Exercise: Use the Delta Lake Time Machine and perform optimization
    • Knowledge check
    • Summary
  • Module 10: Process streaming data with Azure Databricks structured streaming
    • Introduction
    • Describe Azure Databricks structured streaming
    • Perform stream processing using structured streaming
    • Work with Time Windows
    • Process data from Event Hubs with structured streaming
    • Knowledge check
    • Summary
  • Module 11: Describe Azure Databricks Delta Lake architecture
    • Introduction
    • Describe bronze, silver, and gold architecture
    • Perform batch and stream processing
    • Knowledge check
    • Summary
  • Module 12: Create production workloads on Azure Databricks with Azure Data Factory
    • Introduction
    • Schedule Databricks jobs in an Azure Data Factory pipeline
    • Pass parameters into and out of Databricks jobs in Azure Data Factory
    • Knowledge check
    • Summary
  • Module 13: Implement CI/CD with Azure DevOps
    • Introduction
    • Describe CI/CD
    • Create a CI/CD process with Azure DevOps
    • Knowledge check
    • Summary
  • Module 14: Integrate Azure Databricks with Azure Synapse
    • Introduction
    • Integrate with Azure Synapse Analytics
    • Knowledge check
    • Summary
  • Module 15: Describe Azure Databricks best practices
    • Introduction
    • Understand workspace administration best practices
    • List security best practices
    • Describe tools and integration best practices
    • Explain Databricks runtime best practices
    • Understand cluster best practices
    • Knowledge check
    • Summary
