Overview

In this course, you will learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud. You will discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. You will come to understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. You will also be introduced to the architecture of an Azure Databricks Spark Cluster and Spark Jobs. You will work with large amounts of data from multiple sources in different raw formats. you will learn how Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries. This course is part of a Specialization intended for Data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services for anyone interested in preparing for the Exam DP-203: Data Engineering on Microsoft Azure (beta). You will take a practice exam that covers key skills measured by the certification exam. This is the eighth course in a program of 10 courses to help prepare you to take the exam so that you can have expertise in designing and implementing data solutions that use Microsoft Azure data services. The Data Engineering on Microsoft Azure exam is an opportunity to prove knowledge expertise in integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions that use Microsoft Azure data services. Each course teaches you the concepts and skills that are measured by the exam. By the end of this Specialization, you will be ready to take and sign-up for the Exam DP-203: Data Engineering on Microsoft Azure (beta).

Syllabus

Introduction to Azure Databricks

Describe the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. Describe the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. Describe the architecture of an Azure Databricks Spark Cluster and Spark Jobs.

Read and write data in Azure Databricks

Describe how to use Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries.

Data processing in Azure Databricks

Process data in Azure Databricks by defining DataFrames to read and process the Data. Perform data transformations in DataFrames and execute actions to display the transformed data. Explain the difference between a transform and an action, lazy and eager evaluations, Wide and Narrow transformations, and other optimizations in Azure Databricks.

Work with DataFrames in Azure Databricks

Use the DataFrame Column Class Azure Databricks to apply column-level transformations, such as sorts, filters and aggregations. Use advanced DataFrame functions operations to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks.

Platform architecture, security, and data protection in Azure Databricks

Describe the Azure Databricks platform architecture and how it is securedUse Azure Key Vault to store secrets used by Azure Databricks and other services. Access Azure Storage with Key Vault-based secrets

Delta Lake

Describe how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations. Describe Azure Databricks Delta Lake architecture

Analyze streaming data and create production workloads

Process streaming data with Azure Databricks structured streaming. Create production workloads on Azure Databricks with Azure Data Factory.

Create a data architecture

Describe how to put Azure Databricks notebooks under version control in an Azure DevOps repo and build deployment pipelines to manage your release process. Describe how to integrate Azure Databricks with Azure Synapse Analytics as part of your data architecture. Describe best practices for workspace administration, security, tools, integration, databricks runtime, HA/DR, and clusters in Azure Databricks