Perform data science with Azure Databricks

Microsoft via Microsoft Learn

Overview

  • Module 1: Describe Azure Databricks
  • In this module, you will:

    • Understand the Azure Databricks platform
    • Create your own Azure Databricks workspace
    • Create a notebook inside your home folder in Databricks
    • Understand the fundamentals of Apache Spark notebooks
    • Create, or attach to, a Spark cluster
    • Identify the types of tasks well-suited to the unified analytics engine Apache Spark
  • Module 2: Spark architecture fundamentals
  • In this module, you will:

    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
  • Module 3: Read and write data in Azure Databricks
  • In this module, you will:

    • Use Azure Databricks to read multiple file types, both with and without a schema.
    • Combine inputs from files and data stores, such as Azure SQL Database.
    • Transform and store that data for advanced analytics (see the read/write sketch after the module list).
  • Module 4: Work with DataFrames in Azure Databricks
  • In this module, you will:

    • Use the count() method to count rows in a DataFrame
    • Use the display() function to display a DataFrame in the Notebook
    • Cache a DataFrame for quicker operations if the data is needed a second time
    • Use the limit function to display a small set of rows from a larger DataFrame
    • Use select() to select a subset of columns from a DataFrame
    • Use distinct() and dropDuplicates() to remove duplicate data
    • Use drop() to remove columns from a DataFrame (see the DataFrame sketch after the module list)
  • Module 5: Work with user-defined functions
  • In this module, you will learn how to:

    • Write user-defined functions
    • Perform ETL operations using user-defined functions (see the UDF sketch after the module list)
  • Module 6: Build and query a Delta Lake
  • In this module, you will:

    • Learn about the key features and use cases of Delta Lake.
    • Use Delta Lake to create, append, and upsert tables.
    • Perform optimizations in Delta Lake.
    • Compare different versions of a Delta table using Time Travel (see the Delta Lake sketch after the module list).
  • Module 7: Perform machine learning with Azure Databricks
  • In this module, you will learn how to:

    • Perform Machine Learning
    • Train a model and create predictions
    • Perform exploratory data analysis
    • Describe machine learning workflows
    • Build and evaluate machine learning models
  • Module 8: Train a machine learning model
  • In this module, you will learn how to:

    • Perform featurization of the dataset
    • Finish featurization of the dataset
    • Understand Regression modeling
    • Build and interpret a regression model (see the regression sketch after the module list)
  • Module 9: Work with MLflow in Azure Databricks
  • In this module, you will learn how to:

    • Use MLflow to track experiments, log metrics, and compare runs
    • Work with MLflow to track experiment metrics, parameters, artifacts, and models (see the MLflow sketch after the module list)
  • Module 10: Perform model selection with hyperparameter tuning
  • In this module, you will learn how to:

    • Describe model selection and hyperparameter tuning
    • Select the optimal model by tuning hyperparameters (see the tuning sketch after the module list)
  • Module 11: Deep learning with Horovod for distributed training
  • In this module, you will learn how to:

    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Work with Horovod and Petastorm for training a deep learning model
  • Module 12: Work with Azure Machine Learning to deploy serving models
  • In this module, you will learn how to:

    • Use Azure Machine Learning to deploy serving models
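
The sketches below illustrate, in PySpark, a few of the techniques the modules above name. They are minimal orientation examples, not the course's own lab code: every path, dataset name, and column name is an invented placeholder, and in a Databricks notebook the SparkSession is already available as spark. First, reading files with and without an explicit schema and writing the result back out (Module 3):

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Option 1: let Spark infer the schema (convenient, but costs an extra pass over the file).
    inferred_df = (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/training/people.csv"))

    # Option 2: supply the schema explicitly (faster and safer for repeatable pipelines).
    schema = StructType([
        StructField("id", IntegerType(), False),
        StructField("name", StringType(), True),
        StructField("city", StringType(), True),
    ])
    explicit_df = (spark.read
        .schema(schema)
        .option("header", "true")
        .csv("/mnt/training/people.csv"))

    # JSON uses the same reader pattern; Parquet files carry their own schema.
    json_df = spark.read.schema(schema).json("/mnt/training/people.json")
    parquet_df = spark.read.parquet("/mnt/training/people.parquet")

    # Store the transformed data for downstream analytics.
    explicit_df.write.mode("overwrite").parquet("/mnt/training/output/people")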
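
A quick tour of the DataFrame methods listed under Module 4, again with a made-up dataset and column names:

    articles = spark.read.parquet("/mnt/training/articles.parquet")

    print(articles.count())        # number of rows in the DataFrame
    display(articles.limit(10))    # display() is a Databricks notebook function; limit() keeps the sample small
    articles.cache()               # keep the data in memory if it will be read a second time

    subset = articles.select("title", "author")     # keep only the columns you need
    deduped = subset.dropDuplicates(["title"])      # or subset.distinct() for fully identical rows
    trimmed = articles.drop("raw_html")             # remove a column that is no longer needed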
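
A user-defined function for Module 5's ETL exercise might look like the following; the cleanup logic and file locations are placeholders:

    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap ordinary Python logic so Spark can apply it row by row across the cluster.
    def clean_city(value):
        return value.strip().title() if value is not None else None

    clean_city_udf = udf(clean_city, StringType())

    # Extract, transform, load: read raw data, apply the UDF, persist the result.
    raw_df = spark.read.option("header", "true").csv("/mnt/training/people.csv")
    cleaned_df = raw_df.withColumn("city", clean_city_udf(col("city")))
    cleaned_df.write.mode("overwrite").parquet("/mnt/training/output/people_clean")

Built-in functions from pyspark.sql.functions are generally faster than Python UDFs, so a UDF is best reserved for transformations no built-in covers.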
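
For Module 6, creating, appending to, and upserting into a Delta table, plus optimization and time travel on Databricks, could be sketched as follows (table location and sample rows are invented):

    from delta.tables import DeltaTable

    delta_path = "/mnt/delta/people"
    people = spark.createDataFrame([(1, "Ada"), (2, "Grace")], ["id", "name"])
    new_rows = spark.createDataFrame([(3, "Alan")], ["id", "name"])
    updates = spark.createDataFrame([(2, "Grace H."), (4, "Edsger")], ["id", "name"])

    # Create the Delta table, then append more rows.
    people.write.format("delta").mode("overwrite").save(delta_path)
    new_rows.write.format("delta").mode("append").save(delta_path)

    # Upsert (MERGE): update matching ids, insert the rest.
    target = DeltaTable.forPath(spark, delta_path)
    (target.alias("t")
        .merge(updates.alias("u"), "t.id = u.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # Compact small files, then read an earlier version of the table with time travel.
    spark.sql(f"OPTIMIZE delta.`{delta_path}`")
    first_version = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)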
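
Modules 7 and 8 culminate in a regression model. A minimal Spark ML pipeline, assuming a hypothetical trips dataset with distance, passengers, and fare columns, might look like this:

    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression
    from pyspark.ml.evaluation import RegressionEvaluator

    trips = spark.read.parquet("/mnt/training/trips.parquet")
    train, test = trips.randomSplit([0.8, 0.2], seed=42)

    # Featurization: assemble the numeric inputs into a single feature vector.
    assembler = VectorAssembler(inputCols=["distance", "passengers"], outputCol="features")
    lr = LinearRegression(featuresCol="features", labelCol="fare")

    model = Pipeline(stages=[assembler, lr]).fit(train)
    predictions = model.transform(test)

    rmse = RegressionEvaluator(labelCol="fare", metricName="rmse").evaluate(predictions)
    print(f"Test RMSE: {rmse:.3f}")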
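
Module 9's MLflow tracking amounts to wrapping training code in a run and logging whatever you want to compare later. A sketch that reuses the fitted pipeline and RMSE from the regression example; the artifact path is invented and assumed to have been written earlier in the notebook:

    import mlflow
    import mlflow.spark

    with mlflow.start_run(run_name="baseline-regression"):
        mlflow.log_param("label_col", "fare")           # parameters record how the run was configured
        mlflow.log_metric("rmse", rmse)                 # metrics are what you compare across runs
        mlflow.log_artifact("/dbfs/tmp/residuals.png")  # any file produced earlier can be kept as an artifact
        mlflow.spark.log_model(model, "model")          # the fitted Spark ML pipeline itself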
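
And for Module 10, cross-validated hyperparameter tuning over the same pipeline (lr, assembler, and train come from the regression sketch; the grid values are arbitrary):

    from pyspark.ml import Pipeline
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Enumerate the hyperparameter combinations to try.
    param_grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 0.5])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())

    cv = CrossValidator(
        estimator=Pipeline(stages=[assembler, lr]),
        estimatorParamMaps=param_grid,
        evaluator=RegressionEvaluator(labelCol="fare", metricName="rmse"),
        numFolds=3,
    )

    best_model = cv.fit(train).bestModel   # the pipeline refit with the best-scoring parameter combination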

Syllabus

  • Module 1: Describe Azure Databricks
    • Introduction
    • Explain Azure Databricks
    • Create an Azure Databricks workspace and cluster
    • Understand Azure Databricks Notebooks
    • Exercise: Work with Notebooks
    • Knowledge check
    • Summary
  • Module 2: Spark architecture fundamentals
    • Introduction
    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
    • Knowledge check
    • Summary
  • Module 3: Read and write data in Azure Databricks
    • Introduction
    • Read data in CSV format
    • Read data in JSON format
    • Read data in Parquet format
    • Read data stored in tables and views
    • Write data
    • Exercises: Read and write data
    • Knowledge check
    • Summary
  • Module 4: Work with DataFrames in Azure Databricks
    • Introduction
    • Describe a DataFrame
    • Use common DataFrame methods
    • Use the display function
    • Exercise: Distinct articles
    • Knowledge check
    • Summary
  • Module 5: Work with user-defined functions
    • Introduction
    • Write user-defined functions
    • Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
    • Knowledge check
    • Summary
  • Module 6: Build and query a Delta Lake
    • Introduction
    • Describe the open source Delta Lake
    • Exercise: Work with basic Delta Lake functionality
    • Describe how Azure Databricks manages Delta Lake
    • Exercise: Use Delta Lake Time Travel and perform optimization
    • Knowledge check
    • Summary
  • Module 7: Perform machine learning with Azure Databricks
    • Introduction
    • Understand machine learning
    • Exercise: Train a model and create predictions
    • Understand data using exploratory data analysis
    • Exercise: Perform exploratory data analysis
    • Describe machine learning workflows
    • Exercise: Build and evaluate a baseline machine learning model
    • Knowledge check
    • Summary
  • Module 8: Train a machine learning model
    • Introduction
    • Perform featurization of the dataset
    • Exercise: Finish featurization of the dataset
    • Understand regression modeling
    • Exercise: Build and interpret a regression model
    • Knowledge check
    • Summary
  • Module 9: Work with MLflow in Azure Databricks
    • Introduction
    • Use MLflow to track experiments, log metrics, and compare runs
    • Exercise: Work with MLflow to track experiment metrics, parameters, artifacts, and models
    • Knowledge check
    • Summary
  • Module 10: Perform model selection with hyperparameter tuning
    • Introduction
    • Describe model selection and hyperparameter tuning
    • Exercise: Select optimal model by tuning hyperparameters
    • Knowledge check
    • Summary
  • Module 11: Deep learning with Horovod for distributed training
    • Introduction
    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Exercise: Work with Horovod and Petastorm for training a deep learning model
    • Knowledge check
    • Summary
  • Module 12: Work with Azure Machine Learning to deploy serving models
    • Introduction
    • Use Azure Machine Learning to deploy serving models
    • Knowledge check
    • Summary
