

Data Engineering with Databricks

Pragmatic AI Labs via edX

Overview

Master Data Engineering on Databricks Lakehouse Platform

  • Learn Databricks architecture, cluster management & notebook analysis
  • Build reliable ETL pipelines with Delta Lake for data transformation
  • Implement advanced data processing techniques with Apache Spark

Course Highlights:

  • Create & scale Databricks clusters for workloads
  • Load data from diverse sources into notebooks
  • Explore, visualize & profile datasets with notebooks
  • Version control & share notebooks via Git integration
  • Read & ingest data in various file formats
  • Transform data with SQL & DataFrame operations
  • Handle complex data types like arrays, structs, timestamps
  • Deduplicate, join & flatten nested data structures
  • Identify & fix data quality issues with UDFs
  • Load cleansed data into Delta Lake for reliability
  • Build production-ready pipelines with Delta Live Tables
  • Schedule & monitor workloads using Databricks Jobs
  • Secure data access with Unity Catalog
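Two of the highlights above, deduplication and flattening nested structures, are easier to see in miniature than at cluster scale. The plain-Python sketch below mirrors what `DataFrame.dropDuplicates()` and struct-field selection do in Spark; all record and field names are illustrative, not from the course.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted top-level keys,
    the way selecting struct fields flattens a Spark row."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

def deduplicate(records: list, key: str) -> list:
    """Keep the first record seen for each value of `key`."""
    seen, unique = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

# Hypothetical event data with a nested `user` struct and one duplicate row.
events = [
    {"id": 1, "user": {"name": "ada", "geo": {"country": "US"}}},
    {"id": 1, "user": {"name": "ada", "geo": {"country": "US"}}},
    {"id": 2, "user": {"name": "lin", "geo": {"country": "DE"}}},
]
rows = [flatten(e) for e in deduplicate(events, key="id")]
```

On Databricks the same steps run distributed across a cluster, but the logic is identical.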

Gain comprehensive data engineering skills on Databricks through hands-on labs, real-world projects, and best practices for the modern data lakehouse.

Syllabus

Module 1: Databricks Lakehouse Platform Fundamentals

  • Introduction to the Databricks Lakehouse Platform and its architecture

  • Creating, managing, and configuring clusters

  • Setting up and using Databricks with IntelliJ, RStudio, and the Databricks CLI

  • Introduction to notebooks, including execution, sharing, and multi-language support

  • Efficient data transformation with Spark SQL and the Catalog Explorer

  • Creating tables from files and querying external data sources

  • Reliable data pipelines with Delta Lake, ACID transactions, and Z-Ordering optimization
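The Module 1 topics of creating tables from files, Delta Lake, and Z-Ordering might look like the following in Databricks SQL. This is a rough sketch, not course material: the table name, column, and landing path are hypothetical, and it assumes a Databricks runtime where `read_files` and `OPTIMIZE ... ZORDER BY` are available.

```sql
-- Create a Delta table from raw CSV files (hypothetical path and schema)
CREATE TABLE sales_bronze
USING DELTA
AS SELECT * FROM read_files('/mnt/raw/sales/', format => 'csv', header => true);

-- Z-Ordering co-locates related values on disk to speed up selective queries
OPTIMIZE sales_bronze ZORDER BY (customer_id);
```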

Module 2: Data Transformation and Pipelines

  • Automated pipelines with Delta Live Tables

    • Delta Live Tables components

    • Continuous vs triggered pipelines

    • Configuring Auto Loader

    • Querying pipeline events

  • End-to-end example of Delta Live Tables

    • Vacuum and garbage collection

  • Orchestrating workloads with Databricks Jobs

    • Multi-task workflows and task dependencies

    • Viewing job history

    • Using dashboards

    • Handling failures and configuring retries

  • Unified data access with Unity Catalog

    • Catalogs vs metastores

    • Unity Catalog quickstart in Python

    • Applying object security

    • Best practices for catalogs, connections, and business units
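The Module 2 pieces (Delta Live Tables, Auto Loader, and data-quality expectations) fit together in a declarative pipeline. A hedged sketch in Delta Live Tables SQL, assuming a hypothetical `orders` feed; the `cloud_files` function is the Auto Loader source, and the `EXPECT` constraint drops rows that fail the quality check:

```sql
-- Ingest raw JSON incrementally with Auto Loader (hypothetical landing path)
CREATE OR REFRESH STREAMING LIVE TABLE orders_raw
AS SELECT * FROM cloud_files("/mnt/landing/orders/", "json");

-- Cleanse downstream, enforcing a data-quality expectation
CREATE OR REFRESH LIVE TABLE orders_clean (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, CAST(amount AS DOUBLE) AS amount
FROM STREAM(LIVE.orders_raw);
```

Such a pipeline runs only inside a Databricks Delta Live Tables workspace, scheduled either continuously or on a trigger as the syllabus describes.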

Taught by

Noah Gift and Alfredo Deza
