Overview
Learn best practices, patterns, and processes for developers and DevOps teams who want to design and implement data processing using Azure Databricks.
Syllabus
Introduction
- Optimize data pipelines
- What you should know
- About using cloud services
- Meet Databricks Apache Spark clusters
- Business scenarios for Spark
- Understand Spark key components
- Azure Databricks concepts
- Quick start: Use a notebook (see the first sketch after this syllabus)
- Review Azure Databricks cluster setup
- Use a Python notebook with dashboards
- Use an R notebook
- Use a Scala notebook for visualization
- Use a notebook with scikit-learn
- Use a Spark Streaming notebook
- Use an external Scala library: variant-spark
- Understand data engineering workload steps
- Understand cluster configurations
- Understand Spark job execution overhead
- Explore optimization control planes
- Optimize a cluster and job
- Run a production-size job
- Use Databricks jobs and role-based access control
- Use Databricks Runtime ML
- Understand ML Pipelines API
- Use ML Pipelines API (see the second sketch after this syllabus)
- Use distributed ML training
- Understand Databricks Delta
- Use Databricks Delta
- Use Azure Blob storage
- Understand MLflow
- Azure Databricks pipeline considerations
- Azure Databricks for data warehousing
- Azure Databricks and machine learning
- Azure Databricks for churn analysis
- Azure Databricks for intrusion detection
- Next steps
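The notebook lessons above (for example, "Quick start: Use a notebook") center on running Spark code interactively against a cluster. Below is a minimal, hypothetical sketch of that style of work; the app name, column names, and data are illustrative assumptions, not course materials. In a Databricks notebook a SparkSession named `spark` is predefined, so the builder line is only needed when running outside Databricks.

    # Minimal sketch, assuming a running Spark environment.
    # In a Databricks notebook, `spark` is predefined; the builder
    # below makes the sketch runnable outside Databricks as well.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("quickstart").getOrCreate()

    # Hypothetical example data; not from the course.
    people = spark.createDataFrame(
        [("alice", 34), ("bob", 41), ("carol", 29)],
        ["name", "age"],
    )

    # A transformation (filter) followed by an action (show),
    # the basic interactive pattern the notebook lessons cover.
    people.filter(people.age > 30).show()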
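Similarly, the ML Pipelines API lessons cover chaining feature transformers and an estimator into a single fitted model. The second sketch below uses hypothetical feature columns (`x1`, `x2`) and toy data to show the Pipeline pattern; it is not code from the course.

    # Minimal sketch of a Spark ML Pipeline; the data and column
    # names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("ml-pipeline").getOrCreate()

    train = spark.createDataFrame(
        [(1.0, 2.0, 5.0), (2.0, 1.0, 4.0), (3.0, 3.0, 9.0)],
        ["x1", "x2", "label"],
    )

    # Stages run in order: assemble raw columns into a feature
    # vector, then fit the regressor on that vector.
    assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
    lr = LinearRegression(featuresCol="features", labelCol="label")
    model = Pipeline(stages=[assembler, lr]).fit(train)

    model.transform(train).select("x1", "x2", "prediction").show()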
Taught by
Lynn Langit