Overview
Explore how to develop scalable data solutions with Azure Databricks in this conference talk. Learn how to connect to Azure data sources, process large datasets with Apache Spark, and build machine learning solutions. Discover the collaborative workspace, the Power BI integration for visualization, and performance optimization techniques. Follow step-by-step demonstrations of creating a Databricks resource, navigating the workspace, managing clusters, analyzing data, scheduling jobs, and using Structured Streaming. Gain insight into Databricks' history, pricing, and integration with tools such as GitHub. Understand the benefits of Databricks for processing data lakes, implementing machine learning workflows, and applying deep learning. By the end of the talk, you will have the knowledge needed to use Azure Databricks to build scalable, efficient data solutions.
Syllabus
Introduction
History of Databricks
Spark
Machine Learning
Machine Learning Monitoring
Analytics Tools
Managed Service
Processing
GitHub Integration
Managing Databricks
Pricing
Machine Learning Process
Databricks
Creating a Databricks resource
Databricks workspace
Creating a new cluster
Analyzing data
Using Databricks
Databricks Job
Databricks Notifications
Structured Streaming
Deep Learning
Jobs
Recap
Questions
Taught by
PASS Data Community Summit