Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks

Databricks via YouTube

Overview

Explore high-bandwidth connectivity with BI tools through Cloud Fetch in this 20-minute Databricks video. Learn how to overcome the data transfer bottleneck in traditional data warehouses when extracting large query results using Business Intelligence tools like Tableau and Microsoft Power BI. Discover the new parallel data fetching mechanism via cloud storage, such as AWS S3 and Azure Data Lake Storage, which can result in a 10x speed-up in extract performance. Dive into the challenges of data growth, the intricacies of result pagination, and the improvements made with Apache Arrow. Understand the new data extract architecture, including hybrid results, data layout considerations, and parallel file downloads. Gain insights into Cloud Fetch's real-world performance and its ability to scale up extract workloads using cloud storage, ultimately enabling faster data ingestion for BI tools.
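The parallel fetching idea described above can be illustrated with a minimal sketch. It assumes the client has already received a list of result-file URLs and downloads them concurrently instead of paging through results one batch at a time. The names `fetch_all` and `fake_download`, the example URLs, and the worker count are hypothetical and are not taken from the video or from any Databricks driver; the download function is a stand-in for a real HTTP GET.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_all(
    urls: List[str],
    download: Callable[[str], bytes],
    max_workers: int = 8,  # assumed value, tune for the client's bandwidth
) -> List[bytes]:
    """Download every result file in parallel, preserving the order of `urls`."""
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the chunks
        # can be concatenated back into the original result sequence.
        return list(pool.map(download, urls))


def fake_download(url: str) -> bytes:
    # Hypothetical stand-in for an HTTP GET against a cloud storage URL.
    return f"rows-from-{url}".encode()


if __name__ == "__main__":
    links = [f"https://storage.example/result/part-{i}" for i in range(4)]
    chunks = fetch_all(links, fake_download)
    print(len(chunks), "chunks fetched")
```

In a real client, `download` would issue the network request and the returned bytes would be decoded, for example as Arrow batches. The sketch only shows the fan-out and ordered reassembly.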

Syllabus

Intro
The Business Intelligence use case How BI tools connect to Databricks?
Data growth
Challenges and opportunities Breaking down the extract problem Problem
Fetching query results Result pagination
Importing tables Use internal compute engine
Serving results before Arrow Multiple layers of conversion
Serving results with Arrow Bring results faster to the client
Collecting results in Arrow format Tasks generate Arrow batches
Arrow batch sizing Fetching Arrow batches
Improvements with Arrow Speedups of less than 3x
Extract bottlenecks
New data extract architecture Cloud Fetch system design
Inlining small results Hybrid results
Data layout File sizing and pagination
Fetching results from URLs Parallel file downloads
Cloud Fetch performance Extract faster than BI tools can ingest
Cloud Fetch in the wild Outperforms direct fetch by an order of magnitude
Conclusions Scaled up extract workloads using cloud storage
DATA+AI SUMMIT 2022

Taught by

Databricks
