YouTube videos curated by Class Central.
Classroom Contents
Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks
- 1 Intro
- 2 The Business Intelligence use case How BI tools connect to Databricks?
- 3 Data growth
- 4 Challenges and opportunities Breaking down the extract problem
- 5 Fetching query results Result pagination
- 6 Importing tables Use internal compute engine
- 7 Serving results before Arrow Multiple layers of conversion
- 8 Serving results with Arrow Bring results faster to the client
- 9 Collecting results in Arrow format Tasks generate Arrow batches
- 10 Arrow batch sizing Fetching Arrow batches
- 11 Improvements with Arrow Speedups of less than 3x
- 12 Extract bottlenecks
- 13 New data extract architecture Cloud Fetch system design
- 14 Inlining small results Hybrid results
- 15 Data layout File sizing and pagination
- 16 Fetching results from URLs Parallel file downloads
- 17 Cloud Fetch performance Extract faster than BI tools can ingest
- 18 Cloud Fetch in the wild Outperforms direct fetch by an order of magnitude
- 19 Conclusions Scaled up extract workloads using cloud storage
- 20 DATA+AI SUMMIT 2022