Cloud Fetch: High-Bandwidth Connectivity for BI Tools - Databricks

Databricks via YouTube

Classroom Contents

  1. Intro
  2. The Business Intelligence use case How BI tools connect to Databricks?
  3. Data growth
  4. Challenges and opportunities Breaking down the extract problem Problem
  5. Fetching query results Result pagination
  6. Importing tables Use internal compute engine
  7. Serving results before Arrow Multiple layers of conversion
  8. Serving results with Arrow Bring results faster to the client
  9. Collecting results in Arrow format Tasks generate Arrow batches
  10. Arrow batch sizing Fetching Arrow batches
  11. Improvements with Arrow Speedups of less than 3x
  12. Extract bottlenecks
  13. New data extract architecture Cloud Fetch system design
  14. Inlining small results Hybrid results
  15. Data layout File sizing and pagination
  16. Fetching results from URLs Parallel file downloads
  17. Cloud Fetch performance Extract faster than BI tools can ingest
  18. Cloud Fetch in the wild Outperforms direct fetch by an order of magnitude
  19. Conclusions Scaled up extract workloads using cloud storage
  20. DATA+AI SUMMIT 2022