Accelerating Data Processing in Spark SQL with Pandas UDFs - Optimization Techniques

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Optimization Tricks
  3. What are Pandas UDFs?
  4. Development tips and tricks
  5. Modeling at Quantcast
  6. Example Problem
  7. Naive approach: Use Spark SQL
  8. Optimization: Use Pandas UDFs for Looping
  9. Optimization: Aggregate Keys in Batches
  10. Optimization: Inverted Indexes
  11. Optimization: Use Python libraries
  12. Optimization: Summary
