Performance Troubleshooting Using Apache Spark Metrics - Databricks Talk

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Data at the Large Hadron Collider
  3. Analytics Platform @CERN
  4. Hadoop and Spark Clusters at CERN
  5. Performance Troubleshooting Goals
  6. Performance Methodologies and Anti-Patterns (typical benchmark graph)
  7. Workload and Performance Data
  8. Measuring Spark
  9. Spark Instrumentation - Metrics
  10. How to Gather Spark Task Metrics
  11. Spark Metrics in REST API
  12. Task Metrics in the Event Log
  13. SparkMeasure - Getting Started
  14. SparkMeasure, Usage Modes
  15. Instrument Code with SparkMeasure
  16. Spark Metrics System: Spark is also instrumented using the Dropwizard/Codahale metrics library; multiple sources (data providers)
  17. Ingredients for a Spark Performance Dashboard
  18. Assemble Dashboard Components
  19. Spark Dashboard - Examples (graph: "number of active tasks" vs. time)
  20. Dashboard - Memory
  21. Dashboard - Executor CPU Utilization (graph: "CPU utilization by executors' JVM" vs. time)
  22. Executor Plugins Extend Metrics: user-defined executor metrics, SPARK-28091, target Spark 3.0.0
  23. Metrics from OS Monitoring
  24. Data + Context = Insights