Performance Troubleshooting Using Apache Spark Metrics - Databricks Talk

Databricks via YouTube


Classroom Contents

  1. Intro
  2. Data at the Large Hadron Collider
  3. Analytics Platform @CERN
  4. Hadoop and Spark Clusters at CERN
  5. Performance Troubleshooting Goals
  6. Performance Methodologies and Anti-Patterns (typical benchmark graph)
  7. Workload and Performance Data
  8. Measuring Spark
  9. Spark Instrumentation - Metrics
  10. How to Gather Spark Task Metrics
  11. Spark Metrics in REST API (see the REST API sketch after this list)
  12. Task Metrics in the Event Log
  13. SparkMeasure - Getting Started (see the sparkMeasure sketch after this list)
  14. SparkMeasure - Usage Modes
  15. Instrument Code with SparkMeasure
  16. Spark Metrics System (Spark is also instrumented with the Dropwizard/Codahale metrics library; multiple sources, i.e. data providers)
  17. Ingredients for a Spark Performance Dashboard (see the metrics-sink sketch after this list)
  18. Assemble Dashboard Components
  19. Spark Dashboard - Examples (graph: "number of active tasks" vs. time)
  20. Dashboard - Memory
  21. Dashboard - Executor CPU Utilization (graph: "CPU utilization by executors' JVM" vs. time)
  22. Executor Plugins Extend Metrics (user-defined executor metrics, SPARK-28091, targeting Spark 3.0.0)
  23. Metrics from OS Monitoring
  24. Data + Context = Insights
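
The REST API chapter (item 11) lends itself to a short sketch. Spark's monitoring REST API exposes stage and task metrics under the driver UI endpoint; the host, port, and metric fields picked out below are assumptions for a local driver, not details taken from the talk.

```python
# Minimal sketch, not from the talk: pull stage-level task metrics from the
# Spark monitoring REST API. Assumes a driver UI reachable at localhost:4040.
import requests

BASE = "http://localhost:4040/api/v1"

# List applications served by this UI and take the first application id.
apps = requests.get(f"{BASE}/applications").json()
app_id = apps[0]["id"]

# Fetch per-stage aggregates (run time, CPU time, shuffle metrics, ...).
stages = requests.get(f"{BASE}/applications/{app_id}/stages").json()
for stage in stages:
    print(stage["stageId"], stage["status"],
          stage["executorRunTime"], stage["executorCpuTime"])
```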
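The sparkMeasure chapters (items 13-15) can likewise be sketched. The snippet below follows the stage-level API of the sparkMeasure Python wrapper; the package coordinates and the example query are placeholders rather than material from the talk.

```python
# Minimal sketch of instrumenting a query with sparkMeasure (stage metrics).
# Assumes PySpark was started with the spark-measure jar on the classpath, e.g.
#   pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:<version>
# plus `pip install sparkmeasure`; `spark` is the active SparkSession.
from sparkmeasure import StageMetrics

stagemetrics = StageMetrics(spark)

stagemetrics.begin()
spark.sql("SELECT count(*) FROM range(1000) CROSS JOIN range(1000)").show()
stagemetrics.end()

# Print aggregated task metrics: elapsed time, executor CPU time, shuffle, GC, ...
stagemetrics.print_report()
```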
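Items 16-18 cover the Dropwizard-based Spark metrics system feeding a performance dashboard. One ingredient is sketched below under stated assumptions: routing driver and executor metrics to a Graphite-compatible endpoint (for example InfluxDB with its Graphite listener, graphed in Grafana). The host, port, and reporting period are placeholders, and in practice these properties are often supplied via metrics.properties or spark-submit --conf rather than inline.

```python
# Minimal sketch: configure Spark's Dropwizard metrics sink via spark.metrics.conf.*
# properties so that driver and executor metrics stream to a Graphite endpoint.
# Endpoint host/port and the reporting period are placeholder assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-metrics-dashboard-sketch")
    .config("spark.metrics.conf.*.sink.graphite.class",
            "org.apache.spark.metrics.sink.GraphiteSink")
    .config("spark.metrics.conf.*.sink.graphite.host", "influxdb.example.com")
    .config("spark.metrics.conf.*.sink.graphite.port", "2003")
    .config("spark.metrics.conf.*.sink.graphite.period", "10")
    .config("spark.metrics.conf.*.sink.graphite.unit", "seconds")
    .getOrCreate()
)
```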
