Deep Learning Pipelines for High Energy Physics Using Apache Spark and Distributed Keras

Databricks via YouTube

Classroom Contents

  1. Intro
  2. Experimental High Energy Physics is Data Intensive
  3. Key Data Processing Challenge
  4. Data Flow at LHC Experiments
  5. R&D - Data Pipelines
  6. Particle Classifiers Using Neural Networks
  7. Deep Learning Pipeline for Physics Data
  8. Analytics Platform at CERN
  9. Hadoop and Spark Clusters at CERN
  10. Step 1: Data Ingestion. Read input files: 4.5 TB from custom (ROOT) format
  11. Feature Engineering
  12. Step 2: Feature Preparation. Features are converted to formats suitable for training
  13. Performance and Lessons Learned. Data preparation is CPU bound
  14. Neural Network Models and
  15. Hyper-Parameter Tuning - DNN. Hyper-parameter tuning of the DNN model
  16. Deep Learning at Scale with Spark
  17. Spark, Analytics Zoo and BigDL
  18. BigDL Run as Standard Spark Programs
  19. BigDL Parameter Synchronization
  20. Model Development - DNN for HLF. Model is instantiated using the Keras-compatible API provided by Analytics Zoo
  21. Model Development - GRU + HLF. A more complex network topology, combining a GRU of Low Level Feature + a DNN of High Level Features
  22. Distributed Training
  23. Performance and Scalability of Analytics Zoo/BigDL
  24. Results - Model Performance
  25. Workload Characterization
  26. Training with TensorFlow 2.0. Training and test data
  27. Recap: our Deep Learning Pipeline with Spark
  28. Model Serving and Future Work
  29. Summary. The use case developed addresses the needs for higher efficiency in event filtering at LHC experiments. Spark, Python notebooks
  30. Labeled Data for Training and Test. Simulated events: software simulators are used to generate events
