Classroom Contents
Deep Learning Pipelines for High Energy Physics Using Apache Spark and Distributed Keras
- 1 Intro
- 2 Experimental High Energy Physics is Data Intensive
- 3 Key Data Processing Challenge
- 4 Data Flow at LHC Experiments
- 5 R&D - Data Pipelines
- 6 Particle Classifiers Using Neural Networks
- 7 Deep Learning Pipeline for Physics Data
- 8 Analytics Platform at CERN
- 9 Hadoop and Spark Clusters at CERN
- 10 Step 1: Data Ingestion (read input files: 4.5 TB from the custom ROOT format)
- 11 Feature Engineering
- 12 Step 2: Feature Preparation (features are converted to formats suitable for training)
- 13 Performance and Lessons Learned (data preparation is CPU-bound)
- 14 Neural Network Models and
- 15 Hyper-Parameter Tuning of the DNN Model
- 16 Deep Learning at Scale with Spark
- 17 Spark, Analytics Zoo and BigDL
- 18 BigDL Run as Standard Spark Programs
- 19 BigDL Parameter Synchronization
- 20 Model Development - DNN for HLF (the model is instantiated using the Keras-compatible API provided by Analytics Zoo)
- 21 Model Development - GRU + HLF (a more complex network topology, combining a GRU over the low-level features with a DNN over the high-level features)
- 22 Distributed Training
- 23 Performance and Scalability of Analytics Zoo/BigDL
- 24 Results - Model Performance
- 25 Workload Characterization
- 26 Training with TensorFlow 2.0 (training and test data)
- 27 Recap: our Deep Learning Pipeline with Spark
- 28 Model Serving and Future Work
- 29 Summary (the use case developed addresses the need for higher efficiency in event filtering at LHC experiments; Spark, Python notebooks)
- 30 Labeled Data for Training and Test (simulated events: software simulators are used to generate events)