Classroom Contents
Deploying Optimized Deep Learning Pipelines
- 1 Intro
- 2 Performance will not usually mean evaluation metrics. Optimization also does not mean optimization algorithms such as Adam, Adagrad, NAdam... Bias and generalization will also not be covered. Performan…
- 3 Software engineers dealing with machine learning models; data scientists needing to know how to train more performant models; developers generally curious about the harder problems of deploying larger …
- 4 Labeling and data quality; Deploying models: setting up a REST API; Packaging: how to deploy your ML pipeline; Experiment tracking: metrics, sharing results
- 5 Computer vision on: mobile devices, single-board computers (Raspberry Pis, Jetson Nano...), big servers with GPUs. NLP on: big servers with GPUs, large CPU models
- 6 Data needs to be transformed before it can be used. Fast transforms are usually an afterthought.
- 7 ETL/Data Pipelines Primer • Raw data needs to be converted to arrays (think pandas DataFrame to NumPy array) • Data can come from anywhere: databases, the web (REST), streams (Kafka, Spark, Flink...) …
- 8 Models Primer • Models are stored in various formats: HDF5 (Keras), protobuf (TensorFlow, ONNX), pickle (PyTorch) • Model files are a mix of configuration and parameters (ndarrays that represent the w…
- 9 ML pipelines are not just models • ETL varies and can be represented in JSON, code, or even within the model via something like tf.data • Metrics and experiments (evaluation results) may also be stor…
- 10 Better in-memory file formats for data interchange
- 11 Removing redundancy matters: identity ops, redundant layers... • Model size matters: fewer parameters and less compute mean faster execution and less storage • Format matters: some execution engines (TF Lite vs TensorFlow, to…
- 12 Quantization: change model data type to int from float (reduces memory and computation) • Knowledge distillation: train a smaller model based on the outputs of a bigger model (student/teacher) • Pruning:…
- 13 Deep learning compilers: TVM, Glow, MLIR • Compile models to executable binaries • Handle finding the optimal graph for a given hardware configuration • Note: Not ready for production use. Very early days ye…
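
The quantization idea from chapter 12 (changing a model's data type from float to int to reduce memory) can be illustrated with a minimal sketch. This is not the method used in the talk or by any particular framework: the function names `quantize_int8` and `dequantize`, the symmetric per-tensor scheme, and the sample weights are all assumptions chosen for illustration, and only the Python standard library is used.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed, simplified scheme).

    Maps floats to signed 8-bit integers plus one float scale factor.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the int8 range used here.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the int8 representation."""
    return [v * scale for v in q]


# Hypothetical weights standing in for one layer's parameters.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage comparison: 32-bit floats versus 8-bit integers.
float_bytes = array("f", weights).itemsize * len(weights)
int_bytes = q.itemsize * len(q)
print(list(q), restored, float_bytes, int_bytes)
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`), which is the accuracy cost paid for the smaller storage. Real toolchains typically add calibration data, per-channel scales, and integer kernels on top of this basic idea.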