Classroom Contents
Deploying Optimized Deep Learning Pipelines
- 1 Intro
- 2 Performance will not usually mean evaluation metrics. Optimization also does not mean optimization algorithms such as Adam, Adagrad, NAdam... Bias and generalization will also not be covered. Performan…
- 3 Software engineers dealing with machine learning models; data scientists needing to know how to train more performant models; developers generally curious about the harder problems of deploying larger …
- 4 Labeling and data quality; deploying models: setting up a REST API; packaging: how to deploy your ML pipeline; experiment tracking: metrics, sharing results
- 5 Computer vision on: mobile devices, single-board computers (Pis, Jetson Nano...), big servers with GPUs. NLP on: big servers with GPUs, large CPU models
- 6 Data needs to be transformed before it can be used. Fast transforms are usually an afterthought
- 7 ETL/Data Pipelines Primer (see the pandas-to-NumPy sketch after this list): raw data needs to be converted to arrays (think pandas DataFrame to NumPy array); data can come from anywhere: databases, the web (REST), streams (Kafka, Spark, Flink...) …
- 8 Models Primer (see the save/load sketch after this list): models are stored in various formats: HDF5 (Keras), protobuf (TensorFlow, ONNX), pickle (PyTorch); model files are a mix of configuration and parameters (ndarrays that represent the w…
- 9 ML Pipelines are not just models (see the tf.data sketch after this list): ETL varies and can be represented in JSON, code, or even within the model via something like tf.data; metrics and experiments (evaluation results) may also be stor…
- 10 Better in-memory file formats for data interchange (see the Arrow sketch after this list)
- 11 Removing redundancy matters: identity ops, redundant layers... (see the graph-simplification sketch after this list). Model size matters: fewer parameters and less compute mean faster inference and less storage. Format matters: some execution engines (TF Lite vs TensorFlow, to…
- 12 Quantization: change the model data type from float to int (reduces memory and computation; see the quantization sketch after this list). Knowledge distillation: train a smaller model on the outputs of a bigger model (student/teacher). Pruning:…
- 13 Deep Learning Compilers: TVM, Glow, MLIR (see the TVM sketch after this list). They compile models to executable binaries and handle finding an optimal graph for a given hardware configuration. Note: not ready for production use. Very early days ye…
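A minimal sketch of the ETL step from item 7: tabular data ends up as NumPy arrays before it reaches a model. The DataFrame contents and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; in practice this could arrive from a database,
# a REST endpoint, or a stream (Kafka, Spark, Flink...).
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [0.5, 0.1, 0.9],
    "label":     [0, 1, 0],
})

# Convert features to a float32 NumPy array (the usual input dtype for
# deep learning frameworks) and labels to int64.
X = df[["feature_a", "feature_b"]].to_numpy(dtype=np.float32)
y = df["label"].to_numpy(dtype=np.int64)

print(X.shape, X.dtype)  # (3, 2) float32
print(y.shape, y.dtype)  # (3,) int64
```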
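For item 8, a hedged sketch of the same idea in two frameworks: Keras writes configuration plus weights into a single HDF5 file, while PyTorch pickles a state_dict of parameter arrays. The model architectures and file names are placeholders.

```python
# Keras: one HDF5 file holding both the configuration and the weights.
import tensorflow as tf

keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
])
keras_model.save("model.h5")  # legacy HDF5 format
# (TensorFlow SavedModel and ONNX store the graph as protobuf instead.)

# PyTorch: a state_dict is a dict of parameter tensors, pickled to disk.
import torch
import torch.nn as nn

torch_model = nn.Linear(8, 4)
torch.save(torch_model.state_dict(), "model.pt")  # pickle under the hood
```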
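For item 9, a small sketch of carrying part of the ETL inside the input pipeline with tf.data, so the transforms live next to the model rather than in a separate config file. The data is random and the normalization step is a toy example.

```python
import numpy as np
import tensorflow as tf

# Stand-in for features arriving from an upstream system.
raw = np.random.rand(100, 8).astype(np.float32)
labels = np.random.randint(0, 2, size=(100,)).astype(np.int64)

# tf.data pipeline: normalization, shuffling, and batching are expressed
# as part of the input graph instead of a separate preprocessing script.
dataset = (
    tf.data.Dataset.from_tensor_slices((raw, labels))
    .map(lambda x, y: ((x - 0.5) / 0.5, y))  # toy normalization step
    .shuffle(buffer_size=100)
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)  # (16, 8) (16,)
```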
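For item 10, one common choice for in-memory data interchange is Apache Arrow; this sketch (assuming pyarrow is installed) shows a pandas DataFrame moving into and out of Arrow's columnar representation.

```python
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"feature_a": [1.0, 2.0, 3.0], "label": [0, 1, 0]})

# Arrow table: a columnar, language-agnostic in-memory layout that other
# runtimes (Spark, C++, Rust...) can consume without a round trip through
# CSV or pickle.
table = pa.Table.from_pandas(df)
print(table.schema)

# Back to pandas when a Python consumer needs it.
df_again = table.to_pandas()
```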
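For item 11, a hedged sketch of stripping redundant nodes from an exported graph with onnx-simplifier; the model path is hypothetical, and the tool is just one option (framework-specific graph optimizers do similar work).

```python
import onnx
from onnxsim import simplify

# Hypothetical exported model; in practice this would come from
# torch.onnx.export or tf2onnx.
model = onnx.load("model.onnx")

# onnx-simplifier folds constants and removes identity / redundant nodes,
# shrinking both the graph and the file on disk.
simplified_model, check = simplify(model)
assert check, "simplified model failed the equivalence check"

onnx.save(simplified_model, "model_simplified.onnx")
```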
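For item 12, the lowest-effort of the three techniques is post-training dynamic quantization; this PyTorch sketch rewrites Linear layers to int8 weights. The toy model stands in for something larger, and distillation and pruning each need their own training loops, so they are not shown here.

```python
import torch
import torch.nn as nn

# Toy float32 model standing in for a real network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Dynamic quantization: weights of Linear layers are stored as int8 and
# dequantized on the fly, cutting memory and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # Linear layers are now DynamicQuantizedLinear
```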
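For item 13, a rough sketch of compiling an ONNX model with TVM's Relay frontend. TVM's API has moved between releases, so treat the exact calls as an approximation of the workflow (import the graph, pick a hardware target, build a deployable module); the model file and input shape are assumptions.

```python
import onnx
import tvm
from tvm import relay

# Hypothetical exported model and input shape.
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Import the graph into Relay, TVM's intermediate representation.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a concrete hardware target (here: generic CPU via LLVM).
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# The compiled library can be loaded by the lightweight TVM runtime on the
# deployment device, without the full training framework installed.
lib.export_library("compiled_model.so")
```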