Overview
Discover strategies for scaling deep learning model training in this 32-minute conference talk from the Toronto Machine Learning Series. Explore how distributed training can dramatically reduce training time for large datasets and models, potentially turning days-long jobs into hours-long tasks without significantly increasing costs. Learn about recent improvements in off-the-shelf tooling that have simplified the process of splitting model training across multiple instances in a cluster. Gain insights into the key design decisions to consider when scaling up, including the importance of tuning hyperparameters like learning rate and batch size. Benefit from the expertise of Douglas Sherk, a Senior Machine Learning Engineer at Axon Enterprise Inc, as he shares his experience in building end-to-end ML platforms and developing self-driving car interior monitoring solutions.
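As an illustration of the kind of off-the-shelf tooling the talk refers to, the sketch below uses PyTorch's DistributedDataParallel to split training across multiple workers and applies the common linear scaling rule to the learning rate as the global batch size grows. This is a minimal, hedged example, not the speaker's implementation: the model, dataset, batch size, and learning rate are placeholder assumptions.

    # Minimal sketch: one process per GPU with PyTorch DistributedDataParallel,
    # launched via `torchrun --nproc_per_node=<gpus> train.py`.
    # Model, data, and hyperparameter values are illustrative placeholders.
    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each worker.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        world_size = dist.get_world_size()
        torch.cuda.set_device(local_rank)

        # Placeholder model and synthetic data; swap in your own.
        model = torch.nn.Linear(128, 10).cuda(local_rank)
        model = DDP(model, device_ids=[local_rank])
        dataset = TensorDataset(torch.randn(10_000, 128),
                                torch.randint(0, 10, (10_000,)))

        # Each worker trains on a distinct shard of the dataset.
        sampler = DistributedSampler(dataset)
        per_gpu_batch = 256
        loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

        # Linear scaling rule: the effective (global) batch size grows with the
        # number of workers, so scale the base learning rate to match.
        base_lr = 0.1
        optimizer = torch.optim.SGD(model.parameters(), lr=base_lr * world_size)
        loss_fn = torch.nn.CrossEntropyLoss()

        for epoch in range(3):
            sampler.set_epoch(epoch)  # reshuffle shards each epoch
            for x, y in loader:
                x, y = x.cuda(local_rank), y.cuda(local_rank)
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()   # gradients are all-reduced across workers here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Run on each machine in the cluster with torchrun; the same script then covers single-GPU, multi-GPU, and multi-node training, which is the simplification in tooling the description alludes to.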
Syllabus
Scaling Deep Learning Model Training
Taught by
Toronto Machine Learning Series (TMLS)