Overview
Explore a 31-minute conference talk on how DoorDash improved machine learning model training with Ray. Discover how the company addressed challenges in scalability, cost, development velocity, and observability in its forecasting and training pipelines. Learn about the proof-of-concept implementation, the benchmarking setup, and the integration of Ray into the existing ML Platform architecture. Gain insights into working with open-source KubeRay and the future vision for DoorDash's ML Platform with Ray as a core component. The accompanying slide deck covers the full presentation, including Project Lucent, Lucent Workflows, ARGILLE, and GPU stability.
Syllabus
Intro
Why Train ML Models
Ray Summit 2020
Project Lucent
Lucent Workflows
ARGIL
LEMS
GPU Stability
Closing Thoughts
Questions
Taught by
Anyscale