Overview
Explore how Reddit transformed its internal ML platform using Ray and KubeRay in this 35-minute conference talk from Ray Summit 2024. The talk covers the custom tooling and workflows Reddit built around these technologies, which improved developer velocity and cut model training time by 6x. It also describes how Reddit scales ML training and serving workloads, including extending its inference platform with Ray Serve to develop, deploy, and serve fine-tuned open-source LLMs. Along the way, it shares lessons for optimizing ML infrastructure in a high-scale, content-rich environment and shows the practical benefits of running Ray in large-scale machine learning operations.
Syllabus
Reddit's ML Evolution: Scaling with Ray and KubeRay | Ray Summit 2024
Taught by
Anyscale