Supporting Machine Learning Workloads in Presto - Optimizing Data Preparation and Processing

Overview

Learn how machine learning users can better leverage Presto to prepare large-scale training datasets in this 24-minute technical talk from the Presto Foundation. Explore the key challenges of using Presto for ML workloads, drawn from real implementation experience at Meta. Dive into three critical dimensions: efficient storage and optimized in-memory data layout, the impact of compressed execution on operator design, and the concept of extreme late materialization. Discover recent advances by Meta's team in supporting ML workloads, examine initial performance results, and understand the ecosystem of open source projects that supports this technology stack. Gain insight into the areas that need further research, development, and community collaboration to enhance Presto's capabilities for machine learning applications.