Overview
Learn how to optimize Presto performance with distributed caching in this technical talk, which addresses two common problems when querying data in cloud storage systems like S3: slow queries and high API costs. Explore distributed caching design patterns and real-world implementations. Discover advanced techniques including segmented data file caching, soft-affinity scheduler policies, cache filtering, TTL (time-to-live), and customized eviction strategies. Examine case studies from Meta, Uber, ByteDance, and Newsbreak to see how they used caching to speed up interactive queries, maximize cache hit rates, and reduce cloud storage costs. Learn practical strategies for setting up a caching system and measuring the resulting performance improvements with TPC-DS benchmark results.
Syllabus
Presto Optimization with Distributed Caching on Data Lake - Hope Wang & Beinan Wang, Alluxio
Taught by
Presto Foundation