Presto Optimization with Distributed Caching on Data Lake

Overview

Learn how to optimize Presto performance with distributed caching in this technical talk, which addresses common challenges of querying data in cloud object storage such as Amazon S3. Explore solutions to slow query performance and high API costs through detailed explanations of distributed caching design patterns and real-world implementations. Discover advanced techniques including segmented data file caching, soft-affinity scheduling policies, cache filtering, time-to-live (TTL) settings, and customized eviction strategies. Examine case studies from major technology companies such as Meta, Uber, ByteDance, and Newsbreak to understand how they implemented caching to speed up interactive queries, maximize hit rates, and reduce cloud storage costs. Learn practical strategies for setting up caching systems and measuring performance improvements with the TPC-DS benchmark.

Syllabus

Presto Optimization with Distributed Caching on Data Lake - Hope Wang & Beinan Wang, Alluxio

Taught by

Presto Foundation

Related Courses

Distributed Query Optimization and Security

Speeding Up Presto with Router and Local Cache Implementation

Presto 2.0 Native Engine: Performance and Deployment at Meta and IBM

Caching Framework for Exabyte-Scale Data Lakes

Maximizing Query Speed and Minimizing Costs in Data Lakes with Open-Source Caching

An Intro to Presto, the Open Source Distributed SQL Query Engine for the Data Lake
