Overview
Explore Meta's testing framework for compute engines such as Presto in this 16-minute conference talk from PEPR '24. Discover how the framework uses privacy-safe, production-like synthetic data to detect regressions within the Meta Data Warehouse. Learn about the challenges of operating the framework at scale and the solutions implemented, including key features of the synthetic data generation process: differential privacy, expanded column schema support, and improved scalability. Gain insights into how Meta uses the framework to increase test coverage, shorten the Presto release cycle, and prevent production regressions. Presented by Jiangnan Cheng and Eric Liu from Meta, the talk is aimed at professionals interested in testing methodologies for large-scale data systems.
Syllabus
PEPR '24 - Compute Engine Testing with Synthetic Data Generation
Taught by
USENIX