Overview
Explore a 57-minute technical presentation in which Meta engineers Vineet Parekh and Venkat Ramesh detail methods for debugging flash storage issues in hyperscale environments. See how tracewatch tools are combined with Latency Monitoring log pages to collect targeted traces using BPF triggers. Learn how the retrace tool analyzes captures in multiple formats and tracks I/O operations from the application layer down to the drive. Discover Meta's dialog collection mechanism for file system logging, including its sanitization process, and the industry collaboration under way to implement efficient logging in flash drives. The talk also covers techniques for debugging complex application-level issues in production, flash reliability at scale, and the flash issues most commonly encountered in hyperscale operations, with practical guidance on improving the debuggability of datacenter failures while meeting privacy requirements for sensitive components such as SSDs.
Syllabus
SDC2022 – Debugging of Flash Issues Observed in Hyperscale Environment at Scale
Taught by
SNIAVideo