Explore a conference talk on BGL, a distributed GNN training system that improves GPU efficiency when training on large-scale graph data. Learn where the bottleneck lies: preparing data for the GPUs, specifically subgraph sampling and feature retrieval. See how BGL addresses it. A dynamic cache engine minimizes feature retrieval traffic, and its co-designed caching policy and sampling order balance low overhead against a high cache hit ratio. An improved graph partition algorithm reduces cross-partition communication during subgraph sampling. Resource isolation reduces contention between data preprocessing stages. Finally, examine how BGL performs against existing GNN training systems in extensive experiments across various GNN models and large graph datasets.
Overview
Syllabus
NSDI '23 - BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing
Taught by
USENIX