Overview
Explore the innovative approach to scaling terascale deep learning on commodity CPUs using ThirdAI and Ray in this 26-minute conference talk. Discover how ThirdAI, an early-stage startup, aims to democratize AI through algorithmic and software innovations. Learn about its proprietary BOLT engine, a deep learning framework built with sparsity as a core principle, enabling efficient model training on CPU hardware. Examine how ThirdAI's sparse deep learning models can outperform dense architectures running on GPUs on certain tasks. Delve into the new distributed data parallel engine powered by Ray Core, which scales ThirdAI models to terabyte-scale datasets and billion-parameter models. Understand the key features of this industry-grade distributed training solution, including fault tolerance, multiple communication modes, and seamless scalability. Investigate the unique scientific challenges of distributed deep learning training on CPUs, particularly the communication bottleneck, which is addressed through novel gradient compression techniques. Review the results of distributed BOLT's evaluation on the terabyte-sized Criteo dataset, showcasing near-linear scaling up to 200 nodes and training 42x faster than TensorFlow-CPU while using only one-sixth of the computing resources.
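The distributed design described above pairs Ray Core workers with gradient compression to ease the CPU communication bottleneck. The sketch below is a minimal, hypothetical illustration of that pattern: data-parallel training actors on Ray Core exchanging top-k-sparsified gradients. It is not ThirdAI's BOLT engine or its actual compression scheme; the names Worker, compress, and TOP_K_FRACTION are invented for the example.

import numpy as np
import ray

ray.init()

TOP_K_FRACTION = 0.01  # keep only the largest 1% of gradient entries


def compress(grad, frac=TOP_K_FRACTION):
    # Top-k sparsification: communicate only the largest-magnitude entries.
    k = max(1, int(grad.size * frac))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]


@ray.remote
class Worker:
    # Stand-in for a data-parallel training worker holding one data shard.
    def __init__(self, dim, seed):
        self.rng = np.random.default_rng(seed)
        self.weights = np.zeros(dim)

    def compute_gradient(self):
        # Placeholder for a local forward/backward pass over this shard.
        grad = self.rng.standard_normal(self.weights.size)
        return compress(grad)

    def apply_update(self, idx, values, lr=0.01):
        # Apply the averaged sparse gradient broadcast by the driver.
        self.weights[idx] -= lr * values


# Driver loop: gather compressed gradients, average them, broadcast the update.
dim, num_workers = 1_000_000, 4
workers = [Worker.remote(dim, seed=i) for i in range(num_workers)]

for step in range(10):
    parts = ray.get([w.compute_gradient.remote() for w in workers])
    avg = np.zeros(dim)
    for idx, vals in parts:
        avg[idx] += vals / num_workers
    idx = np.nonzero(avg)[0]
    ray.get([w.apply_update.remote(idx, avg[idx]) for w in workers])

Because only the sparse gradient entries cross the network, the per-step communication volume shrinks roughly in proportion to TOP_K_FRACTION, which is the kind of reduction that makes CPU-cluster data parallelism viable.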
Syllabus
Scaling up Terascale Deep Learning on Commodity CPUs with ThirdAI and Ray
Taught by
Anyscale