Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Parameters
HOTI - Hot Interconnects Symposium via YouTube
Overview
Watch a technical conference presentation from HOTI (Hot Interconnects Symposium) exploring Rail-only, a network architecture designed specifically for training large language models with trillions of parameters. Researchers Weiyang Wang, Manya Ghobadi, Kayvon Shakeri, Ying Zhang, and Naader Hasani present this cost-effective yet high-performance networking solution as part of the Technical Paper Session on Networks for Large Language Models. The 32-minute talk, chaired by AMD's Shelby Lockhart, explains how the Rail-only approach addresses the networking challenges of training massive AI models while maintaining efficiency and performance.
Syllabus
Day 1, 09:00: Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Parameters
Taught by
HOTI - Hot Interconnects Symposium