Demystifying AI and ML Infrastructure for Network Engineers

Tech Field Day via YouTube

Overview

Learn how to build and manage GPU clusters for AI workloads in this technical presentation from Cisco at AI Field Day 5. Explore the challenges of setting up GPU clusters and their solutions, including inter-GPU networking optimization with Cisco Nexus 9000 Series switches. Discover how validated solutions can reduce cluster setup time from weeks to hours, and understand critical network designs such as "Rails Optimized" and "Fly" for efficient GPU communication. Master the concepts of collective communication protocols, dynamic load balancing, and static pinning for optimal data flow between GPUs. Gain insights into creating lossless networks with priority-based flow control and into using Nexus Dashboard for monitoring and anomaly detection. Follow along as a machine learning engineer demonstrates building a generative AI application on on-premises GPU infrastructure, showing how to process billions of tokens efficiently while maintaining data security. See real-world applications of AI/ML infrastructure in network engineering through practical examples of real-time insights and anomaly detection.

Syllabus

Demystifying Artificial Intelligence and Machine Learning Infrastructure for a Network Engineer

Taught by

Tech Field Day
