Efficient Inference of Extremely Large Transformer Models

Toronto Machine Learning Series (TMLS) via YouTube

Overview

Explore the challenges and solutions for efficient inference of massive transformer-based language models in this 28-minute Toronto Machine Learning Series (TMLS) talk. Dive into the world of multi-billion-parameter models and learn how they are optimized for production environments. Discover key techniques for making these models faster, smaller, and more cost-effective, including model compression, efficient attention mechanisms, and optimal model parallelism strategies. Gain insights from Bharat Venkitesh, Senior Machine Learning Engineer at Cohere, as he discusses building an inference tech stack and the latest advancements in serving extremely large transformer models.

Syllabus

Efficient Inference of Extremely Large Transformer Models

Taught by

Toronto Machine Learning Series (TMLS)
