GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Yannic Kilcher via YouTube

- Backpropagation in Mixture-of-Experts (5 of 10)

YouTube videos curated by Class Central.