Overview
Learn about neural architecture search for efficient sparsely activated transformers in this 47-minute research seminar presentation. Explore how AutoMoE, a novel framework, introduces sparse architectures with conditional computation into the NAS search space, moving beyond traditional dense architectures in which all network weights are activated for every input. Discover how AutoMoE-generated sparse models achieve a 4x reduction in FLOPs and comparable CPU speedups over manually designed Transformers while maintaining BLEU score parity on neural machine translation benchmark datasets. Delve into the heterogeneous search space combining dense and sparsely activated Transformer modules, examining key design choices such as the number, placement, and size of experts for adaptive computation. Access the complete implementation, including code, data, and trained models, through the provided GitHub repository and research paper.
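To make the two core ideas concrete, here is a minimal, illustrative sketch (not the authors' implementation; all names and hyperparameter ranges are assumptions): a per-layer search space over expert count and expert size, and a sparsely activated feed-forward block that routes each token to a single expert so only a fraction of the weights is active per input.

```python
# Hypothetical sketch of AutoMoE-style ideas: a heterogeneous per-layer search
# space and a top-1 routed mixture-of-experts FFN (conditional computation).
# Not the paper's code; names, ranges, and routing details are assumptions.

import random
import torch
import torch.nn as nn

# (1) Assumed heterogeneous search space: each layer may be dense (1 expert)
# or sparse (several experts), each with its own expert hidden size.
SEARCH_SPACE = {
    "num_experts_per_layer": [1, 2, 4, 6],   # 1 == ordinary dense FFN
    "expert_ffn_dim":        [512, 1024, 2048, 3072],
}

def sample_architecture(num_layers: int) -> list:
    """Sample one candidate architecture: per-layer expert count and size."""
    return [
        {
            "num_experts": random.choice(SEARCH_SPACE["num_experts_per_layer"]),
            "ffn_dim":     random.choice(SEARCH_SPACE["expert_ffn_dim"]),
        }
        for _ in range(num_layers)
    ]

# (2) Sparsely activated FFN with top-1 routing: each token is processed by
# exactly one expert, so per-token compute does not grow with expert count.
class Top1MoEFFN(nn.Module):
    def __init__(self, d_model: int, ffn_dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, ffn_dim), nn.ReLU(),
                          nn.Linear(ffn_dim, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
        gate = torch.softmax(self.router(x), dim=-1)      # routing probabilities
        top_prob, top_idx = gate.max(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():                                # run only routed tokens
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    arch = sample_architecture(num_layers=6)
    layer0 = Top1MoEFFN(d_model=512,
                        ffn_dim=arch[0]["ffn_dim"],
                        num_experts=arch[0]["num_experts"])
    tokens = torch.randn(10, 512)
    print(layer0(tokens).shape)  # torch.Size([10, 512])
```

In this sketch, a candidate architecture is just a list of per-layer choices; a NAS procedure would score such candidates (e.g., by validation BLEU and estimated FLOPs or latency) and search for efficient ones, which is the role AutoMoE plays over its dense-plus-sparse search space.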
Syllabus
Subho Mukherjee: "AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers"
Taught by
AutoML Seminars