Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed MoE
Overview
Learn about improving weight-sharing supernet training in this 35-minute AutoML seminar introducing the Mixture-of-Supernets formulation. Explore how Mixture-of-Experts (MoE) concepts generate architecture-specific, flexible weights for subnetworks, leading to more efficient Neural Architecture Search (NAS) and higher-quality architectures. Discover practical applications in building efficient BERT and machine translation models that meet user-defined constraints. Speaker Ganesh Jawahar presents this ACL 2024 research, which demonstrates significant reductions in retraining time and gains in overall NAS effectiveness.
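To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of an architecture-routed MoE layer: the effective layer weights are a convex combination of expert weight matrices, with mixing coefficients produced by a router conditioned on an encoding of the sampled subnetwork's architecture. The class name, router design, architecture encoding, and dimension-slicing scheme below are illustrative assumptions, not details from the seminar.

```python
import torch
import torch.nn as nn


class ArchRoutedMoELinear(nn.Module):
    """Hypothetical linear layer whose weights are routed by architecture, not by input tokens."""

    def __init__(self, in_features, out_features, num_experts, arch_dim):
        super().__init__()
        # One weight matrix per expert; all experts are shared across subnetworks.
        self.experts = nn.Parameter(
            torch.randn(num_experts, out_features, in_features) * 0.02
        )
        # Router maps an architecture encoding (e.g., chosen widths/depths) to expert weights.
        self.router = nn.Sequential(nn.Linear(arch_dim, num_experts), nn.Softmax(dim=-1))

    def forward(self, x, arch_encoding, sub_in, sub_out):
        # Mixing coefficients depend only on the architecture encoding.
        alpha = self.router(arch_encoding)                    # (num_experts,)
        # Architecture-specific weights as a weighted sum of expert matrices.
        w = torch.einsum("e,eoi->oi", alpha, self.experts)    # (out_features, in_features)
        # Slice to the sampled subnetwork's dimensions (standard weight-sharing trick).
        return x[..., :sub_in] @ w[:sub_out, :sub_in].T


# Usage: a subnetwork using 192 of 256 input dims and 384 of 512 output dims.
layer = ArchRoutedMoELinear(in_features=256, out_features=512, num_experts=4, arch_dim=8)
arch = torch.tensor([192.0, 384.0, 0, 0, 0, 0, 0, 0]) / 512.0  # toy architecture encoding
out = layer(torch.randn(2, 10, 256), arch, sub_in=192, sub_out=384)
print(out.shape)  # torch.Size([2, 10, 384])
```

Because the routing depends on the architecture rather than the input, each subnetwork effectively gets its own weights while still sharing expert parameters with every other subnetwork, which is the flexibility the seminar attributes to the Mixture-of-Supernets formulation.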
Syllabus
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed MoE
Taught by
AutoML Seminars