Visual Language Models for Edge AI 2.0 - Multi-image Reasoning and In-context Learning
EDGE AI FOUNDATION via YouTube
Overview
Watch a 37-minute conference talk on recent innovations in edge AI, in which Song Han presents developments in visual language models and model efficiency. Dive into VILA (CVPR'24), a visual language model capable of multi-image reasoning and in-context learning that is efficient enough to deploy on a Jetson Orin Nano. Learn about AWQ (MLSys'24), a 4-bit weight-only LLM quantization algorithm that improves model efficiency, and discover TinyChat, an inference library for running quantized language and visual language models on edge devices. Understand how these three technologies combine to enable advanced visual reasoning on edge devices, opening new possibilities for edge AI applications.
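To make the "4-bit quantization" idea concrete, below is a minimal, illustrative Python/PyTorch sketch of group-wise 4-bit weight quantization with a simple activation-aware scaling step in the spirit of AWQ. It is not the official AWQ implementation; the function names, the group size, and the scaling formula are assumptions made for illustration only.

# Illustrative sketch only; not the AWQ library. Assumed names and parameters.
import torch

def quantize_weights_int4(w: torch.Tensor, group_size: int = 128):
    """Asymmetric 4-bit quantization of a weight matrix, one scale/zero-point per group."""
    out_features, in_features = w.shape
    w_groups = w.reshape(out_features, in_features // group_size, group_size)
    w_min = w_groups.amin(dim=-1, keepdim=True)
    w_max = w_groups.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0        # 4 bits -> integer levels 0..15
    zero = (-w_min / scale).round()
    q = (w_groups / scale + zero).round().clamp(0, 15)    # quantized integer weights
    w_deq = (q - zero) * scale                            # dequantized approximation of w
    return q.to(torch.uint8), scale, zero, w_deq.reshape(out_features, in_features)

def activation_aware_scale(w: torch.Tensor, act_abs_mean: torch.Tensor, alpha: float = 0.5):
    """Scale salient input channels (by activation magnitude) before quantizing; fold 1/s into activations."""
    s = act_abs_mean.clamp(min=1e-8) ** alpha             # per-input-channel scale (assumed form)
    return w * s, 1.0 / s

# Example usage with random data (for illustration only)
w = torch.randn(4096, 4096)
act_stat = torch.rand(4096)                               # stand-in for calibration activation statistics
w_scaled, inv_s = activation_aware_scale(w, act_stat)
q, scale, zero, w_hat = quantize_weights_int4(w_scaled)
print("mean abs quantization error:", (w_scaled - w_hat).abs().mean().item())

The design point this illustrates: storing only 4-bit integers plus per-group scales shrinks weight memory roughly 4x versus FP16, which is what makes running such models on a device like the Jetson Orin Nano practical.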
Syllabus
GenAI on the Edge Forum - Song Han: Visual Language Models for Edge AI 2.0
Taught by
EDGE AI FOUNDATION