Overview
Explore the challenges and solutions for running AI/LLM applications in heterogeneous cloud environments through this 33-minute conference talk by Michael Yuan of Second State. Learn about the limitations of traditional Linux containers for AI workloads and discover how cloud-native WebAssembly (Wasm) offers a portable bytecode format that abstracts away hardware differences. Gain insights into the W3C WASI-NN standard and its role in enabling cross-platform LLM applications. Understand how to develop LLM applications in Rust on a MacBook and deploy them seamlessly to Nvidia cloud servers or ARM NPU devices without recompilation. Explore the advantages of managing Wasm apps with container tools like Docker, Podman, and Kubernetes. Dive into WasmEdge's implementation of WASI-NN and its support for a variety of AI/LLM applications. Acquire practical skills for building and running a single-binary LLM application across local, edge, and cloud devices.
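A rough sketch of the build-once, run-anywhere workflow the talk describes, using WasmEdge and its WASI-NN plugin. The crate name and model file are illustrative placeholders, not taken from the talk:

```shell
# Hypothetical workflow: compile a Rust LLM app to Wasm once,
# then run the same .wasm binary on any host with WasmEdge installed.

# Add the WASI compilation target to the Rust toolchain.
rustup target add wasm32-wasip1

# Build the app; "llm-app" is an illustrative crate name.
cargo build --target wasm32-wasip1 --release

# Run the binary with the WasmEdge runtime and its WASI-NN (GGML) plugin;
# the GGUF model file name below is a placeholder.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:llama-3-8b.Q5_K_M.gguf \
  target/wasm32-wasip1/release/llm-app.wasm
```

The same `.wasm` binary could run unchanged on a MacBook, an Nvidia cloud server, or an ARM device, with the WASI-NN plugin dispatching to whatever accelerator the host provides; it can also be packaged in an OCI image and managed by Docker, Podman, or Kubernetes alongside regular containers.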
Syllabus
Efficient and Cross-Platform LLM Inference in the Heterogeneous Cloud - Michael Yuan, Second State
Taught by
Linux Foundation