Building Serverless AI Apps with Spin and WebAssembly

Overview

In this conference talk, Matt Butcher and Radu Matei from Fermyon show how to build serverless AI applications with Spin, an open-source tool for creating scalable serverless WebAssembly apps. They explain how WebAssembly's platform neutrality lets the same app run across different operating systems, CPU architectures, and GPUs. The speakers then build a simple AI inferencing app using the Llama 2 Chat LLM, test it locally, and deploy it to several environments, including Docker Desktop and Kubernetes clusters with Wasm support. Along the way, they compare the performance characteristics of each environment and examine the nuances of GPU scheduling in clustered environments, including how Spin's fine-grained GPU scheduling can improve GPU utilization across multiple applications. The talk is aimed at developers interested in deploying and optimizing AI apps efficiently.