Running Llama Models with PyTorch and KleidiAI on Arm Servers - A Step-by-Step Tutorial
Arm Software Developers via YouTube
Overview
Learn to deploy and run the Llama 3.1 and 3.2 language models on Arm-based AWS instances in this hands-on tutorial. Master the process of requesting access to the Meta Llama models, launching an Arm-based AWS EC2 instance, and installing the required software. Explore quantization of the Llama 3.1 model, verify that it runs correctly, and build a web interface with a Torchchat backend and a Streamlit frontend. Discover how to adapt the setup for Llama 3.2, with inference accelerated by KleidiAI optimizations. Follow step-by-step instructions for creating an interactive chatbot, complete with practical demonstrations of PyTorch integration and web-based user interface deployment.
Syllabus
Intro
Request access to the Meta Llama models
Create the Arm-based AWS EC2 instance
Update and install the required software
Download and quantize the Llama 3.1 model
Test that the model is working
Run the Torchchat backend and Streamlit frontend
Modify the Learning Path to run Llama 3.2
Outro
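The "Download and quantize" step in the syllabus relies on Torchchat, whose `--quantize` flag accepts a JSON recipe describing which quantization schemes to apply. A minimal sketch of such a recipe is shown below; the specific scheme name (`linear:a8w4dq`), group size, and executor entry are assumptions for illustration, not taken from the video, so check them against the quantization documentation in your Torchchat checkout.

```json
{
  "executor": {"accelerator": "cpu"},
  "linear:a8w4dq": {"groupsize": 256}
}
```

Saved as, say, `llama_quant.json`, a recipe like this would be passed when generating, roughly as `python3 torchchat.py generate llama3.1 --quantize llama_quant.json --prompt "Hello"`; on Arm servers the quantized int4/int8 kernels are where the KleidiAI acceleration applies.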
Taught by
Arm Software Developers