

Enable Generative AI Everywhere with Ubiquitous Hardware and Open Software

Linux Foundation via YouTube

Overview

Explore optimization techniques for generative AI and large language models (LLMs) in this conference talk. Learn about strategies for reducing inference latency and improving performance, including low-precision inference, Flash Attention, efficient attention in scaled dot-product attention (SDPA), optimized KV-cache access, and kernel fusion. Discover how these optimizations, implemented in PyTorch and Intel Extension for PyTorch, can significantly improve model efficiency on CPU servers with 4th Gen Intel Xeon Scalable processors. Gain insight into scaling model inference up and out across multiple devices with tensor parallelism, enabling deployment of generative AI across a wide range of hardware configurations.
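For context on the attention optimizations the talk covers: scaled dot-product attention computes softmax(QKᵀ/√d)·V, and PyTorch exposes fused implementations (including Flash Attention and efficient-attention backends) via `torch.nn.functional.scaled_dot_product_attention`. Below is a minimal plain-Python sketch of the underlying math only, not the fused kernels or Intel-specific optimizations discussed in the talk:

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Sketch of SDPA: softmax(Q K^T / sqrt(d)) V.

    q, k, v are lists of rows (seq_len x d). Real implementations
    fuse these steps into optimized kernels; this shows the math.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    # Attention scores: scores[i][j] = <q_i, k_j> / sqrt(d)
    scores = [[scale * sum(qi * kj for qi, kj in zip(qrow, krow))
               for krow in k] for qrow in q]
    # Row-wise softmax (max-subtraction for numerical stability)
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Output rows: out[i] = sum_j weights[i][j] * v_j
    out = [[sum(w * vrow[c] for w, vrow in zip(wrow, v))
            for c in range(len(v[0]))] for wrow in weights]
    return weights, out
```

Optimizations such as Flash Attention compute the same result without ever materializing the full `scores` matrix, which is what reduces memory traffic and latency.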

Syllabus

Enable Generative AI Everywhere with Ubiquitous Hardware and Open Software - Guobing Chen, Intel

Taught by

Linux Foundation

