Quant-LLM: Accelerating Large Language Model Serving via FP6-Centric Algorithm-System Co-Design

USENIX ATC '24, via YouTube

Classroom Contents

  1. USENIX ATC '24 - Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design
