Pruning and Quantizing ML Models With One Shot Without Retraining

Overview

This 53-minute video presentation by Neural Magic covers pruning and quantization techniques for machine learning models. It shows how to compress a model substantially without retraining, using a one-shot approach that removes 60% of the weights and quantizes the entire model to INT8 while maintaining 99% accuracy. Practical examples in computer vision and natural language processing demonstrate how these methods can yield a 4x speedup. The techniques take only minutes of work to apply, which makes them practical to adopt in existing projects and in research on model optimization.
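To make the two ideas concrete, here is a minimal sketch of one-shot pruning followed by INT8 quantization on a flat list of weights. It is not the method shown in the presentation: the selection rule (drop the smallest-magnitude weights) and the quantization scheme (symmetric, one scale for the whole tensor) are assumptions chosen for simplicity, and the function names `prune_one_shot`, `quantize_int8`, and `dequantize` are hypothetical.

```python
import random


def prune_one_shot(weights, sparsity=0.6):
    """Zero the smallest-magnitude fraction of weights in a single pass.

    Assumption: plain magnitude pruning, with no retraining afterwards.
    """
    k = round(sparsity * len(weights))
    # Indices of the k weights with the smallest absolute value.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric per-tensor quantization, so zero maps to zero
    and pruned weights stay pruned.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the INT8 codes."""
    return [v * scale for v in q]


# Illustrative data only: random weights standing in for a trained layer.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned = prune_one_shot(weights, sparsity=0.6)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

sparsity = sum(1 for w in pruned if w == 0.0) / len(pruned)
max_error = max(abs(a - b) for a, b in zip(restored, pruned))
print(f"sparsity: {sparsity:.2f}, max quantization error: {max_error:.4f}")
```

Both steps run once over the weights with no gradient updates, which is what "one shot" refers to here. The sketch only measures weight reconstruction error; it says nothing about model accuracy, which depends on the model and on the more careful selection and calibration methods the presentation discusses.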