Extremely Low-Bit Quantization for Transformers - tinyML Asia 2021

tinyML via YouTube

Classroom Contents

  1. Introduction
  2. Computing system design
  3. Transformer architecture
  4. Uniform quantization
  5. Uniform quantization scheme
  6. Uniform continuation limits
  7. Is it still useful?
  8. BCQ
  9. Example
  10. Critical problems
  11. Lookup table
  12. Transformer structure
  13. Quantizing embedding layers
  14. Mixed precision quantization
  15. Encoder and Decoder
  16. Retraining
  17. Quantization Results
  18. Latency Improvements
  19. Quantization
  20. Q&A
  21. Strategic Partners
