Neural Nets for NLP 2020 - Efficiency Tricks for Neural Nets


Graham Neubig, via YouTube


Classroom Contents


  1. Glamorous Life of an AI Scientist
  2. A Simple Example: How long does a matrix-matrix multiply take?
  3. Practically
  4. What About Memory?
  5. Three Types of Parallelism
  6. Within-operation Parallelism
  7. Operation-wise Parallelism
  8. Example-wise Parallelism
  9. Implementing Data Parallelism: Many modern libraries make data parallelism relatively easy, e.g. PyTorch DistributedDataParallel
  10. Computation Across Large Vocabularies
  11. Noise Contrastive Estimation (Mnih & Teh 2012)
  12. Mini-batch Based Negative Sampling
  13. Class-based Softmax (Goodman 2001): Assign each word to a class; predict the class first, then the word given the class
  14. Binary Code Prediction (Dietterich and Bakiri 1995; Oda et al. 2017)
  15. Two Improvements to Binary Code Prediction
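Chapter 2 asks how long a matrix-matrix multiply takes. A minimal sketch of the usual back-of-the-envelope estimate (the matrix sizes here are illustrative, not from the lecture): an n×k times k×m multiply costs roughly 2·n·k·m floating-point operations, so timing one multiply gives an achieved-FLOP/s figure to compare against hardware peak.

```python
import time
import numpy as np

# A (n x k) @ B (k x m) costs roughly 2*n*k*m FLOPs
# (one multiply and one add per inner-product term).
n, k, m = 512, 512, 512
A = np.random.rand(n, k).astype(np.float32)
B = np.random.rand(k, m).astype(np.float32)

start = time.perf_counter()
C = A @ B
elapsed = time.perf_counter() - start

flops = 2 * n * k * m
print(f"{flops} FLOPs in {elapsed:.4f}s -> {flops / elapsed / 1e9:.1f} GFLOP/s")
```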
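Chapter 12 covers mini-batch based negative sampling. A minimal sketch of the idea, with hypothetical sizes and variable names: instead of drawing fresh negative samples per example, one set of K negatives is shared across the whole minibatch, so the negative scores become a single B×K matrix multiply rather than B separate ones.

```python
import numpy as np

rng = np.random.default_rng(1)
V, K, B, d = 10_000, 16, 32, 64   # vocab size, negatives, batch size, hidden dim

# One shared set of K negative samples for the entire minibatch.
negatives = rng.choice(V, size=K, replace=False)
targets = rng.choice(V, size=B)    # the true next word for each example

H = rng.normal(size=(B, d))        # batch of hidden states
E = rng.normal(size=(V, d))        # output word embeddings

# Positive score: one dot product per example.
pos_scores = np.einsum("bd,bd->b", H, E[targets])
# Negative scores: a single dense (B x d) @ (d x K) multiply,
# which GPUs/BLAS execute far faster than B separate K-vector products.
neg_scores = H @ E[negatives].T
```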
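Chapter 13 describes Goodman's class-based softmax: assign each word to a class, predict the class first, then the word given the class. A minimal sketch with a toy vocabulary (the sizes and weight names are illustrative): the two-step factorization p(w|h) = p(c(w)|h) · p(w|c(w), h) replaces one softmax over |V| outputs with a softmax over |C| classes plus one over the words in a single class, and it still defines a proper distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
V, C, d = 8, 2, 4                                 # vocab, classes, hidden dim
word2class = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # each word assigned to a class

h = rng.normal(size=d)            # hidden state from the network
W_c = rng.normal(size=(C, d))     # class-prediction weights
W_w = rng.normal(size=(V, d))     # word-given-class weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Step 1: predict the class (|C| outputs instead of |V|).
p_class = softmax(W_c @ h)

# Step 2: softmax only over the words inside w's class.
def p_word(w):
    c = word2class[w]
    members = np.flatnonzero(word2class == c)
    p_in_class = softmax(W_w[members] @ h)
    return p_class[c] * p_in_class[members == w][0]

# The factorization is a proper distribution over the vocabulary.
total = sum(p_word(w) for w in range(V))
```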
