Overview
Syllabus
Intro
Parameterised functions
Supervised learning
Gradient descent
Choose parameters to fit data
Algorithmically choose parameters
Calculate the derivative of the loss function with respect to the parameters
Calculate gradient for current parameters
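The loop above can be sketched in Scala. This is a minimal illustration, not a real library: it fits a hypothetical one-parameter model y = a * x to data by repeatedly stepping against a hand-derived gradient of the mean squared loss.

```scala
// (x, y) pairs generated by y = 2 * x; the true parameter is a = 2
val data = List((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))

// Mean squared loss of parameter a on the data
def loss(a: Double): Double =
  data.map { case (x, y) => math.pow(a * x - y, 2) }.sum / data.size

// Derivative of the loss with respect to a, calculated by hand
def gradient(a: Double): Double =
  data.map { case (x, y) => 2 * (a * x - y) * x }.sum / data.size

// Gradient descent: step against the gradient from an initial guess
def descend(a: Double, lr: Double, steps: Int): Double =
  if (steps == 0) a else descend(a - lr * gradient(a), lr, steps - 1)
```

Starting from a = 0 with learning rate 0.1, `descend` converges to the true parameter 2.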
Deep learning is supervised learning of parameterised functions by gradient descent
Tensor multiplication and non-linearity
Tensors: multidimensional arrays
Conditionals and loops
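A dense layer is exactly this combination: tensor (here, matrix-vector) multiplication followed by a non-linearity. A sketch with plain Scala collections, where names like `layer` are illustrative rather than any real library's API:

```scala
type Vec = Vector[Double]
type Mat = Vector[Vec]

// A common non-linearity: rectified linear unit
def relu(x: Double): Double = math.max(0.0, x)

// y = relu(W x); the inner dimensions of W and x must agree
def layer(w: Mat, x: Vec): Vec =
  w.map(row => relu(row.zip(x).map { case (wi, xi) => wi * xi }.sum))
```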
Algorithms for calculating gradients
Composition of Derivatives
Symbolic differentiation
No loops or conditionals
Numeric differentiation: inexact, and choosing the step size ε is hard
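A forward-difference sketch shows why the step size matters: the estimate carries an error that grows with ε, while a too-small ε runs into floating-point cancellation.

```scala
// Numeric differentiation by forward differences: (f(x + e) - f(x)) / e
def numericDeriv(f: Double => Double, x: Double, e: Double): Double =
  (f(x + e) - f(x)) / e

// d/dx x^2 at x = 3 is exactly 6; the estimate is 6 + e, so the
// error is proportional to the step size we happened to choose
```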
Two main algorithms: forward and reverse
Calculate with dual numbers
There is a monad for Dual
Forward-mode scales in the size of the input dimension
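Forward mode with dual numbers can be sketched in a few lines: each value carries its derivative alongside it, and every operation propagates both, applying the chain rule step by step. A minimal version, assuming only `+` and `*`:

```scala
// A dual number pairs a value with its derivative
case class Dual(value: Double, deriv: Double) {
  def +(that: Dual): Dual =
    Dual(value + that.value, deriv + that.deriv)
  def *(that: Dual): Dual = // product rule
    Dual(value * that.value, deriv * that.value + value * that.deriv)
}

// Derivative of f at x: seed the input's derivative with 1
def derivative(f: Dual => Dual, x: Double): Double =
  f(Dual(x, 1.0)).deriv
```

Each call computes the derivative with respect to one input, which is why forward mode scales with the input dimension.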
Chain rule doesn't care about order
Use a monad (or continuations)
flatMap is the chain rule
Reverse-mode scales in the size of the output dimension
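Reverse mode can be sketched with continuations: each operation computes its value, runs the continuation (the rest of the program) to learn the gradient flowing back into it, and only then propagates that gradient to its inputs. The names `Num`, `mul`, `add`, and `grad` below are illustrative, not a real library:

```scala
// A node carrying a value and its accumulated gradient (adjoint)
class Num(val value: Double, var grad: Double = 0.0)

def mul(a: Num, b: Num)(k: Num => Unit): Unit = {
  val out = new Num(a.value * b.value)
  k(out)                        // run the rest of the computation first
  a.grad += out.grad * b.value  // then apply the chain rule backwards
  b.grad += out.grad * a.value
}

def add(a: Num, b: Num)(k: Num => Unit): Unit = {
  val out = new Num(a.value + b.value)
  k(out)
  a.grad += out.grad
  b.grad += out.grad
}

// Gradient of f at x: seed the output's gradient with 1
def grad(f: Num => (Num => Unit) => Unit)(x: Double): Double = {
  val in = new Num(x)
  f(in)(out => out.grad = 1.0)
  in.grad
}
```

The sequencing of continuations here is what `flatMap` packages up in the monadic formulation; one backward pass yields the gradient with respect to every input, which is why reverse mode scales with the output dimension instead.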
Tensor dimensions must agree
Solution: expressive type systems
Linear logic and differential λ-calculus
Need compilation (to GPU) for performance
Scala is well positioned for the next generation of DL systems
Taught by
Scala Days Conferences