Large Scale Machine Learning and Convex Optimization - Lecture 3
Hausdorff Center for Mathematics via YouTube
Overview
Lecture 3 of the course covers subgradient methods for non-smooth convex optimization, the stochastic subgradient method and its robustness properties, and refinements that go beyond it: an adaptive algorithm for logistic regression, self-concordance, and the least-mean-square algorithm with constant step sizes.
Syllabus
Intro
Main motivating examples
Subgradient method/descent (Shor et al., 1985)
Subgradient descent for machine learning: assumptions (f the expected risk, f̂ the empirical risk)
Summary: minimizing convex functions
Relationship to online learning
Stochastic subgradient "descent"/method (a sketch follows the syllabus)
Convex stochastic approximation: existing work (known global minimax rates of convergence for non-smooth problems; Nemirovsky and Yudin, 1983; Agarwal et al., 2012)
Robustness to wrong constants for γ_n = C n^(-α)
Robustness to lack of strong convexity
Beyond stochastic gradient method
Outline
Adaptive algorithm for logistic regression (a baseline sketch follows the syllabus)
Self-concordance
Least-mean-square algorithm (a constant step-size sketch follows the syllabus)
Markov chain interpretation of constant step sizes
Least-squares - Proof technique
Simulations - synthetic examples
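Code sketches

Several syllabus items name concrete algorithms, so a few minimal Python sketches follow. First, the stochastic subgradient method with decaying step sizes γ_n = C n^(-α), applied here to hinge-loss linear classification on synthetic data; the data model, the constants C and α, and the Polyak-Ruppert averaging are illustrative assumptions, not the lecture's exact setup.

    # Stochastic subgradient method sketch (assumed setup, not the lecture's).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 10
    X = rng.standard_normal((n, d))          # synthetic inputs
    y = np.sign(X @ rng.standard_normal(d))  # labels in {-1, +1}

    def hinge_subgradient(w, x, yi):
        # A subgradient of max(0, 1 - yi * <w, x>) with respect to w.
        return -yi * x if yi * (x @ w) < 1.0 else np.zeros_like(x)

    def stochastic_subgradient(C=1.0, alpha=0.5, iters=5000):
        # Step size gamma_t = C * t**(-alpha); alpha = 1/2 is the classic
        # robust choice in the absence of strong convexity.
        w = np.zeros(d)
        w_avg = np.zeros(d)
        for t in range(1, iters + 1):
            i = rng.integers(n)  # sample one data point
            w -= C * t ** (-alpha) * hinge_subgradient(w, X[i], y[i])
            w_avg += (w - w_avg) / t  # Polyak-Ruppert averaging
        return w_avg

    w_hat = stochastic_subgradient()
    print("training error:", np.mean(np.sign(X @ w_hat) != y))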
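The lecture's adaptive algorithm for logistic regression (tied to self-concordance) is not reproduced here; as a point of reference, this sketch runs plain averaged stochastic gradient descent on the logistic loss, with an assumed step size C/√t and a synthetic well-specified data model.

    # Averaged SGD on the logistic loss (a plain baseline, not the adaptive method).
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 2000, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)  # labels in {0, 1}

    w = np.zeros(d)
    w_avg = np.zeros(d)
    C = 1.0
    for t in range(1, 50001):
        i = rng.integers(n)
        # Gradient of the logistic loss at one sample: (p - y) * x.
        g = (sigmoid(X[i] @ w) - y[i]) * X[i]
        w -= C / np.sqrt(t) * g
        w_avg += (w - w_avg) / t

    print("parameter error of averaged iterate:", np.linalg.norm(w_avg - w_true))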
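Finally, the least-mean-square algorithm with a constant step size, which underlies the "Markov chain interpretation" item: with γ fixed, the iterates form a homogeneous Markov chain that does not converge but oscillates around the least-squares solution, while their average converges to it. The step size, noise level, and streaming data model below are assumptions for illustration.

    # LMS with constant step size: last iterate hovers, averaged iterate converges.
    import numpy as np

    rng = np.random.default_rng(2)
    d = 5
    w_star = rng.standard_normal(d)  # target linear predictor
    gamma = 0.05                     # constant step size (assumed value)

    w = np.zeros(d)
    w_avg = np.zeros(d)
    for t in range(1, 20001):
        x = rng.standard_normal(d)                     # fresh streaming input
        yt = x @ w_star + 0.1 * rng.standard_normal()  # noisy linear response
        # LMS update: stochastic gradient of (1/2) * (<w, x> - y)^2.
        w -= gamma * (x @ w - yt) * x
        w_avg += (w - w_avg) / t

    print("last iterate error:", np.linalg.norm(w - w_star))
    print("averaged iterate error:", np.linalg.norm(w_avg - w_star))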
Taught by
Hausdorff Center for Mathematics