Let's Make Block Coordinate Descent Go Fast

Overview

Explore block coordinate descent methods for large-scale optimization problems in this 39-minute lecture by Mark Schmidt from the University of British Columbia. Delve into the advantages of coordinate descent, the problem types it suits, and block selection rules, including Gauss-Southwell and other greedy approaches. Examine Newton steps, quadratic norms, and matrix updates, and compare Newton's method with cubic regularization. Analyze experimental results for multi-class logistic regression and sparse quadratic problems. Investigate optimization with bound constraints, the manifold identification property, and superlinear convergence. Gain insights into message-passing techniques for sparse quadratics and review the key takeaways for making block coordinate descent fast.

Syllabus

Intro
Why Block Coordinate Descent?
Block Coordinate Descent for Large-Scale Optimization
Why use coordinate descent?
Problems Suitable for Coordinate Descent
Canonical Randomized BCD Algorithm
Better Block Selection Rules
Gauss-Southwell???
Fixed Blocks vs. Variable Blocks
Greedy Rules with Gradient Updates
Gauss-Southwell-Lipschitz vs. Maximum Improvement Rule
Newton-Steps and Quadratic-Norms
Gauss-Southwell-Quadratic Rule
Matrix vs. Newton Updates
Newton's Method vs. Cubic Regularization
Experiment: Multi-class Logistic Regression
Superlinear Convergence?
Optimization with Bound Constraints
Manifold Identification Property
Superlinear Convergence and Proximal-Newton
Message-Passing for Sparse Quadratics
Experiment: Sparse Quadratic Problem
Summary

Taught by

Simons Institute

