Course information | Stochastic Gradient Techniques for Optimization and Learning - M2 - 6EC

Stochastic Gradient Techniques for Optimization and Learning - M2 - 6EC

Abstract
This course provides a thorough treatment of (stochastic) optimization via Stochastic Approximation (SA). SA is the umbrella term for methods that find solutions to optimization problems in an iterative way, where the algorithms are driven by either simulation experiments or observed data. The course summarizes the key results of SA (including settings with bias and noise). Proofs of the main results will be given, and many illustrating examples are provided. Also, the statistical output analysis of SA algorithms is covered in this course. The main methods for simulation-based optimization and learning we will work on within this course are the Simultaneous Perturbation Stochastic Approximation and the Gaussian Smoothed Function Approach, which are numerically highly efficient pseudo-gradient methods.

Full Description
Adaptive algorithms are now utilized across varied applications in simulation/data-driven learning and optimization. These algorithms aim to estimate recursively an unknown time-invariant (or slowly varying) parameter vector, traditionally denoted by θ, from measurements whose statistics depend on it. A well-known example is the gradient descent method which is one of the main techniques in AI and machine learning for supervised and unsupervised learning. Such algorithms are subsumed under the name stochastic approximation (SA).
The focus of this course is on SA algorithms and their application to stochastic simulation/data-based optimization and learning algorithms. The ODE approach to iterative learning and optimization algorithms will be introduced. In this course, we will discuss the projection method and algorithms with biased gradient estimators. Next to discussing various SA algorithms, this course provides the theoretical insights needed for carrying out a proper statistical output analysis of SA-type algorithms. Optimal computer budget allocation in SA will be taught as well (batching updates is often not advisable). Applications will stem from a wide range of stochastic models.
The course is of interest to students in the area of Simulation Analytics, Computer Science, Operations Research, and Machine Learning. It offers a self-contained course on simulation-based techniques in optimization and (machine) learning. Participating students have majors in disciplines of applied mathematics, computer science, data science, operations research, physics, electrical engineering, and economics.

Prerequisites
Basic knowledge of probability theory, stochastic processes, and Monte Carlo simulation (e.g., Chapters 1,2,5,6,7,8,9 of Averill Law: Simulation Modeling and Analysis, Mc Graw Hill 4-th or 5-th ed.; Chapter 10 of Cassandras and Lafortune: Introduction to Discrete Event Systems, Springer, 2nd ed, 2008), and optimization (e.g., Chapters 1 to 3 of Brinkhuis & Tikhomirov: Optimization, Insights and Applications, Princeton University Press, 2005).

Aim of the course
In this course, the student will learn how to apply gradient estimators in stochastic simulation/data-based optimization and learning algorithms. After successful participation in this course, the student will be able to conduct a simulation-based stochastic optimization solution to real-life problems.

Lecturers
Bernd Heidergott (VU) and Ad Ridder (VU)

Docent: Bernd Heidergott
Docent: Ad Ridder