Stochastic Optimization for Machine Learning: Investigations on Bilevel Optimization and Large Learning Rates
Abstract
Stochastic optimization is fundamental to modern machine learning and deep learning. It provides algorithmic frameworks, such as stochastic gradient descent (SGD), the adaptive gradient algorithm (ADAGRAD), and adaptive moment estimation (ADAM), for efficiently minimizing loss functions constructed from large-scale datasets. In this dissertation, we explore the theoretical properties and empirical performance of bilevel optimization algorithms and the phenomenon of large learning rates in machine learning. First, we introduce a novel algorithm, the Moving-Average Stochastic Bilevel Algorithm (MA-SOBA), designed to solve stochastic bilevel optimization problems under standard smoothness assumptions. Next, we extend the scope of bilevel optimization algorithms from single-agent training to the multi-agent setting, i.e., distributed training, by proposing the Moving-Average Decentralized Stochastic Bilevel Optimization (MA-DSBO) algorithm. This approach improves the per-iteration complexity of previous methods, reducing the quadratic dependence on the problem dimension to a linear one. Lastly, inspired by the Edge of Stability (EoS) phenomenon observed in modern deep learning, we examine the training dynamics of gradient descent in a class of quadratic regression models with large learning rates, a regime that classical optimization theory struggles to explain.
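For concreteness, the stochastic bilevel problems studied in the first two parts can be written in the standard nested form sketched below; the symbols $x$, $y$, $f$, $g$, $F$, $G$, $\xi$, and $\zeta$ are generic placeholders rather than the dissertation's own notation.

% A generic sketch of the standard stochastic bilevel formulation
% (illustrative notation, not the dissertation's own symbols):
% an upper-level objective evaluated at the lower-level minimizer.
\begin{equation*}
  \min_{x \in \mathbb{R}^{d_x}} \; \Phi(x) := f\bigl(x, y^*(x)\bigr)
  \quad \text{subject to} \quad
  y^*(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^{d_y}} g(x, y),
\end{equation*}
where the upper-level objective $f(x, y) = \mathbb{E}_{\xi}[F(x, y; \xi)]$ and the lower-level objective $g(x, y) = \mathbb{E}_{\zeta}[G(x, y; \zeta)]$ are accessible only through stochastic samples. Estimating the gradient of $\Phi$ involves the lower-level solution $y^*(x)$ and second-order information about $g$, which is the main source of the per-iteration cost that the decentralized algorithm in the second part reduces from quadratic to linear in the dimension.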