This thesis is concerned with designing efficient methods to incorporate known structure into machine learning models. Structure arises either from the problem formulation (e.g., physical constraints, aggregation constraints) or from desirable model properties (energy efficiency, sparsity, robustness). In many cases, the modeler has knowledge about the system being modeled that must be enforced exactly. This can be necessary to provide adequate safety guarantees, or to improve the system's efficiency: training with less data, or at lower computational cost. This thesis provides methods to do so in a variety of settings, building on the two foundational fields of continuous constrained optimization and of differentiable statistical modeling (also known as deep learning).
The first part of the thesis is centered on designing and analyzing efficient algorithms for optimization problems with convex constraints. In particular, it focuses on two variants of the Frank-Wolfe algorithm: the first introduces a fast backtracking line-search procedure to set the step size adaptively in the full-gradient setting; the second is a fast stochastic Frank-Wolfe algorithm for constrained finite-sum problems. I also describe contributions to open-source constrained optimization software. The second part of the thesis is concerned with designing deep learning models that enforce certain constraints exactly: constraints based on physics, and aggregation constraints for probabilistic forecasting models. This part leverages bilevel optimization and differentiable optimization to constrain the output of a neural network. We demonstrate that complex nonlinear constraints can be enforced exactly on non-convex models, including probabilistic ones.
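For concreteness, the classic Frank-Wolfe update for minimizing a smooth function $f$ over a compact convex set $\mathcal{C}$ takes the form below; the step size $\gamma_t$ is precisely the quantity that the backtracking variant sets adaptively (the notation here is standard and recalled only for orientation, not taken from the thesis):

\begin{align*}
s_t &\in \operatorname*{arg\,min}_{s \in \mathcal{C}} \; \langle \nabla f(x_t), s \rangle && \text{(linear minimization oracle over } \mathcal{C}\text{)} \\
x_{t+1} &= x_t + \gamma_t \,(s_t - x_t), \quad \gamma_t \in [0, 1] && \text{(convex combination, so } x_{t+1} \in \mathcal{C}\text{)}
\end{align*}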
These examples showcase the power of hybrid models that couple data-driven learning, carried out by complex nonlinear models such as deep neural networks, with well-studied optimization problems for which efficient algorithms exist. Such hybrid models help highly flexible learners capture structural patterns, achieving strong performance, in some cases with no access to training data at all.
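As a minimal sketch of such a coupling, the example below uses the cvxpylayers library (one possible tool for differentiable optimization, not necessarily the one used in this thesis) to place a differentiable convex optimization layer after a neural network, so that a constraint holds exactly on the output while gradients still flow end to end. The probability-simplex constraint is a hypothetical example chosen purely for illustration.

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 3
y_hat = cp.Parameter(n)   # raw, unconstrained network output
z = cp.Variable(n)        # constrained output

# Differentiable projection onto the probability simplex
# (an illustrative constraint set; any DPP-compliant convex set works).
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(z - y_hat)),
    [z >= 0, cp.sum(z) == 1],
)
layer = CvxpyLayer(problem, parameters=[y_hat], variables=[z])

net = torch.nn.Linear(5, n)   # any upstream network
x = torch.randn(2, 5)         # a batch of inputs
z_star, = layer(net(x))       # z_star satisfies the constraints exactly
z_star.sum().backward()       # gradients flow through the solver to net

The design choice this illustrates is the division of labor described above: the neural network stays free to fit the data, while the optimization layer, a well-understood convex problem, guarantees the structural constraint by construction rather than by penalty.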