Many mission-critical systems must solve online decision-making problems, such as workload scheduling in datacenters, power allocation in edge computing, battery management for EV charging, and demand response in power systems. Decision-making algorithms in these systems are expected to achieve a high expected reward while guaranteeing important trustworthiness metrics such as robustness, safety, and fairness. Recently, machine learning (ML) for decision processes, which utilizes available statistical information to achieve a high expected reward, has been attracting growing interest. However, ML algorithms usually lack trustworthiness guarantees, which hinders their deployment in real systems. On the other hand, domain expert algorithms have long been deployed in many real systems and can be trusted in terms of some performance metrics, but they may not achieve a high enough expected reward. In this dissertation, given various decision processes, we design algorithms that exploit the corresponding expert knowledge to achieve both a high expected reward and provably guaranteed worst-case performance.
Specifically, the dissertation includes learning-augmented algorithms and theory in the following aspects. First, the dissertation considers bandit decision making with imperfect context and proposes robust algorithms that maximize the worst-case reward and minimize the worst-case regret, respectively. Simulations of the algorithms on online edge datacenter selection validate our theoretical analysis. Then, the dissertation considers online optimization/control problems with known dynamic models and proposes expert-calibrated ML algorithms with provable guarantees for anytime competitiveness. The theoretical analysis highlights the tradeoff between anytime competitiveness and average performance. Empirical results on electric vehicle charging station management demonstrate the performance. Furthermore, without knowledge of the dynamic models, the dissertation designs reinforcement learning algorithms that optimize the expected reward while guaranteeing the anytime cost constraints for any episode. Experiments on carbon-intelligent computing verify the reward performance and cost constraint guarantee of the proposed algorithm. Beyond that, the dissertation considers online decision
making with multiple budget constraints and proposes an ML-assisted unrolling approach, which unrolls the online decision pipeline and leverages an ML model to update the Lagrangian multiplier online. For efficient training via backpropagation, we derive gradients of the decision model. Finally, the dissertation gives a theoretical analysis of general domain-knowledge-informed learning, quantitatively demonstrating the two benefits of domain knowledge in informed learning: regularizing the label-based supervision and supplementing the labeled samples.