Reinforcement Learning (RL) has been extensively explored within the domain of building control, primarily because the problems in this field can be effectively
formulated as Markov Decision Process (MDP) problems. Traditional approaches
predominantly treat these challenges as online RL problems, assuming that accurate
simulators or environmental models are already established and fine-tuned.
However, creating and calibrating these models is not only time-intensive and resource-heavy, but starting from a randomly initialized policy can also pose safety concerns.
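For concreteness, a building-control task can be cast as an MDP $(\mathcal{S}, \mathcal{A}, P, r, \gamma)$; the instantiation below is illustrative rather than a definitive formulation. The state $s_t$ may collect indoor temperatures, occupancy, and weather observations; the action $a_t$ may be temperature setpoints or equipment schedules; the transition kernel $P(s_{t+1} \mid s_t, a_t)$ captures the building's thermal dynamics; and a reward such as $r_t = -\text{(energy cost)}_t - \lambda \cdot \text{(comfort violation)}_t$ trades off energy use against occupant comfort through a weight $\lambda$. The control objective is a policy $\pi$ that maximizes the expected discounted return $\mathbb{E}_\pi\big[\sum_t \gamma^t r_t\big]$.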
Consequently, for real-world applications, data-driven strategies emerge as a more practical alternative for training learning agents. This is particularly relevant
in contemporary building management systems, where control and actuation data
are systematically archived. Such data can serve as a valuable foundation for prior
knowledge and be stored in an experience replay buffer, enabling agents to learn and adapt
more effectively. Typically, a default building control policy is crafted by domain
experts leveraging their best-known practices. This expert policy can serve as an expert demonstration, providing a behavioral guide that informs and enhances the
early performance of a learning agent, thereby minimizing opportunity costs.
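As a minimal sketch of how such demonstrations can bootstrap an agent (the notation here is generic rather than taken from any particular chapter), the expert behavior implicit in the logged dataset $\mathcal{D}$ can be imitated with a behavior-cloning objective, $\min_{\theta} \mathbb{E}_{(s,a) \sim \mathcal{D}}\!\left[-\log \pi_{\theta}(a \mid s)\right]$, so that the learning agent starts close to the expert's behavior before any further improvement through reinforcement learning.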
Nevertheless, policy learning with offline methods is limited by the static dataset from which the agent learns: no further exploration of the state-action space is possible. It is therefore crucial to study offline-to-online methods that further improve pre-trained offline models through online interaction. The major challenge of offline-to-online methods is overcoming the extrapolation errors in value estimation that arise from the distribution drift between the static experience replay and the environment in which the agent is evaluated.
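To make the source of these extrapolation errors concrete (a standard illustration rather than a result specific to this dissertation), consider a bootstrapped target of the form $y = r + \gamma \max_{a'} Q_{\phi}(s', a')$ used in Q-learning-style updates. When the maximization selects actions $a'$ that are poorly covered by the static experience replay, $Q_{\phi}(s', a')$ is queried on out-of-distribution inputs, and its errors are never corrected by observed outcomes; through bootstrapping these errors compound and typically surface as value overestimation once the learned policy leaves the data distribution.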
In this dissertation, we introduce studies encompassing a suite of data-driven approaches to building control, beginning with offline/batch reinforcement learning, where we adapt the Kullback-Leibler divergence to penalize policy updates that deviate far from the previous policy (a generic form of this regularized objective is sketched below). We also present the first open-source building control dataset as a benchmark for batch reinforcement learning, since a standardized dataset is crucial for batch reinforcement learning research. Then, we delve into a unified policy regularization method that integrates existing policies within both online and offline frameworks, providing robustness and stability to reinforcement learning. Finally, we extend our exploration to offline-to-online reinforcement learning and address the challenge of adapting to the distribution drift with adaptive policy regularization that automatically tunes the agent's learning. Collectively, this dissertation studies policy regularization in model-free building control, with comprehensive approaches spanning offline to online reinforcement learning.
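As a sketch of the common structure underlying these policy-regularization approaches (the exact objectives differ across the individual studies; the form below is generic), the policy improvement step can be written as
$$\max_{\pi} \; \mathbb{E}_{s \sim \mathcal{D}} \Big[ \mathbb{E}_{a \sim \pi(\cdot \mid s)} Q(s,a) \;-\; \alpha \, D_{\mathrm{KL}}\big(\pi(\cdot \mid s) \,\|\, \bar{\pi}(\cdot \mid s)\big) \Big],$$
where $\bar{\pi}$ is a reference policy (the previous policy iterate, the behavior policy underlying the dataset, or an expert policy) and $\alpha$ controls the regularization strength. In the offline-to-online setting, $\alpha$ is the natural quantity to adapt automatically as the distribution drifts from the static experience replay toward online interaction.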