McAleer, Stephen

Sequential Decision Making in Single-Agent and Multi-Agent Domains

2022

McAleer, Stephen
Advisor(s): Baldi, Pierre

Abstract

In this dissertation I outline three contributions. The first is in the area of single-agent planning. In this section I describe my work on combining deep reinforcement learning with search to solve hard planning problems such as the Rubik’s cube. In the next two sections I focus on contributions in the area of two-player zero-sum games. Both contributions are improvements to an existing double oracle algorithm called Policy Space Response Oracles (PSRO). The first improves PSRO by parallelizing PSRO while maintaining convergence guarantees. The resulting method, called Pipeline PSRO (P2SRO) achieves state-of-the-art performance on Barrage Stratego. The second contribution, called Extensive-Form Double Oracle (XDO) improves PSRO by extending it to extensive-form games. As a result, XDO is able to converge to a Nash equilibrium in a number of iterations linear in the number of infostates of the game, as opposed to a potentially exponential number of iterations necessary to insure convergence for PSRO. A neural version of XDO achieves state-of-the-art performance on a continuous-action game.

Main Content

For improved accessibility of PDF content, download the file to your device.

UC Irvine

Sequential Decision Making in Single-Agent and Multi-Agent Domains