In recent years, autonomous driving technologies have been developing so rapidly that we can expect fleets of autonomous vehicles to be driving on public roads and serving passengers in the near future. The introduction of autonomous vehicles brings numerous opportunities to improve the operations of transportation systems, since these vehicles can be controlled in real time. This dissertation focuses on two representative operational problems for transportation systems with autonomous vehicles: how to build more effective real-time routing services for these vehicles, and how to better manage a fleet of autonomous vehicles for taxi services.
Most previous work on these two problems focuses on developing parametric models to capture the network dynamics and designing efficient algorithms to solve them. Although parametric models have clear advantages, they face major limitations when applied to real networks. First, to allow for efficient solutions, most parametric models require strong assumptions or approximations, many of which cannot be validated in reality. Second, some of the proposed parametric models still suffer from the curse of dimensionality: they can be applied to small networks but do not scale to larger ones. With these considerations, the aim of this dissertation is to explore non-parametric, model-free methods for these two problems in transportation networks. More specifically, we adopt the framework of reinforcement learning (RL), which is primarily concerned with obtaining an optimal policy when the model of the underlying Markov decision process (MDP) is not available.
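As standard background (not a result of this dissertation), the model-free setting can be summarized by the Bellman optimality equation for the action-value function,
\[
Q^{*}(s,a) \;=\; \mathbb{E}\!\left[\, r(s,a) + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s,a \right],
\]
where $s'$ is the successor state, $r$ the one-step reward, and $\gamma$ the discount factor; RL methods such as Q-learning estimate $Q^{*}$ directly from sampled transitions, without requiring the transition probabilities of the MDP.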
We first study the problem of adaptive routing in stochastic and time-dependent networks. The problem is formulated as an MDP, and we apply Q-learning to solve it in a discrete state space, comparing it with the traditional stochastic dynamic programming method. Our case study on a mid-size network shows that Q-learning outperforms traditional dynamic programming in most cases, especially during peak demand periods. Then, to address the curse of dimensionality, we introduce an offline, tree-based, batch-mode reinforcement learning method called fitted Q iteration (FQI), which works in a continuous state space and can incorporate any regression algorithm as an approximation of the value function. The case study shows that it further improves the routing policy and delivers a more flexible strategy during peak hours. Furthermore, the computational study shows that FQI makes better use of the available data: as more data are fed into the training process, the resulting routing performance improves consistently.
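To illustrate the discrete-state setting, the following is a minimal sketch (not the exact implementation or network used in this dissertation) of tabular Q-learning for adaptive routing, assuming states are (node, time-interval) pairs, actions are outgoing links, and travel times are sampled from a stochastic, time-dependent simulator; the graph structure, the sample_travel_time function, and the parameter values are hypothetical placeholders.

```python
import random
from collections import defaultdict

# Hypothetical sketch: tabular Q-learning for adaptive routing.
# State: (current node, discretized time interval); action: next node along an outgoing link.
# Q values are interpreted as expected remaining travel costs, so we minimize.

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1   # learning rate, discount factor, exploration rate

def q_learning_route(graph, origin, destination, sample_travel_time,
                     episodes=10000, horizon=96):
    """graph: dict mapping node -> list of successor nodes (outgoing links).
    sample_travel_time(node, nxt, t): one realized travel time on link (node, nxt) at time t."""
    Q = defaultdict(float)  # Q[((node, t), next_node)]

    for _ in range(episodes):
        node, t = origin, 0
        while node != destination and t < horizon:
            actions = graph[node]
            if not actions:
                break
            # epsilon-greedy selection of the next link (lower Q = lower expected cost)
            if random.random() < EPSILON:
                nxt = random.choice(actions)
            else:
                nxt = min(actions, key=lambda a: Q[(node, t), a])
            # observe a realized (stochastic, time-dependent) link travel time
            cost = sample_travel_time(node, nxt, t)
            t2 = min(horizon, t + max(1, int(cost)))
            # best continuation cost from the successor state
            future = 0.0 if nxt == destination or not graph[nxt] else min(
                Q[(nxt, t2), a] for a in graph[nxt])
            # one-step Q-learning update toward the sampled target
            Q[(node, t), nxt] += ALPHA * (cost + GAMMA * future - Q[(node, t), nxt])
            node, t = nxt, t2
    return Q
```

The same transitions, stored as (state, action, cost, next state) tuples, could instead be passed in batch to a regression-based FQI procedure, which is the continuous-state extension discussed above.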
We then present a deep reinforcement learning approach for the problem of dispatching autonomous vehicles for taxi services. We first formulate the problem and state our basic assumptions. We then propose an actor-critic framework with neural networks as approximations of both the actor and the critic functions, with adaptations to the output action function. Furthermore, we provide a benchmark by formulating the problem as an integer program that maximizes the total reward. In our case study based on the New York yellow taxi data, we find that whether the system dynamics are deterministic or stochastic, the RL method converges to values close to the true optimum. Moreover, when we consider user priority and serve impatient passengers first, the average waiting time across all passengers increases, but the system becomes fairer, with fewer extremely long waiting times.
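As a simplified illustration of the actor-critic structure (not the exact architecture, action adaptation, or hyperparameters used in this dissertation), the sketch below assumes a fixed-size state vector summarizing vehicle and passenger distributions and a discrete choice over dispatch zones; all dimensions, learning rates, and tensor shapes are placeholder assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of one actor-critic update for a dispatching policy.
state_dim, action_dim = 32, 8   # placeholder: state features and number of dispatch zones

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Softmax(dim=-1))   # policy over zones
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                               # state-value estimate

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(state, action, reward, next_state, gamma=0.99):
    """One actor-critic step on a batch of observed transitions.
    state/next_state: float tensors (B, state_dim); action: long tensor (B,); reward: (B,)."""
    # critic: regress the value estimate toward the one-step TD target
    td_target = reward + gamma * critic(next_state).squeeze(-1).detach()
    value = critic(state).squeeze(-1)
    critic_loss = (td_target - value).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: increase the log-probability of dispatch actions with positive advantage
    advantage = (td_target - critic(state).squeeze(-1)).detach()
    log_prob = torch.distributions.Categorical(actor(state)).log_prob(action)
    actor_loss = -(log_prob * advantage).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```

In this sketch the reward would encode objectives such as served requests or (negative) passenger waiting time, and the integer-program benchmark mentioned above provides the reference value against which the converged policy can be compared.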