With advanced wireless technologies and emerging machine learning techniques, there is an increasing trend of combining wireless communication and machine learning with new platooning control and traffic signal control (TSC) algorithms that leverage connected and autonomous vehicles (CAVs) to achieve high-throughput, low-latency traffic flow, fewer accidents, and reduced emissions in urban transportation. However, this development creates highly connected wireless environments that increase the potential for cyber-attacks on TSC, and such attacks can undermine the benefits of new TSC algorithms. For example, an adversary can first perform reconnaissance to understand how the TSC operates under different traffic arrival rates and then choose a fixed number of vehicles as attack vehicles to gain scheduling priority and/or to create traffic congestion at one intersection that can spread to the entire network. Such attacks can compromise TSC systems, significantly increase traffic delay, and render TSC completely ineffective. Furthermore, machine-learning-based TSC approaches raise several issues of their own: complexity, dimensionality, and convergence. They require a long training period and a significant amount of training data to improve traffic throughput and latency, and they do not scale to problems with large state spaces. In addition, approaches that adopt neural networks (NNs) as function approximators can suffer from convergence issues due to the non-linearity of NNs. In this work, we have proposed SecureTLC to address the security and machine learning issues raised by these novel technologies for modern intelligent transportation systems (ITS).
As our first work, we have investigated security vulnerabilities in four backpressure-based (BP-based) traffic control algorithms: 1) Delay-based BackPressure Control (DBPC), 2) Queue-based BackPressure Control (QBPC), 3) Sum-of-delay-based BackPressure Control (SBPC), and 4) Hybrid-based BackPressure Control (HBPC), which combines the delay-based and queue-based approaches. Their performance has been compared under two misinformation attacks: 1) the time spoofing attack and 2) the ghost vehicle attack. The time spoofing attack is a falsified-data attack in which attack vehicles arriving at an intersection alter their arrival times. In the ghost vehicle attack, attack vehicles intentionally disconnect their wireless communication and thereby hide from the TSC. We have shown that the misinformation sent by attack vehicles can influence the signal phases determined by BP-based traffic control algorithms. We have considered an adversary that selects a set of arriving vehicles to be attack vehicles from many candidate sets (attack strategies) in order to maximize the number of disrupted signal phases. We have shown that by formulating the problem as a 0/1 Knapsack problem, the adversary can explore the space of attack strategies and determine the optimal strategy that maximally degrades performance in terms of average delay and fairness. Through detailed simulation analyses, we have also shown that while the DBPC achieves better fairness, it is more vulnerable to time spoofing attacks than the other schemes. The QBPC, on the other hand, which determines traffic signal phases using aggregate queue information, suffers more severely under ghost vehicle attacks. We have examined the drawbacks of both the DBPC and QBPC under different attack scenarios and traffic patterns, including homogeneous and non-homogeneous vehicle arrivals at an isolated intersection.
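The adversary's strategy search above can be sketched as a standard 0/1 Knapsack dynamic program. The gains and costs below are purely illustrative placeholders (in the actual formulation, the objective counts disrupted signal phases and the budget bounds the number or cost of attack vehicles):

```python
def optimal_attack_set(gains, costs, budget):
    """Classic 0/1 Knapsack DP: returns (best_gain, chosen_indices).

    gains[i]  - hypothetical disruption gain of making vehicle i an attacker
    costs[i]  - hypothetical cost of recruiting vehicle i
    budget    - total cost the adversary can spend
    """
    n = len(gains)
    dp = [0] * (budget + 1)                     # dp[b] = best gain with budget b
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate budgets downward so each vehicle is used at most once
        for b in range(budget, costs[i] - 1, -1):
            if dp[b - costs[i]] + gains[i] > dp[b]:
                dp[b] = dp[b - costs[i]] + gains[i]
                keep[i][b] = True
    # Backtrack to recover which vehicles form the optimal attack set
    chosen, b = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][b]:
            chosen.append(i)
            b -= costs[i]
    return dp[budget], sorted(chosen)

# Example: 5 candidate vehicles, an attack budget of 3
gains = [4, 2, 3, 1, 5]   # hypothetical disrupted-phase counts
costs = [2, 1, 1, 1, 2]   # hypothetical per-vehicle costs
best, chosen = optimal_attack_set(gains, costs, 3)
```

Exploring the full space of candidate sets naively is exponential in the number of arriving vehicles; the DP reduces it to O(n x budget).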
We have proposed two protection mechanisms against the attacks, namely, an auction-based protection algorithm (APA) and a hybrid-based protection algorithm (HPA). Using simulation analysis, we have shown that both protection schemes are able to mitigate the impacts of the time spoofing and ghost vehicle attacks.
In our second work, we have studied the impact of AI-enabled platooning control on fuel consumption. Vehicular emissions and traffic congestion have worsened with rapid urbanization. Worsening traffic burdens drivers with higher costs and longer travel times, and exposes pedestrians to unhealthy emissions such as particulate matter (PM), $NO_{x}$, $SO_{2}$, and greenhouse gases. In response to these issues, CAVs, which enable information sharing among vehicles and infrastructure, have been proposed. With advanced wireless technologies offering extremely low latency, CAV-based platooning control can be realized to improve traffic efficiency and safety. However, conventional platooning control algorithms require complex computations and hence are ill-suited to real-time operation. To overcome this issue, this chapter focuses on designing an innovative learning framework for platooning control capable of reducing fuel consumption through four basic platoon manipulations, namely, split, acceleration, deceleration, and no-op. We integrated reinforcement learning (RL) with NNs to model the non-linear relationships between inputs and outputs in this complex application. The experimental results reveal a decreasing trend in fuel usage and an increasing trend in reward. They demonstrate that the proposed DRL platooning control optimizes fuel consumption by fine-tuning the speeds and sizes of platoons.
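The control loop above can be illustrated with a minimal sketch: an agent picks one of the four platoon manipulations epsilon-greedily and applies a one-step TD update. For brevity the NN function approximator is replaced here by a linear one, and the state features and learning rate are assumptions, not the chapter's actual design (where the reward would reflect negative fuel consumption):

```python
import numpy as np

# The four platoon manipulations from the text
ACTIONS = ["split", "accelerate", "decelerate", "no-op"]

rng = np.random.default_rng(0)

class LinearQ:
    """Q(s, a) = w[a] . s  -- a linear stand-in for the NN approximator."""
    def __init__(self, state_dim, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, state_dim))
        self.lr = lr

    def q_values(self, s):
        return self.w @ s

    def td_update(self, s, a, r, s_next, gamma=0.99):
        # One-step Q-learning-style TD target: r + gamma * max_a' Q(s', a')
        target = r + gamma * np.max(self.q_values(s_next))
        td_error = target - self.q_values(s)[a]
        self.w[a] += self.lr * td_error * s   # semi-gradient update
        return td_error

def epsilon_greedy(q, s, eps):
    """Explore with probability eps, otherwise exploit the current Q-values."""
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q.q_values(s)))
```

In the full framework the linear model would be replaced by the NN, which captures the non-linear input/output relationships the chapter requires.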
In our third work, we have applied advanced machine learning techniques to TSC. A recent report has shown that an urban road network equipped with an advanced TSC system can reduce the average sojourn time of vehicles, traffic collisions, and traffic delay while also decreasing energy consumption and improving parking. Modern TSC systems that leverage advanced machine learning techniques and scheduling algorithms have been a promising topic in recent years. RL has been broadly adopted in many areas owing to its flexibility in combining an assortment of architectures and neural networks that help RL agents learn from a complex environment. Due to its compatibility with numerous existing deep learning techniques, RL can be integrated into a complicated deep learning framework that is extendable and powerful for many deterministic and nondeterministic applications. We proposed a deep RL (DRL) framework with high extendability and flexibility that incorporates various RL-based algorithms and techniques. We have shown that learning performance can be significantly improved by applying an on-policy temporal-difference (TD) learning method, SARSA, to the traffic signal control problem for a traffic network consisting of multiple intersections. The proposed traffic flow maps (TFMs), used as input states, are formed from the traffic flows at every time slot in the network. These states are fed into a non-linear function approximator to output the optimal action. To process TFMs, we combined a convolutional neural network, a dueling network architecture that improves the evaluation of action values when the action space is huge, and an experience replay memory that breaks the strong temporal correlations among states. Moreover, we compared the learning performance of DQN, 3DQN, DSARSA, and 2DSARSA and provided a thorough analysis of their strengths and weaknesses.
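The three ingredients named above can be sketched in a few lines: the dueling aggregation of value and advantage streams, the on-policy SARSA target versus the off-policy Q-learning (DQN) target, and a uniform replay memory. This is a generic illustration of the standard techniques, not the chapter's exact network:

```python
import numpy as np
import random
from collections import deque

def dueling_aggregate(value, advantages):
    # Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)); subtracting the mean makes
    # the value/advantage decomposition identifiable
    return value + (advantages - advantages.mean())

def sarsa_target(r, q_next, a_next, gamma=0.99):
    # On-policy SARSA: bootstrap from the action the policy actually takes next
    return r + gamma * q_next[a_next]

def q_learning_target(r, q_next, gamma=0.99):
    # Off-policy Q-learning (DQN): bootstrap from the greedy next action
    return r + gamma * q_next.max()

class ReplayMemory:
    """Uniform experience replay; sampling random minibatches breaks the
    strong temporal correlation between consecutive TFM states."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)
```

The difference between the two targets is the core of the DQN-versus-SARSA comparison: SARSA evaluates the behavior policy itself, which can make learning more stable under exploration.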
To the best of our knowledge, this is the first work discussing and evaluating learning effects of different DRL-based agents.
In our fourth work, extending the third, we have proposed a novel architecture that exploits both global and local information to further improve TSC performance. We use the proposed TFMs as input states representing the global information for a DRL agent, and pressure-of-the-lanes (POL) representing the local information as the pressure metric for a number of backpressure (BP) controllers. We define a combinatorial reward function using the power metric, which maximizes the overall throughput and minimizes the average end-to-end delay of a network. We proposed a hybrid two-level control (TLC) architecture with high extendability and flexibility that enables collaboration between the centralized DRL agent using TFMs and the decentralized BP controllers using POL in a hierarchical architecture to determine traffic signal phases that optimize network throughput and latency. We show that learning performance can be significantly improved by the collaboration between the two levels for a traffic network of multiple intersections. Moreover, we compare the learning performance of the proposed TLC with BP, DQN, 3DQN, DSARSA, and 2DSARSA and provide a thorough analysis of their strengths and weaknesses.
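The local BP side of the two-level design can be sketched as follows: each controller scores every candidate phase by the upstream-minus-downstream queue differential of its movements and activates the highest-pressure phase, while a power-style reward couples throughput and delay. The lane names and the exact POL and power-metric formulas here are simplified assumptions, not the chapter's definitions:

```python
def phase_pressure(moves, queues):
    """Pressure of a phase: sum over its movements (up_lane, down_lane) of
    the upstream-minus-downstream queue differential, clipped at zero."""
    return sum(max(queues[up] - queues[down], 0) for up, down in moves)

def bp_select_phase(phases, queues):
    # Local BP controller: activate the phase with the largest pressure
    return max(phases, key=lambda p: phase_pressure(phases[p], queues))

def power_reward(throughput, avg_delay, eps=1e-6):
    # Power-style metric: jointly rewards high throughput and low
    # end-to-end delay in a single scalar
    return throughput / (avg_delay + eps)

# Hypothetical intersection: queues per lane and two candidate phases
queues = {"N_in": 8, "S_in": 2, "E_in": 5, "W_in": 5,
          "N_out": 1, "S_out": 0, "E_out": 4, "W_out": 4}
phases = {"NS": [("N_in", "S_out"), ("S_in", "N_out")],
          "EW": [("E_in", "W_out"), ("W_in", "E_out")]}
active = bp_select_phase(phases, queues)
```

In the full TLC architecture, the centralized DRL agent would modulate these local decisions using the global TFM view rather than letting the BP controllers act in isolation.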
As our future work, we first plan to analyze the impacts of malicious attacks on TSC equipped with a DRL-based agent. Our experiments have shown that DRL performs well for a traffic network with multiple intersections. We intend to explore the tolerance of the DRL-based agent under adversarial attacks: how many attacks can it tolerate? We also want to examine its resilience: how well can it recover after encountering attacks? Second, as we consider TSC problems for larger areas, increases in the dimensions of states and actions are foreseeable. High-dimensional states and actions make learning difficult and will be a worthy research topic. We plan to design a two-level control architecture that reduces the action dimension by transferring some workload from a centralized agent to local controllers.