eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Integrating Vision Language Navigation with Autonomous Driving in Unmapped, Dynamic, Off-Road Environments

Abstract

Autonomous robots capable of navigating complex, unstructured environments based on natural language commands represent a significant advancement in usability and capability, yet this functionality remains underdeveloped in AI and robotics research. Traditional autonomous robots are typically limited to navigating pre-mapped areas with relatively simple environmental topologies (e.g., vehicle roadways). This dissertation addresses the challenge of integrating Vision Language Navigation (VLN) with autonomous driving technologies, enhancing robots’ ability to comprehend and execute natural language instructions in diverse, unstructured environments while simultaneously performing mapping, navigation, and obstacle avoidance.

While traditional autonomous vehicles excel at on-road waypoint navigation in structured environments, they struggle to interpret and act on natural language commands. Additionally, research on autonomous driving in unmapped, dynamic environments remains limited, creating a gap in the utility of autonomous systems for tasks requiring complex, multi-step instructions or navigation in novel settings.

This research proposes a novel software stack that integrates VLN with autonomous driving technologies, including Simultaneous Localization and Mapping (SLAM) for mapping and localization, dynamic obstacle detection and avoidance, and path planning and execution. The VLN agent interprets natural language commands and generates waypoints for the autonomous driving system, which then performs real-time navigation, mapping, and obstacle avoidance. The approach is validated in 2D and 3D environments, using simulated and real-world scenarios.

The proposed system demonstrates successful navigation and task execution across multiple settings. In both real and simulated settings, the autonomous driving system reached its goals in 100% of test cases despite dynamic obstacles. In uneven terrain, the agent effectively mapped and navigated through complex, obstacle-ridden environments, accurately tracking 6-DOF location and orientation. The integration of the VLN agent with the autonomous driving stack enabled real-time navigation and task completion based on natural language instructions.
