eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Image-based Robot Pose Estimation

Abstract

Modern robotic automation often relies on cameras for rich sensory input to infer tasks and provide feedback for closed-loop control. Accurate robot pose estimation is critical for linking visual feedback to the robot’s operational space. Traditional camera-to-robot calibration methods are labor-intensive, typically requiring externally attached fiducial markers, collecting images of several robot configurations, and solving for the transformations; these limitations hinder their use in dynamic or unstructured environments. This dissertation presents deep learning-based approaches for markerless robot pose estimation, aimed at eliminating cumbersome physical setups while improving calibration flexibility and accuracy. First, two methods are introduced: a keypoint-based approach and a rendering-based approach. The keypoint-based method employs a deep neural network to detect robot keypoints, followed by a Perspective-n-Point (PnP) solver to estimate the robot pose. In contrast, the rendering-based method takes binary robot masks as input and iteratively updates the pose estimate through a differentiable rendering process that minimizes the difference between rendered and observed data. Both methods achieve state-of-the-art performance in image-based robot pose estimation, and their capability for online pose tracking is demonstrated on a surgical robot and a snake robot when integrated with probabilistic filtering techniques. The dissertation further examines the strengths and limitations of both approaches and proposes a self-supervised training framework that leverages their complementary advantages. Finally, this work extends the pose estimation framework to scenarios where the robot is only partially visible by integrating a vision-language foundation model to assess the visibility of the robot’s components, enhancing robot pose estimation across a broader range of real-world manipulation scenarios.
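The two families of methods described above share a common core: given 2D image observations of the robot and a 3D robot model, recover the camera-to-robot pose by minimizing a reprojection objective. The toy sketch below illustrates that core idea only, not the dissertation's actual pipelines: it assumes hypothetical camera intrinsics and keypoint locations, restricts the pose to a pure translation for simplicity, and refines it iteratively with Gauss-Newton on the keypoint reprojection error (the full methods estimate rotation as well, via a PnP solver or differentiable rendering of robot masks).

```python
import numpy as np

# Hypothetical pinhole camera intrinsics (illustrative values only).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical 3D keypoints in the robot base frame (e.g. joint locations).
P = np.array([[0.0, 0.0, 0.00],
              [0.2, 0.0, 0.10],
              [0.0, 0.3, 0.20],
              [0.3, 0.2, 0.15]])

def project(t):
    """Project keypoints into pixels under a translation-only pose t."""
    X = P + t                     # transform into the camera frame
    x = (K @ X.T).T
    return x[:, :2] / x[:, 2:3]   # perspective division -> pixel coords

# Simulated keypoint detections from a known "true" pose.
t_true = np.array([0.1, -0.05, 1.5])
u_obs = project(t_true)

# Gauss-Newton refinement of the translation from reprojection error,
# starting from a rough initial guess.
t = np.array([0.0, 0.0, 1.0])
for _ in range(20):
    r = (project(t) - u_obs).ravel()          # residual vector (2N,)
    J = np.zeros((r.size, 3))                 # numerical Jacobian wrt t
    eps = 1e-6
    for j in range(3):
        dt = np.zeros(3)
        dt[j] = eps
        J[:, j] = ((project(t + dt) - u_obs).ravel() - r) / eps
    t = t - np.linalg.solve(J.T @ J, J.T @ r)  # normal-equations step

print(t)  # refined translation, close to t_true
```

The rendering-based method in the dissertation follows the same iterative structure, but replaces the sparse keypoint residuals with a dense pixel-wise difference between a rendered robot mask and the observed mask, with gradients supplied by a differentiable renderer rather than by numerical differencing.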
