Open-World 3D Understanding and Generation
- Liu, Minghua
- Advisor(s): Su, Hao
Abstract
3D representations model our physical world in one of the most explicit and structured ways, enabling the storage of extensive attributes. Understanding these 3D representations—including, but not limited to, their geometry, appearance, structure, semantics, mobility, functionality, and affordances—is crucial for developing intelligent agents that can comprehend and interact with our 3D physical environment and seamlessly integrate into human settings. Additionally, high-quality 3D generation allows us to replicate our 3D world, creating digital twins and supporting a wide range of downstream applications. Significant advancements have already been made in 3D deep learning by exploring suitable representations and neural algorithms for 3D data. However, unlike many other modalities, the scale of publicly available 3D data has been quite limited. As a result, most prior 3D deep learning approaches have been confined to a narrow range of common categories (such as chairs, cars, and airplanes), which greatly hinders their application to real-world scenarios with far more diverse categories and variations. Over the past two years, however, the rapid development of large-scale pretrained models from 2D vision, language, and other modalities, together with the emergence of larger and more diverse public 3D datasets, has opened up many new opportunities for 3D deep learning.
In this dissertation, we explore how to leverage extensive priors from other modalities, as well as how to exploit the limited but valuable 3D data, to enhance the generalizability of various 3D deep learning tasks. We primarily focus on two families of tasks: 3D object understanding and 3D object generation in an open-world context. For each family, we explore several strategies to effectively utilize these priors and identify a series of crucial problems with proposed solutions. Our efforts have significantly improved the generalizability of many previous 3D understanding and generation tasks, bridging the gap between the capabilities of earlier 'chair research' and the complex, diverse open-world scenarios of the real physical 3D world.