Neural 3D Representations for View Synthesis with Sparse Input Views
UC San Diego Electronic Theses and Dissertations

Abstract

Reconstructing scenes or objects from observed images has long been a critical problem in the graphics and vision communities. Traditional methods solve this inverse problem by capturing a large number of images to resolve the ambiguity in geometry and appearance. However, capturing and storing such large amounts of data consumes extensive compute and storage resources and is often infeasible on consumer-grade hardware. This dissertation presents several algorithms that reconstruct 3D geometry and appearance from a handful of input views, enabling efficient data capture and storage as well as generalization to unseen scenes.

Starting with scene reconstruction, we first target data captured by 360° cameras. We introduce multi-depth panoramas, a compact representation that enables translational and rotational movement within a 3D scene. We leverage multi-view stereo (MVS) techniques and deep neural networks to lift 16 input views into a panoramic representation that renders convincing novel views efficiently while keeping storage requirements small.
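
As a rough illustration only, and not the dissertation's actual pipeline, a layered panoramic representation of this kind can be rendered by alpha-compositing its RGBA depth layers from back to front; the layer count and image sizes below are hypothetical.

    import numpy as np

    def composite_layers(layers):
        # layers: (D, H, W, 4) RGBA panorama layers, ordered far -> near,
        # with all values in [0, 1].
        height, width = layers.shape[1:3]
        out = np.zeros((height, width, 3), dtype=np.float32)
        for rgba in layers:                          # iterate far -> near
            rgb, alpha = rgba[..., :3], rgba[..., 3:4]
            out = rgb * alpha + out * (1.0 - alpha)  # standard "over" compositing
        return out

    # Hypothetical sizes: 16 source views distilled into 8 panoramic depth layers.
    panorama_layers = np.random.rand(8, 256, 512, 4).astype(np.float32)
    novel_view = composite_layers(panorama_layers)
    print(novel_view.shape)  # (256, 512, 3)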

Furthermore, we explore a harder problem by reducing the input view count to only two and capturing scenes with dynamic content. We present the deep 3D mask volume, a novel representation that ensures temporally stable renderings for view extrapolation. Our network uses information from the video frames to infer disocclusions caused by moving objects, and it then produces a 3D mask volume that replaces the disoccluded regions with temporally stable background content, yielding flicker-free results.
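
A minimal sketch of how such a mask might be applied, assuming, purely for illustration, that the predicted mask volume holds per-voxel weights that blend a per-frame volume with a temporally stable background volume:

    import numpy as np

    def blend_with_mask_volume(frame_volume, background_volume, mask_volume):
        # frame_volume, background_volume: (D, H, W, 4) RGBA volumes built from
        # the current frames and from the static background, respectively.
        # mask_volume: (D, H, W, 1) weights in [0, 1]; 1 keeps the current frame,
        # 0 falls back to the stable background (e.g. in disoccluded regions).
        return mask_volume * frame_volume + (1.0 - mask_volume) * background_volume

    # Hypothetical shapes, for illustration only.
    frame = np.random.rand(32, 128, 128, 4).astype(np.float32)
    background = np.random.rand(32, 128, 128, 4).astype(np.float32)
    mask = np.random.rand(32, 128, 128, 1).astype(np.float32)
    blended = blend_with_mask_volume(frame, background, mask)
    print(blended.shape)  # (32, 128, 128, 4)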

Next, we focus on human portraits and seek to change the viewpoint and the lighting at the same time. We develop the neural light-transport field (NeLF), a representation trained on synthetic human portraits that generates novel views under novel lighting from only five input images.
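
In spirit, and with details that differ from the actual NeLF formulation, relighting with a light-transport representation reduces to weighting an environment map by predicted per-pixel transport coefficients; all quantities below are hypothetical stand-ins for network outputs.

    import numpy as np

    # Hypothetical example: assume a network has predicted, for one pixel, a
    # transport vector with one coefficient per environment-map direction. The
    # relit pixel color is then the transport-weighted sum of the environment
    # light, computed independently per RGB channel.
    num_directions = 16 * 32                       # assumed environment-map resolution
    transport = np.random.rand(num_directions)     # predicted transport coefficients
    env_light = np.random.rand(num_directions, 3)  # RGB radiance per direction

    pixel_rgb = transport @ env_light              # (3,) relit pixel color
    print(pixel_rgb)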

Finally, we investigate the 3D reconstruction problem in which only a single image is given. To this end, we present VisionNeRF, an algorithm that combines the expressiveness and capacity of vision transformers with the high-fidelity rendering of volumetric representations to synthesize unseen views of a given object.
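
For context, the volumetric rendering that approaches like VisionNeRF build on composites color and density samples along each camera ray. The sketch below shows the standard quadrature with hypothetical sample values, not the dissertation's implementation.

    import numpy as np

    def render_ray(densities, colors, deltas):
        # Standard volume-rendering quadrature along one ray.
        # densities: (N,) non-negative density at each sample
        # colors:    (N, 3) RGB at each sample
        # deltas:    (N,) distances between consecutive samples
        alphas = 1.0 - np.exp(-densities * deltas)                       # per-sample opacity
        trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance
        weights = trans * alphas                                         # per-sample contribution
        return (weights[:, None] * colors).sum(axis=0)                   # composited RGB

    # Hypothetical sample values for a single camera ray.
    n = 64
    rgb = render_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.02))
    print(rgb)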
