Neural Lumigraph Rendering | CVPR 2021

Petr Kellnhofer, Lars Jebe, Andrew Jones, Ryan Spicer, Kari Pulli, Gordon Wetzstein
Raxium, Stanford University

A real-time neural rendering approach.

ABSTRACT

Novel view synthesis is a challenging and ill-posed inverse rendering problem. Neural rendering techniques have recently achieved photorealistic image quality for this task. State-of-the-art (SOTA) neural volume rendering approaches, however, are slow to train and require minutes of inference (i.e., rendering) time for high image resolutions. We adopt high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene supervised exclusively with posed 2D images. Our neural rendering pipeline accelerates SOTA neural volume rendering by about two orders of magnitude and our implicit surface representation is unique in allowing us to export a mesh with view-dependent texture information. Thus, like other implicit surface representations, ours is compatible with traditional graphics pipelines, enabling real-time rendering rates, while achieving unprecedented image quality compared to other surface methods. We assess the quality of our approach using existing datasets as well as high-quality 3D face data captured with a custom multi-camera rig.
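For concreteness, below is a minimal sketch of the two scene representation networks described above, assuming a PyTorch implementation: one SIREN-style MLP maps a 3D point to a signed distance, and a second maps the point, its surface normal, and the viewing direction to an RGB color. The class names, layer widths, and the omega_0 frequency are illustrative placeholders, not the authors' exact architecture.

import torch
import torch.nn as nn

class SineLayer(nn.Module):
    # One SIREN layer: a linear map followed by a sine with frequency omega_0.
    def __init__(self, in_dim, out_dim, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class SDFNet(nn.Module):
    # Implicit surface: maps a 3D point to an approximate signed distance.
    def __init__(self, hidden=256, n_layers=4):
        super().__init__()
        dims = [3] + [hidden] * n_layers
        self.net = nn.Sequential(
            *[SineLayer(dims[i], dims[i + 1]) for i in range(n_layers)],
            nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

class RadianceNet(nn.Module):
    # View-dependent appearance: (point, normal, view direction) -> RGB.
    def __init__(self, hidden=256, n_layers=4):
        super().__init__()
        dims = [3 + 3 + 3] + [hidden] * n_layers
        self.net = nn.Sequential(
            *[SineLayer(dims[i], dims[i + 1]) for i in range(n_layers)],
            nn.Linear(hidden, 3))

    def forward(self, x, normal, view_dir):
        return torch.sigmoid(self.net(torch.cat([x, normal, view_dir], dim=-1)))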

CITATION

P. Kellnhofer, L. Jebe, A. Jones, R. Spicer, K. Pulli, G. Wetzstein, Neural Lumigraph Rendering, CVPR 2021 (oral)

@inproceedings{Kellnhofer:2021:nlr,
  author    = {Petr Kellnhofer and Lars Jebe and Andrew Jones and Ryan Spicer and Kari Pulli and Gordon Wetzstein},
  title     = {Neural Lumigraph Rendering},
  booktitle = {CVPR},
  year      = {2021}
}

Overview of our framework. Given a set of multi-view images, we optimize representation networks modeling shape and appearance of a scene end to end using a differentiable sphere tracer. The resulting models can be exported to enable view-dependent real-time rendering using traditional graphics pipelines.
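The sketch below illustrates the basic sphere-tracing loop against the learned signed distance field (reusing the hypothetical SDFNet above): each ray is advanced by the current distance value until it approximately reaches the zero level set. The paper's differentiable sphere tracer is more elaborate; the step count and tolerance here are placeholders.

import torch

def sphere_trace(sdf, ray_origins, ray_dirs, n_steps=64, eps=1e-4):
    # March each ray forward by the signed distance returned by the SDF
    # network; near the surface the step sizes shrink and the points
    # converge to the zero level set.
    x = ray_origins
    for _ in range(n_steps):
        d = sdf(x)              # (N, 1) signed distance at the current points
        x = x + d * ray_dirs    # (N, 3) step along each (unit-length) ray
    converged = sdf(x).abs().squeeze(-1) < eps
    return x, converged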

Our surface-based neural scene representation (right) achieves a quality that is comparable to or better than Neural Volumes (left) and NeRF (center) for the sparse set of input cameras used in this experiment.


We qualitatively and quantitatively compare both the offline sphere-traced (ST, bottom center) and the real-time rasterized (RAS, bottom right) versions of our neural rendering framework to a number of alternative approaches, including COLMAP, Neural Volumes, NeRF, and IDR. Our framework achieves a quality comparable to the best of these methods while offering real-time rendering rates.
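To obtain the rasterized (RAS) version, the implicit surface is exported as an explicit mesh. A rough sketch of such an export, assuming the SDFNet above, scikit-image's marching cubes, and placeholder grid bounds and resolution (the paper additionally exports view-dependent texture information, which is omitted here):

import torch
from skimage import measure

@torch.no_grad()
def export_mesh(sdf, resolution=256, bound=1.0):
    # Sample the SDF on a regular grid covering [-bound, bound]^3 ...
    xs = torch.linspace(-bound, bound, resolution)
    grid = torch.stack(torch.meshgrid(xs, xs, xs, indexing="ij"), dim=-1)
    pts = grid.reshape(-1, 3)
    vals = torch.cat([sdf(p) for p in pts.split(65536)], dim=0)
    volume = vals.reshape(resolution, resolution, resolution).cpu().numpy()
    # ... and extract the zero level set as a triangle mesh.
    verts, faces, normals, _ = measure.marching_cubes(
        volume, level=0.0, spacing=(2 * bound / (resolution - 1),) * 3)
    verts = verts - bound  # marching_cubes places the grid origin at 0
    return verts, faces, normals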


Compared to other methods that also estimate a proxy shape of the scene, such as COLMAP and IDR, our approach produces the sharpest, highest-quality images. The proxy shape estimated by IDR is nevertheless also very good and free of the holes and other artifacts that COLMAP suffers from.


In this experiment, we process 7 high-resolution images of the Digital Ira project, kindly provided by USC's Institute for Creative Technologies. Neural Volumes and NeRF struggle to synthesize high-quality novel views, primarily because of the sparse nature of the input views. IDR works more robustly, even with this sparse camera array, but produces blurry results. Neural Lumigraph Rendering achieves the best results for this experiment.


Here we process several high-resolution frames of a video sequence showing a human actor. We compare real-time renderings of the meshes exported from IDR and Neural Lumigraph Rendering, each textured with its respective optimized colors. As in the other experiments, our approach produces significantly sharper results in real time. This dataset was kindly provided by Volucap GmbH.


Compared to COLMAP, IDR, and NeRF, our Neural Lumigraph Rendering approach better captures the specular highlights of this scene from the DTU dataset, while also enabling real-time rendering rates and providing an implicit surface that can be exported as a mesh.
A comparison of image and shape reconstruction by IDR and our method with the ground-truth Unity rendering and mesh shown on the right. The upper numbers denote the image PSNR averaged over all views and the lower numbers correspond to the Chamfer distance.
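For reference, a minimal sketch of the two metrics quoted in this comparison, assuming NumPy images with values in [0, 1] and point sets sampled from the reconstructed and ground-truth meshes; note that exact Chamfer conventions (squared vs. unsquared distances, sum vs. average) vary across papers.

import numpy as np
from scipy.spatial import cKDTree

def psnr(rendered, reference, max_val=1.0):
    # Peak signal-to-noise ratio in dB between two images in [0, max_val].
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def chamfer_distance(points_a, points_b):
    # Symmetric Chamfer distance: mean nearest-neighbor distance from A to B
    # plus the mean from B to A, for (N, 3) and (M, 3) point arrays.
    d_ab, _ = cKDTree(points_b).query(points_a)
    d_ba, _ = cKDTree(points_a).query(points_b)
    return d_ab.mean() + d_ba.mean()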

Related Projects

You may also be interested in related projects focusing on neural scene representations and rendering:

  • Chan et al. pi-GAN. CVPR 2021 (link)
  • Lindell et al. AutoInt: Automatic Integration for Fast Neural Volume Rendering. CVPR 2021 (link)
  • Sitzmann et al. Implicit Neural Representations with Periodic Activation Functions. NeurIPS 2020 (link)
  • Sitzmann et al. MetaSDF. NeurIPS 2020 (link)
  • Sitzmann et al. Scene Representation Networks. NeurIPS 2019 (link)
  • Sitzmann et al. DeepVoxels. CVPR 2019 (link)