BACON: Band-limited Coordinate Networks | CVPR 2022

David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein

A new type of neural network with an analytical Fourier spectrum that enables multiscale scene representation.

ABSTRACT

Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, and so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has predictable behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without explicit supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality.

CITATION

D. B. Lindell, D. Van Veen, J. J. Park, G. Wetzstein, BACON: Band-limited Coordinate Networks for Multiscale Scene Representation, CVPR 2022.

@inproceedings{lindell2021bacon,
  author = {Lindell, David B. and Van Veen, Dave and Park, Jeong Joon and Wetzstein, Gordon},
  title = {BACON: Band-limited coordinate networks for multiscale scene representation},
  booktitle = {CVPR},
  year = {2022}
}

Image Representation

We compare BACON to Fourier Features, SIREN, and the integrated positional encoding of Mip-NeRF for fitting an image at 256×256 resolution. Fourier Features and SIREN show aliasing when downsampled. Mip-NeRF is explicitly trained at multiple scales and learns anti-aliased outputs. All methods except BACON show artifacts when the network is sampled at 4× the training resolution. BACON is supervised at a single scale and learns band-limited outputs that closely match a low-pass filtered reference (see left column, Fourier spectra insets).

Neural Radiance Fields


Comparison between NeRF, Mip-NeRF, and BACON. BACON outperforms NeRF for multiscale representation while using fewer parameters than Mip-NeRF to represent low resolution outputs.

3D Shape Representation



Shape representation via fitting the signed distance function. BACON represents 3D shapes at multiple scales while performing comparably to single-scale representations. Controlling the bandwidth of the network's Fourier spectrum enables interpretable multiscale outputs. We also provide a method for multiscale marching cubes that speeds up mesh extraction from BACON by roughly 80× compared to conventional networks.

Extrapolation Behavior

Since BACON uses discrete frequencies, the representation is periodic. We fit a seamless texture (supervised in the red square) and show the periodic extrapolation behavior by querying the network outside the trained domain.
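The periodicity above follows directly from the discrete spectrum: any sum of sinusoids whose frequencies are integer multiples of a base frequency repeats with the base period. A minimal sketch (the base frequency and the two components are hypothetical, chosen to give a period of 1):

```python
import math

# A sum of sinusoids with frequencies that are integer multiples of a base
# frequency omega0 repeats with period 2*pi/omega0; BACON's discrete
# frequencies give the network the same periodic extrapolation behavior.
omega0 = 2 * math.pi  # hypothetical base frequency, period 2*pi/omega0 = 1

def f(x):
    return math.sin(3 * omega0 * x) + 0.5 * math.sin(7 * omega0 * x + 0.2)

assert abs(f(0.13) - f(1.13)) < 1e-9  # same value one period later
```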

1D Fitting Example

Other representations (SIREN, Fourier Features) are not band-limited and exhibit spurious high-frequency components when fitting a simple 1D signal (orange) at a sparse set of supervised points (pink). BACON correctly interpolates between the supervised points (bottom middle), and we can also apply a low-pass filter (bottom row) to fit only the low-frequency components.

Architecture


BACON architecture and forward pass. The architecture builds on the recently introduced Multiplicative Filter Networks, which interleave linear layers with Hadamard products of sinusoidal filters. Intermediate network outputs produce results at multiple scales.
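The forward pass can be sketched in a few lines of numpy. This is a minimal illustration, not the released implementation: all weights are random stand-ins for learned parameters, and the layer widths and frequency bound are hypothetical. The first layer applies a sine to the input coordinates; each subsequent layer multiplies a new sine filter elementwise into a linear transform of the previous activations, and a linear output head is attached at every layer to produce one output per scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def bacon_forward(x, n_layers=4, hidden=16):
    """Minimal sketch of a BACON/MFN forward pass on 1D coordinates x."""
    max_freq = 32.0  # hypothetical per-layer frequency bound
    omega = rng.uniform(-max_freq, max_freq, (n_layers, hidden))
    phi = rng.uniform(-np.pi, np.pi, (n_layers, hidden))
    W = rng.normal(0.0, np.sqrt(2.0 / hidden), (n_layers - 1, hidden, hidden))
    W_out = rng.normal(0.0, 1.0 / np.sqrt(hidden), (n_layers, hidden, 1))

    z = np.sin(np.outer(x, omega[0]) + phi[0])      # first sine layer
    outputs = [z @ W_out[0]]                        # coarsest-scale output
    for i in range(1, n_layers):
        g = np.sin(np.outer(x, omega[i]) + phi[i])  # new sine filter
        z = g * (z @ W[i - 1])                      # linear layer, then Hadamard product
        outputs.append(z @ W_out[i])                # intermediate multiscale output
    return outputs

outs = bacon_forward(np.linspace(-0.5, 0.5, 8))
print([o.shape for o in outs])  # one (8, 1) output per scale
```

Because each Hadamard product with a sine only sums frequencies, every intermediate output remains a finite sum of sinusoids, which is what makes the spectrum analytical.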

We initialize the frequencies of each sine layer by drawing them from a uniform distribution. From these per-layer frequency bounds, we can derive the maximum representable bandwidth of each network output layer.
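The bandwidth derivation reduces to a running sum: since sin(ax)·sin(bx) contains only the frequencies a ± b, each Hadamard product with a sine layer can shift the existing spectrum by at most that layer's frequency bound. A short sketch with hypothetical per-layer bounds:

```python
from itertools import accumulate

# Hypothetical bounds |omega_i| <= B_i for each of four sine layers.
# Multiplying by sin(b x) shifts frequencies by +/- b, so the maximum
# frequency at output i is the running sum B_1 + ... + B_i.
per_layer_bound = [32, 32, 64, 128]
max_bandwidth = list(accumulate(per_layer_bound))
print(max_bandwidth)  # [32, 64, 128, 256]
```

Choosing the bounds this way yields outputs whose bandwidths double from one scale to the next.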

Since each output layer has a maximum bandwidth, BACON can be supervised on a high-resolution signal and learns multiscale decompositions automatically.

Initialization Scheme

The conventional initialization scheme for Multiplicative Filter Networks results in vanishingly small activations for deep networks (top rows). Our proposed initialization scheme maintains a standard normal distribution after each linear layer in the network (bottom rows).
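The stability claim can be checked numerically. In this sketch (a simplified stand-in for the paper's derivation, with random sine activations approximating sin(ωx + φ) over many inputs), the Hadamard product with a sine halves the variance at every layer, so scaling the linear layers by √2 beyond the usual 1/√fan_in keeps the activation scale constant with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, depth, n = 256, 8, 4096

# E[sin^2] = 1/2 for a uniform random phase, so z starts with variance 1/2.
z = np.sin(rng.uniform(-np.pi, np.pi, (n, hidden)))
for _ in range(depth):
    # The sine Hadamard product halves the variance, so the linear layer
    # is scaled by sqrt(2) on top of the usual 1/sqrt(fan_in).
    W = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, hidden))
    g = np.sin(rng.uniform(-np.pi, np.pi, (n, hidden)))
    z = g * (z @ W)
print(z.std())  # stays near sqrt(1/2) ~ 0.7 rather than vanishing
```

Dropping the √2 factor (i.e., using plain 1/√fan_in scaling) makes the printed value shrink by roughly √2 per layer, reproducing the vanishing-activation failure described above.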

ACKNOWLEDGMENTS

This project was supported in part by a PECASE by the ARO and NSF award 1839974.

Related Projects

You may also be interested in related projects focusing on neural scene representations and rendering:

  • Martel et al. ACORN. SIGGRAPH 2021
  • Chan et al. pi-GAN. CVPR 2021
  • Kellnhofer et al. Neural Lumigraph Rendering. CVPR 2021
  • Lindell et al. AutoInt. CVPR 2021
  • Sitzmann et al. Implicit Neural Representations with Periodic Activation Functions. NeurIPS 2020
  • Sitzmann et al. MetaSDF. NeurIPS 2020
  • Sitzmann et al. Scene Representation Networks. NeurIPS 2019
  • Sitzmann et al. Deep Voxels. CVPR 2019