Locally Conditioned 3D Diffusion | 3DV 2024

Ryan Po, Gordon Wetzstein

Controllable text-to-3D scene generation with intuitive user inputs.

ABSTRACT

Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text prompts and bounding boxes while ensuring seamless transitions between these parts. We demonstrate a score distillation sampling–based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.


CITATION

R. Po, G. Wetzstein, Compositional 3D Scene Generation using Locally Conditioned Diffusion, 3DV 2024

@inproceedings{Po:2023:comp3d,
  author    = {Ryan Po and Gordon Wetzstein},
  title     = {Compositional 3D Scene Generation using Locally Conditioned Diffusion},
  booktitle = {3DV},
  year      = {2024}
}

Method

Overview of our method. We generate text-to-3D content with a score distillation sampling–based pipeline, in which a latent diffusion prior is used to optimize a voxel NeRF representation of the 3D scene. The latent diffusion prior is conditioned locally on a bounding box rendering of the scene: a noise estimate is computed for every input text prompt, and the corresponding denoising steps are applied only within the segmentation mask given by the bounding box rendering.
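To make the locally conditioned denoising step concrete, here is a minimal PyTorch-style sketch, not the paper's actual implementation. It assumes a generic latent diffusion noise predictor `eps_model(z_t, t, text_emb)`, binary masks rendered from the bounding boxes, and a standard SDS weighting; all names, signatures, and the classifier-free guidance scale are illustrative assumptions.

```python
import torch

def locally_conditioned_eps(eps_model, z_t, t, text_embs, masks,
                            guidance_scale=7.5, uncond_emb=None):
    """Composite per-prompt noise estimates using bounding-box segmentation masks.

    Assumptions (illustrative, not the paper's API):
      eps_model(z_t, t, emb) -> predicted noise, same shape as z_t
      text_embs: one text embedding per bounding box / prompt
      masks:     binary masks (B, 1, H, W) rendered from the boxes,
                 assumed to partition the latent image (sum to 1 per pixel)
    """
    eps = torch.zeros_like(z_t)
    for emb, mask in zip(text_embs, masks):
        eps_cond = eps_model(z_t, t, emb)
        if uncond_emb is not None:  # optional classifier-free guidance
            eps_uncond = eps_model(z_t, t, uncond_emb)
            eps_i = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        else:
            eps_i = eps_cond
        # apply this prompt's denoising direction only inside its bounding box
        eps = eps + mask * eps_i
    return eps

def sds_grad(eps_model, z0, text_embs, masks, alphas_cumprod, t):
    """One score distillation sampling step on a rendered latent z0 (sketch)."""
    a_t = alphas_cumprod[t]
    noise = torch.randn_like(z0)
    z_t = a_t.sqrt() * z0 + (1 - a_t).sqrt() * noise
    eps_hat = locally_conditioned_eps(eps_model, z_t, t, text_embs, masks)
    w_t = 1 - a_t                     # a common SDS weighting choice
    return w_t * (eps_hat - noise)    # gradient w.r.t. z0, prior kept frozen
```

In a full pipeline, this gradient would be backpropagated through the differentiable renderer to update the voxel NeRF parameters; the sketch only covers the mask-composited noise estimate and the SDS-style update on the rendered latent.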

Results



An overview of results generated with our method.

Acknowledgements

We thank Alex Bergman and Cindy Nguyen for valuable discussions and feedback on drafts. This project was in part supported by the Samsung GRO program. Ryan Po was supported by a Stanford Graduate Fellowship.

Related Projects

You may also be interested in related projects on 3D GANs, such as:

  • Chan et al., GeNVS, 2023 (link)
  • Deng et al., LumiGAN, 2023 (link)
  • Bergman et al., GNARF, NeurIPS 2022 (link)
  • Chan et al., EG3D, CVPR 2022 (link)
  • Chan et al., pi-GAN, CVPR 2021 (link)