3D GAN Inversion for Controllable Portrait Image Animation | ECCVW 2022

Connor Z. Lin*, David B. Lindell*, Eric R. Chan, Gordon Wetzstein

Portrait image animation and attribute editing using 3D GAN inversion.

ABSTRACT

Millions of images of human faces are captured every day, but these photographs portray the likeness of an individual with a fixed pose, expression, and appearance. Portrait image animation enables the post-capture adjustment of these attributes from a single image while maintaining a photorealistic reconstruction of the subject’s likeness or identity. Still, current methods for portrait image animation are typically based on 2D warping operations or manipulations of a 2D generative adversarial network (GAN) and lack explicit mechanisms to enforce multi-view consistency. Thus these methods may significantly alter the identity of the subject, especially when the viewpoint relative to the camera is changed. In this work, we leverage newly developed 3D GANs, which allow explicit control over the pose of the image subject with multi-view consistency. We propose a supervision strategy to flexibly manipulate expressions with 3D morphable models, and we show that the proposed method also supports editing appearance attributes, such as age or hairstyle, by interpolating within the latent space of the GAN. The proposed technique for portrait image animation outperforms previous methods in terms of image quality, identity preservation, and pose transfer while also supporting attribute editing.


CITATION

C.Z. Lin*, D.B. Lindell*, E.R. Chan, G. Wetzstein, 3D GAN Inversion for Controllable Portrait Image Animation, ECCV Workshop on Learning to Generate 3D Shapes and Scenes, 2022.

@inproceedings{lin20223dganinversion,
author = {C.Z. Lin and D.B. Lindell and E.R. Chan and G. Wetzstein},
title = {3D GAN Inversion for Controllable Portrait Image Animation},
booktitle = {ECCV Workshop on Learning to Generate 3D Shapes and Scenes},
year = {2022},
}

PIPELINE ARCHITECTURE

Given a source image and a target image sequence, our method transfers pose and expression attributes from the targets to the source. (1) We encode the expressions of the target frames using a 3D morphable model (3DMM) predicted by a pre-trained network and transfer them to the source image. Using the resulting 3DMM, we render expression templates, i.e., images of the source face with the target expressions. (2) GAN inversion re-renders each expression template, in-painting the mouth region if necessary. The in-painted mouth is composited back onto the expression template and image background, and the result is embedded into the GAN latent space using Pivotal Tuning Inversion. (3) The final result is rendered by explicitly conditioning the 3D GAN on the poses of the target sequence.
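The three stages above can be sketched as a simple dataflow. Note that this is a hypothetical illustration only: every function below is a toy stand-in operating on NumPy arrays, whereas the actual method relies on a pre-trained 3DMM regressor, a 3D GAN (e.g., EG3D), and Pivotal Tuning Inversion.

```python
import numpy as np

def predict_3dmm(image):
    """Stage 1a (stand-in): regress 3DMM parameters from an image.
    Here the mean pixel value parameterizes a tiny toy 'shape'."""
    return {"shape": np.full(4, image.mean()), "expression": np.zeros(4)}

def transfer_expression(source_params, target_params):
    """Stage 1b: keep the source identity (shape) but adopt the target expression."""
    return {"shape": source_params["shape"],
            "expression": target_params["expression"]}

def render_expression_template(params):
    """Stage 1c (stand-in): render the source face with the transferred expression."""
    return np.concatenate([params["shape"], params["expression"]])

def invert_into_gan(template):
    """Stage 2 (stand-in): embed the composited template into the GAN latent
    space; the paper uses Pivotal Tuning Inversion, here it is an identity map."""
    return template.copy()

def render_at_pose(latent, yaw):
    """Stage 3 (stand-in): condition the 3D GAN on an explicit camera pose."""
    return latent * np.cos(yaw)

# Toy "images" and a short target pose sequence
source = np.ones((8, 8)) * 0.5
target = np.ones((8, 8)) * 0.2
params = transfer_expression(predict_3dmm(source), predict_3dmm(target))
latent = invert_into_gan(render_expression_template(params))
frames = [render_at_pose(latent, yaw) for yaw in (0.0, 0.3, 0.6)]
```

The key design point the sketch preserves is the separation of concerns: expression is handled by the 3DMM transfer before inversion, while pose is handled afterward by explicit conditioning of the 3D GAN, which is what yields multi-view consistency.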

Portrait Animation and Editing


Given a source portrait image and a target expression (e.g., specified with a target image), our method transfers the expression and pose to the input source image. We achieve multi-view consistent edits of pose by embedding the expression-edited portrait image into the latent space of a 3D GAN (see predicted underlying shape, right). By interpolating within the latent space of the GAN, we can also apply our method to animate attribute-edited images, allowing adjustments to age, hairstyle, gender, or overall appearance in addition to expression and pose.
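Latent-space attribute editing of this kind is commonly implemented as a linear move along a semantic direction. The sketch below is a hypothetical illustration with toy arrays; the attribute direction and latent dimensionality are assumptions, not part of the paper's released code.

```python
import numpy as np

def edit_attribute(w, direction, strength):
    """Move a latent code along a semantic attribute direction (e.g., 'age').
    Because pose is conditioned explicitly in the 3D GAN, the edited code
    can still be rendered consistently from any viewpoint."""
    return w + strength * direction

w = np.zeros(512)                            # stand-in for an inverted latent code
age_direction = np.ones(512) / np.sqrt(512)  # toy unit-norm attribute direction
w_older = edit_attribute(w, age_direction, 2.0)
w_blend = 0.5 * w + 0.5 * w_older            # linear interpolation between codes
```

Interpolating between the original and edited codes (as in `w_blend`) gives smooth control over edit strength, which is what allows gradual adjustments of age or hairstyle rather than a binary switch.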

Additional Examples


Acknowledgements

This project was supported in part by a PECASE by the ARO, NSF award 1839974, a Samsung GRO, and Stanford HAI.


Related Projects

You may also be interested in related projects on 2D and 3D GANs, such as:

  • Karras et al. StyleGAN. CVPR 2019 (link)
  • Karras et al. StyleGAN2. CVPR 2020 (link)
  • Karras et al. StyleGAN3. NeurIPS 2021 (link)
  • Chan et al. pi-GAN. CVPR 2021 (link)
  • Chan et al. EG3D. CVPR 2022 (link)

or related projects focusing on neural scene representations and rendering from our group:

  • Lindell et al. BACON. 2021 (link)
  • Martel et al. ACORN. SIGGRAPH 2021 (link)
  • Kellnhofer et al. Neural Lumigraph Rendering. CVPR 2021 (link)
  • Lindell et al. AutoInt. CVPR 2021 (link)
  • Sitzmann et al. Implicit Neural Representations with Periodic Activation Functions. NeurIPS 2020 (link)
  • Sitzmann et al. MetaSDF. NeurIPS 2020 (link)
  • Sitzmann et al. Scene Representation Networks. NeurIPS 2019 (link)
  • Sitzmann et al. DeepVoxels. CVPR 2019 (link)