We propose to represent a scene with a set of local neural radiance fields, called nerflets, which are trained with only 2D supervision. The representation not only supports 2D tasks such as novel view synthesis and panoptic segmentation, but also 3D-oriented tasks such as 3D segmentation and scene editing. The key idea is our learned structured decomposition of the scene (top right).
We address efficient and structure-aware 3D scene representation from images. Our key contribution is nerflets: a set of local neural radiance fields that together represent a scene. Each nerflet maintains its own spatial position, orientation, and extent, within which it contributes to the panoptic, density, and radiance reconstructions. Using only photometric and inferred panoptic image supervision, we directly and jointly optimize the parameters of a set of nerflets to form a decomposed representation of the scene, where each object instance is represented by a group of nerflets. In experiments on indoor and outdoor environments, we find that nerflets (1) fit and approximate scenes more efficiently than traditional global NeRFs, (2) allow panoptic and photometric renderings to be extracted from arbitrary views, and (3) enable tasks that are rare for NeRFs, such as 3D panoptic segmentation and interactive editing.
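To make this parameterization concrete, below is a minimal PyTorch sketch of a single nerflet and of influence-weighted composition. All names (`Nerflet`, `influence`, `compose`), the RBF-style influence function, and the exact blending and normalization are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class Nerflet(nn.Module):
    """One local radiance field: a learned pose, an extent, and a tiny MLP.
    Sketch only; field names and sizes are assumptions."""
    def __init__(self, hidden=32, num_classes=21):
        super().__init__()
        self.center = nn.Parameter(torch.randn(3) * 0.1)  # spatial position
        self.rotation = nn.Parameter(torch.zeros(3))      # orientation (unused below, for brevity)
        self.log_scale = nn.Parameter(torch.zeros(3))     # per-axis extent
        self.mlp = nn.Sequential(                         # density + radiance head
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                         # (sigma, r, g, b)
        )
        self.sem_logits = nn.Parameter(torch.zeros(num_classes))  # one panoptic label per nerflet

    def influence(self, x):
        """RBF-style influence: near 1 inside the nerflet's extent, ~0 outside."""
        local = (x - self.center) / self.log_scale.exp()  # rotation omitted for brevity
        return torch.exp(-0.5 * (local ** 2).sum(-1))

def compose(nerflets, x):
    """Blend per-nerflet outputs at points x of shape (P, 3) by influence weighting."""
    w = torch.stack([n.influence(x) for n in nerflets], 0)         # (N, P)
    out = torch.stack([n.mlp(x - n.center) for n in nerflets], 0)  # (N, P, 4)
    sigma = (w * out[..., 0].relu()).sum(0)                        # density: weighted sum
    w_norm = w / (w.sum(0, keepdim=True) + 1e-8)                   # normalized weights
    rgb = (w_norm.unsqueeze(-1) * out[..., 1:].sigmoid()).sum(0)   # radiance: weighted average
    logits = torch.stack([n.sem_logits for n in nerflets], 0)      # (N, C)
    sem = torch.einsum('np,nc->pc', w_norm, logits)                # per-point semantic logits
    return sigma, rgb, sem
```

In this sketch, density is blended by an influence-weighted sum while radiance and semantics use influence-normalized averages; this is one plausible reading of the composition described above, not a verified reproduction of it.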
Compared to a vanilla NeRF, nerflets are efficient because of their local structure: each nerflet is much smaller than a vanilla NeRF, and only the nerflets near a point sample need to be evaluated. As a result, we are able to build an interactive tool that edits and renders nerflets in real time.
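Continuing the sketch above, one way locality could be exploited is to cull nerflets with negligible influence before running any MLPs, and to edit an object by transforming the pose parameters of its nerflet group. The threshold, `all_nerflets`, and `object_group` here are hypothetical, introduced only for illustration.

```python
import torch

# Hypothetical locality culling (threshold is an assumption): skip nerflets
# whose influence is negligible at every sampled point in the batch.
def active_nerflets(nerflets, x, thresh=1e-3):
    return [n for n in nerflets if n.influence(x).max() > thresh]

# Only nearby nerflets are evaluated, so per-point queries stay cheap.
sigma, rgb, sem = compose(active_nerflets(all_nerflets, x), x)

# Interactive-editing sketch: move one object instance by translating the
# centers of the nerflets in its group (object_group is hypothetical).
with torch.no_grad():
    for n in object_group:
        n.center += torch.tensor([0.5, 0.0, 0.0])
```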
@inproceedings{zhang2023nerflets,
  author    = {Zhang, Xiaoshuai and Kundu, Abhijit and Funkhouser, Thomas and Guibas, Leonidas and Su, Hao and Genova, Kyle},
  title     = {Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision},
  booktitle = {CVPR},
  year      = {2023},
}