
conditioning the generation process on the category labels.
SSOD [27] designs a GAN-based generator to synthesize
images with 2D bounding box annotations for the object
detection task. In this paper, we explore the use of 3D GANs
to synthesize datasets with 3D-related annotations, which is
valuable but rarely explored.
Neural radiance field (NeRF) [26] based 3D GANs [6, 28],
which offer photorealistic synthesis and 3D controllability,
are a natural choice for synthesizing 3D-related training
data. However, our experimental results show that, by relying
on a 2D upsampler, they struggle to produce outputs that are
both high-resolution and geometry-consistent. Furthermore,
the generated images are not well aligned with the given 3D
pose, due to the lack of explicit 3D consistency regularization.
This misalignment introduces large label noise into the dataset,
limiting performance on downstream tasks. In addition, the
camera parameters are fixed after training, making it difficult
to match the output resolution to that of arbitrary downstream
data.
In this paper, we propose Lift3D, a new paradigm for
synthesizing 3D training data by lifting a pretrained 2D GAN
to a 3D generative radiance field. In contrast to 3D GANs
that rely on a 2D upsampler, we invert the generation pipeline
from 3D-to-2D into 2D-to-3D to achieve higher-resolution
synthesis. As depicted in Fig. 1, we first take advantage of a
well-disentangled 2D GAN to generate multi-view images with
corresponding pseudo pose annotations. The multi-view images
are then lifted to a 3D representation via NeRF reconstruction.
In particular, by distilling from the pretrained 2D GAN, Lift3D
achieves synthesis quality comparable to SOTA 2D generative
models. By decoupling 3D generation from generative image
synthesis, Lift3D generates images that are tightly aligned with
the sampled labels. Finally, free of 2D upsamplers, Lift3D can
synthesize images at any resolution by accumulating single-ray
evaluations. With these properties, we can leverage the generated
objects to augment existing datasets with increased quantity and
diversity.
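To make the last point concrete, below is a minimal sketch (ours, not the paper's released code) of standard NeRF volume rendering in PyTorch. The names `field` (a trained radiance field mapping 3D points to RGB and density), `K` (pinhole intrinsics), and `c2w` (camera-to-world pose) are hypothetical placeholders; sampling is uniform and view direction is omitted for brevity.

```python
import torch

def render_rays(field, origins, dirs, near=0.5, far=6.0, n_samples=64):
    """Accumulate color along each ray via the standard NeRF quadrature."""
    t = torch.linspace(near, far, n_samples)                         # (S,)
    pts = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]  # (R, S, 3)
    rgb, sigma = field(pts)          # hypothetical field: (R, S, 3), (R, S)
    delta = t[1] - t[0]              # uniform step size along the ray
    alpha = 1.0 - torch.exp(-sigma * delta)            # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.roll(trans, shifts=1, dims=-1)
    trans[:, 0] = 1.0                # full transmittance before first sample
    weights = alpha * trans                            # (R, S)
    return (weights[..., None] * rgb).sum(dim=-2)      # (R, 3) pixel colors

def render_image(field, K, c2w, H, W):
    """Render at any (H, W): one independent ray per pixel, no upsampler."""
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32),
                          indexing="ij")
    # Pixel -> camera-frame ray direction (camera looks down +z here).
    dirs = torch.stack([(i - K[0, 2]) / K[0, 0],
                        (j - K[1, 2]) / K[1, 1],
                        torch.ones_like(i)], dim=-1).reshape(-1, 3)
    dirs = dirs @ c2w[:3, :3].T                  # rotate rays into world frame
    dirs = dirs / dirs.norm(dim=-1, keepdim=True)
    origins = c2w[:3, 3].expand_as(dirs)
    return render_rays(field, origins, dirs).reshape(H, W, 3)
```

This is what decouples output resolution from the generator: since each pixel is rendered by an independent ray, the same trained field can be queried at whatever resolution a downstream detector expects, whereas upsampler-based 3D GANs are tied to a fixed feature-map size.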
To validate the effectiveness of our data generation framework,
we conduct experiments on image-based 3D object detection
with the KITTI [16] and nuScenes [4] datasets. Our framework
outperforms the best prior data augmentation method [24],
achieving significantly better 3D detection accuracy. Further-
more, even without any labeled data, it achieves promising
results in an unsupervised manner. Our
contributions are summarized as follows:
• We provide the first exploration of using 3D GANs to
synthesize 3D training data, which opens up a new possibility:
adapting NeRF’s powerful novel view synthesis capability to
benefit downstream 3D tasks.
• To synthesize datasets with high-resolution images and
accurate 3D labels, we propose Lift3D, an inverted 2D-to-3D
data generation framework that disentangles 3D generation
from generative image synthesis.
• Our experimental results demonstrate that the synthesized
training data can improve image-based 3D detectors across
different settings and datasets.

Figure 2. We compare our generation results with GIRAFFE
HD [45]. We zoom in on or rotate the sampled 3D box to control
the model's output. Rotating the 3D box introduces artifacts in
the images generated by GIRAFFE HD. All images are plotted
with the sampled 3D bounding boxes.
2. Related Work
Data Generation for Downstream Tasks. Benefiting from low
data acquisition costs, learning from synthesized data is an
attractive way to scale up training data. Several studies [1, 3,
14, 31] leverage graphics engines to synthesize training data
without human annotation. However, they rely on pre-built 3D
assets to mimic the world, which itself requires non-negligible
effort in the overall pipeline.
Free from the burden of collecting 3D assets, generative
models can also serve as a neural rendering alternative to
graphics engines. For example, BigDatasetGAN [23] generates
classification datasets by conditioning the generation process
on class labels. SSOD [27] samples datasets with 2D bounding
boxes via generative image synthesis. Our method goes further,
utilizing a 3D GAN to generate training data with 3D annota-
tions, greatly reducing the labeling effort for 3D data.
3D-aware Generative Image Synthesis. Recently, Generative
Adversarial Networks (GANs) [18] have made great progress
in generating high-resolution, photorealistic 2D images. One
natural extension of 2D GANs is to endow them with 3D
controllability, since 2D images are projections of the 3D
world. To provide this 3D awareness, recent