Every Dataset Counts: Scaling up Monocular 3D Object Detection with Joint Datasets Training
Fulong Ma, Xiaoyang Yan, Guoyang Zhao, Xiaojie Xu, Yuxuan Liu, Jun Ma, Ming Liu
Abstract
Monocular 3D object detection is essential for autonomous driving. However, current monocular 3D detection algorithms rely on expensive 3D labels derived from LiDAR scans, which makes them difficult to apply to new datasets and unfamiliar environments. This study explores training a monocular 3D object detection model on a mix of 3D and 2D datasets. The proposed framework has three components: a robust monocular 3D model that adapts to different camera settings, a selective-training strategy that handles the differing class annotations across datasets, and a pseudo 3D training method that uses 2D labels to improve detection in scenes where only 2D labels are available (as shown in Fig. 1). With this framework, models can be trained on a combination of 3D and 2D datasets, improving generalization and performance on new datasets that have only 2D labels. Extensive experiments on the KITTI, nuScenes, ONCE, Cityscapes, and BDD100K datasets demonstrate the scalability of the proposed approach. Project page: https://sites.google.com/view/fmaafmono3d.
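To make the idea of training across datasets with differing class annotations concrete, the following is a minimal sketch of one plausible reading of selective training: the classification loss is computed only over the classes that a given dataset actually annotates, so that unlabeled classes are not penalized as negatives. The abstract does not specify the mechanism, so everything here is an assumption for illustration: the dataset names, the class lists, and the helpers `class_mask` and `masked_bce` are hypothetical and are not taken from the authors' implementation.

```python
import math

# Hypothetical label spaces; the abstract does not list per-dataset classes.
ALL_CLASSES = ["car", "pedestrian", "cyclist"]
DATASET_CLASSES = {
    "dataset_a": {"car", "pedestrian", "cyclist"},
    "dataset_b": {"car", "pedestrian"},  # assumed: no cyclist annotations
}


def class_mask(dataset_name):
    """Return 1.0 for classes annotated in the dataset, 0.0 otherwise."""
    annotated = DATASET_CLASSES[dataset_name]
    return [1.0 if name in annotated else 0.0 for name in ALL_CLASSES]


def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over the annotated classes only."""
    total = 0.0
    count = 0.0
    for p, t, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += m * -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += m
    return total / count if count > 0 else 0.0


# The cyclist score is ignored for dataset_b, whatever the model predicts.
mask_b = class_mask("dataset_b")
loss = masked_bce([0.9, 0.1, 0.9], [1.0, 0.0, 0.0], mask_b)
```

In this sketch, a confident "cyclist" prediction on an image from `dataset_b` contributes nothing to the loss, since that dataset is assumed to carry no cyclist labels and a missing box there cannot be treated as a true negative.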