LDIP: Real-Time On-Road Object Detection with Depth Estimation from a Single Image
Chengpeng Xu, Xiao Sun, Yangyang Xu, Ruolin Wang
Abstract
Detecting on-road objects with absolute depth information is one of the most crucial tasks in autonomous driving for ensuring safety. Traditional 2D object detection classifies and locates objects in image space but cannot recover their depth. 3D object detection and pixel-level depth estimation can provide accurate depth information for objects, but their significant inference overhead makes them difficult to deploy in real-world scenarios. This paper proposes a novel deep learning-based model, the Location and Depth Information Perceptron (LDIP), which provides the position, category, and absolute depth of objects in an image. We first trained and validated the model on KITTI, a vehicle-side autonomous driving dataset. The experimental results show that LDIP achieves 68.6% mAP in object recognition, and an AbsRel of 0.101 and an RMSE of 2.327 in depth estimation, all of which represent state-of-the-art performance on comparable tasks. We then fine-tuned the trained model on DAIR, where the validation mAP, AbsRel, and RMSE reached 65.4%, 0.092, and 2.461, respectively. This demonstrates the robustness and generalization of our model across different types of road datasets. Moreover, compared with other models, our model is more compact while maintaining accuracy and achieves an inference speed of 70 frames per second on an NVIDIA RTX 4060 GPU, which makes it deployable in practical scenarios. Relevant code is available at https://github.com/xcp-ustc/LDIP.
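The two depth metrics quoted above can be read with the help of their standard definitions. The sketch below assumes the common convention: AbsRel is the mean of |predicted - true| / true and is a unitless ratio, while RMSE is the square root of the mean squared error and carries the depth unit (metres on KITTI). It also assumes the errors are averaged over N objects that have ground-truth depth. The paper's exact evaluation protocol (object matching, depth range, any clipping) may differ. The helper names `abs_rel` and `rmse` and the sample depths are hypothetical illustrations, not results from LDIP.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    # Both lists must pair one prediction with one ground-truth depth.
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # AbsRel divides by the true depth, so it must be strictly positive.
    if any(g <= 0 for g in gt):
        raise ValueError("ground-truth depths must be positive")


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative error: mean(|pred - gt| / gt), unitless."""
    _check(pred, gt)
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root mean squared error, in the same unit as the depths (e.g. metres)."""
    _check(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


# Hypothetical per-object depths in metres (illustrative only).
gt = [10.0, 20.0, 40.0]
pred = [11.0, 18.0, 40.0]
print(round(abs_rel(pred, gt), 4))  # 0.0667
print(round(rmse(pred, gt), 4))     # 1.291
```

Because AbsRel normalises each error by the true distance, a 2 m error on a distant object counts for less than the same error on a nearby one. RMSE weights all errors by their absolute size and is dominated by the largest misses.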