MonoKey: Keypoint-Based Monocular 3D Object Detection Using Prior Guidance for Occlusion Robustness
Yeon Woo Cho, Jung Woo Cheon, Jae Hyun Yoon, Seok Bong Yoo
AI summary
Problem
Monocular 3D object detection relies heavily on appearance cues that degrade under occlusion, making accurate depth estimation and object localization challenging.
Approach
MonoKey estimates visible 2D keypoints and reconstructs occluded ones using a prior-guided autoencoder, then fuses these cues with frequency-based depth features and refines bounding boxes via a relational graph.
Key results
- State-of-the-art KITTI car detection under moderate and hard occlusion
- Superior performance in adverse weather (snow, rain, fog) on CADC and Dense datasets
- Robust reconstruction of occluded object geometry using symmetry and yaw priors
- Mitigation of spectral bias in depth estimation via frequency-domain processing
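As a rough illustration of how symmetry and yaw priors can constrain occluded geometry, the sketch below mirrors a visible point across an object's longitudinal axis in the ground plane. It is not the paper's prior-guided autoencoder: the function name `mirror_across_heading`, the ground-plane setting, and the assumption of an exactly symmetric object are all hypothetical simplifications.

```python
import math


def mirror_across_heading(point, center, yaw):
    """Reflect a ground-plane point across the object's longitudinal axis.

    Assumes (for illustration only) that the object is left-right symmetric
    about the axis passing through `center` in the direction `yaw` (radians).
    """
    # Express the point relative to the object center.
    px, py = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(yaw), math.sin(yaw)

    # Rotate into the object frame, where the heading lies along +x.
    lon = c * px + s * py    # longitudinal coordinate
    lat = -s * px + c * py   # lateral coordinate

    # Symmetry prior: the mirrored point has the opposite lateral offset.
    lat = -lat

    # Rotate back into the world frame and restore the center offset.
    wx = c * lon - s * lat
    wy = s * lon + c * lat
    return (wx + center[0], wy + center[1])


# A visible point on one side of the object yields a guess for its
# occluded counterpart on the other side.
occluded_guess = mirror_across_heading((2.0, 1.0), (0.0, 0.0), 0.0)
```

Applying the function twice returns the original point, which is a quick sanity check for any reflection-based prior.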
Why it matters
Provides a cost-effective, robust perception solution for autonomous vehicles and robotics operating in complex, occluded environments.
Abstract
Monocular 3D object detection has garnered attention due to its cost-efficiency and simpler setup compared with multisensor systems. In this task, accurate depth estimation is crucial for precise object localization; however, extracting sufficient depth cues from a single image remains challenging. Moreover, when occlusions occur, structural cues become limited, making precise object localization increasingly difficult. To address these problems, we propose MonoKey, a keypoint-based monocular 3D object detection method that is robust to occlusion. MonoKey uses 2D keypoints because they are well suited to recovering occluded regions. The occlusion-robust 2D keypoint detection approach estimates object keypoints and reconstructs occluded ones using prior information. The frequency-based global-local depth predictor estimates 3D cues using fast Fourier convolution to incorporate global and local contexts. These 3D cues and keypoints are fused in a 3D detection decoder. Relational graph refinement then adjusts the initial bounding boxes for improved localization. The experimental results indicate that MonoKey outperforms existing monocular 3D object detection methods. The source code is available at https://github.com/yeonwoo29/MonoKey.git.
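To make the idea of combining global and local context through the frequency domain more concrete, here is a minimal NumPy sketch. It is a simplified stand-in, not MonoKey's depth predictor: the names `global_branch`, `local_branch`, and `global_local` are hypothetical, the per-frequency `weight` replaces the learned spectral convolution, and a fixed 3x3 mean filter replaces a learned local convolution.

```python
import numpy as np


def global_branch(feat, weight):
    """Mix information across the whole map by reweighting its spectrum.

    `weight` has the shape of the real FFT output, (H, W // 2 + 1). Each
    frequency bin depends on every pixel, so one pointwise multiply in the
    spectral domain has an image-wide receptive field.
    """
    spec = np.fft.rfft2(feat)
    return np.fft.irfft2(spec * weight, s=feat.shape)


def local_branch(feat):
    """3x3 mean filter with edge padding (stand-in for a local convolution)."""
    h, w = feat.shape
    padded = np.pad(feat, 1, mode="edge")
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def global_local(feat, weight):
    """Sum the global (spectral) and local (spatial) responses."""
    return global_branch(feat, weight) + local_branch(feat)


feat = np.arange(24.0).reshape(4, 6)   # toy single-channel feature map
weight = np.ones((4, 4))               # identity spectral weights
fused = global_local(feat, weight)     # same shape as feat
```

Setting every entry of `weight` to zero except the `[0, 0]` (DC) bin makes the global branch output the map's mean at every location, which shows how a single spectral coefficient carries image-wide context.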