MonoEM: Object-Level Monocular 3D Object Detection Based on Equirectangular Map under Inclement Weather
Jae Hyun Yoon, Yeon Woo Cho, Seok Bong Yoo
AI summary
Problem
Monocular 3D object detectors degrade significantly under rain, snow, and fog, both because weather adds noise to the image and because reliable 3D spatial cues are hard to infer from a single image.
Approach
The framework isolates detected 2D object regions, converts them into equirectangular maps for range-aware denoising and adaptive upsampling, and fuses these cleaned range features with restored visual features using coordinate alignment and a 2D-3D alignment loss.
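To make the equirectangular step concrete, here is a minimal sketch of how a set of 3D points belonging to one object could be projected onto an equirectangular range map (azimuth along columns, elevation along rows, range as the pixel value). It is not taken from the MonoEM code. The function name `points_to_equirect`, the map size, the elevation limits, and the assumption that per-object 3D points are already available are all hypothetical choices for illustration.

```python
import numpy as np

def points_to_equirect(points, height=16, width=64,
                       elev_min=-np.pi / 6, elev_max=np.pi / 6):
    """Project (N, 3) points onto an equirectangular range map.

    Assumed axes (illustrative): x forward, y left, z up.
    Returns a (height, width) array of ranges; empty cells hold 0.
    """
    points = np.asarray(points, dtype=np.float64)
    rng = np.linalg.norm(points, axis=1)

    # Drop points at the origin, where the angles are undefined.
    keep = rng > 1e-6
    points, rng = points[keep], rng[keep]

    azimuth = np.arctan2(points[:, 1], points[:, 0])   # in [-pi, pi]
    elevation = np.arcsin(points[:, 2] / rng)          # in [-pi/2, pi/2]

    # Normalize angles to column (u) and row (v) fractions.
    u = 0.5 * (1.0 - azimuth / np.pi)
    v = (elev_max - elevation) / (elev_max - elev_min)

    # Keep only points inside the vertical field of view.
    inside = (v >= 0.0) & (v < 1.0)
    cols = np.clip((u[inside] * width).astype(int), 0, width - 1)
    rows = np.clip((v[inside] * height).astype(int), 0, height - 1)
    rng = rng[inside]

    # Write far points first so the nearest point wins each cell.
    order = np.argsort(-rng)
    range_map = np.zeros((height, width))
    range_map[rows[order], cols[order]] = rng[order]
    return range_map
```

A map built this way is a dense 2D grid, so denoising and upsampling can operate on it with ordinary image operators, which is one plausible reason to prefer it over an unordered point set.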
Key results
- Equirectangular object-level reconstruction with range-aware dynamic denoising and adaptive upsampling
- Visual-range fusion detector bridging polar and Cartesian feature spaces
- 2D-3D box alignment loss enforcing spatial consistency
- State-of-the-art AP3D and APBEV scores across clear, rainy, snowy, and foggy conditions
Why it matters
Enables reliable, low-cost autonomous perception in adverse weather without requiring LiDAR or stereo cameras.
Abstract
Monocular 3D object detection has received growing recognition in contemporary research due to its reduced hardware complexity and lower deployment cost compared to multi-sensor-based approaches. Prior research has primarily addressed ideal environmental settings, neglecting the influence of diverse weather scenarios, including rain, snow, and fog, that significantly hinder detection reliability. To enhance robustness under inclement weather conditions, we introduce MonoEM, a monocular 3D object detection framework that leverages object-level image representations and equirectangular maps. Starting from 2D detection results, MonoEM derives equirectangular maps through an equirectangular object-level reconstruction. Furthermore, MonoEM suppresses inclement weather noise in object-level images through image restoration. Subsequently, MonoEM fuses the reconstructed equirectangular map with the restored image and performs 3D bounding box prediction using a visual-range fusion detector. The integration of a 2D-3D box alignment loss between 2D and 3D bounding boxes improves the geometric alignment and 3D object detection accuracy. Experimental results across various inclement weather conditions validate the notable accuracy and robustness of MonoEM compared to existing monocular 3D baselines. The source code is provided at https://github.com/yeonwoo29/MonoEM.
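The abstract does not spell out the form of the 2D-3D box alignment loss. One common way to tie a 3D box to a 2D box is to project the eight 3D corners into the image, take their enclosing rectangle, and penalize its distance to the detected 2D box. The sketch below illustrates that idea only. The helper names `box3d_corners` and `alignment_loss`, the L1 penalty, and the camera convention are assumptions, not the authors' formulation.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates.

    Assumed convention (illustrative): x right, y down, z forward;
    dims = (length, height, width); yaw rotates about the y axis.
    """
    length, box_height, box_width = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * length / 2.0
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * box_height / 2.0
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * box_width / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot @ np.vstack([x, y, z])).T + np.asarray(center, dtype=float)

def alignment_loss(box2d, center, dims, yaw, K):
    """Mean L1 gap between a 2D box and the projected 3D box envelope.

    box2d = (u_min, v_min, u_max, v_max) in pixels; K is the 3x3
    camera intrinsic matrix. Assumes every corner has z > 0.
    """
    corners = box3d_corners(center, dims, yaw)
    uvw = (K @ corners.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]  # perspective divide
    projected = np.array([uv[:, 0].min(), uv[:, 1].min(),
                          uv[:, 0].max(), uv[:, 1].max()])
    return float(np.abs(projected - np.asarray(box2d, dtype=float)).mean())
```

In a training setup this term would be written in a differentiable framework and added to the detection loss with a weight. NumPy is used here only to keep the geometry easy to follow.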