ICRA 2026

MonoEM: Object-Level Monocular 3D Object Detection Based on Equirectangular Map under Inclement Weather

Jae Hyun Yoon, Yeon Woo Cho, Seok Bong Yoo

AI summary

MonoEM achieves state-of-the-art 3D detection accuracy and robustness under inclement weather by leveraging object-level equirectangular maps and a novel visual-range fusion detector.

Monocular 3D detection, inclement weather, equirectangular map, object-level reconstruction, visual-range fusion, 2D-3D alignment

Problem

Monocular 3D object detectors typically suffer from significant performance degradation under rain, snow, and fog due to weather-induced noise and the difficulty of inferring reliable 3D spatial cues from single images.

Approach

The framework isolates detected 2D object regions, converts them into equirectangular maps for range-aware denoising and adaptive upsampling, and fuses these cleaned range features with restored visual features using coordinate alignment and a 2D-3D alignment loss.
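As a rough illustration of what an equirectangular map is, the sketch below bins 3D points by azimuth and elevation and stores the nearest range per cell. It is a generic spherical projection, not MonoEM's reconstruction module: the function name `points_to_range_map`, the camera-frame convention (x right, y down, z forward), and the angular limits are all assumptions made for this example.

```python
import numpy as np

def points_to_range_map(points, height, width, az_limits, el_limits):
    """Project 3D points (N, 3) onto an equirectangular range map.

    Assumed camera frame: x right, y down, z forward.
    Rows index elevation (top row = highest), columns index azimuth.
    Cells that receive no point are set to 0.
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.linalg.norm(points, axis=1)

    azimuth = np.arctan2(x, z)                         # positive to the right
    elevation = np.arcsin(-y / np.maximum(rng, 1e-9))  # positive upward

    az_min, az_max = az_limits
    el_min, el_max = el_limits
    col = np.floor((azimuth - az_min) / (az_max - az_min) * width).astype(int)
    row = np.floor((el_max - elevation) / (el_max - el_min) * height).astype(int)

    valid = (rng > 0) & (col >= 0) & (col < width) & (row >= 0) & (row < height)

    range_map = np.full((height, width), np.inf)
    # Keep the nearest point when several fall into the same cell.
    np.minimum.at(range_map, (row[valid], col[valid]), rng[valid])
    range_map[np.isinf(range_map)] = 0.0
    return range_map

# Hypothetical points: two on the optical axis, one far outside the angular window.
demo_points = np.array([[0.0, 0.0, 10.0], [0.0, 0.0, 5.0], [100.0, 0.0, 1.0]])
demo_map = points_to_range_map(demo_points, 4, 4, (-0.5, 0.5), (-0.5, 0.5))
```

In a map like this, each cell corresponds to a fixed angular direction, which is what makes range-aware operations such as denoising or upsampling straightforward to express on a regular grid.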

Key results

Equirectangular object-level reconstruction with range-aware dynamic denoising and adaptive upsampling
Visual-range fusion detector bridging polar and Cartesian feature spaces
2D-3D box alignment loss enforcing spatial consistency
State-of-the-art AP3D and APBEV scores across clear, rainy, snowy, and foggy conditions
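The 2D-3D box alignment idea listed above can be sketched as a consistency check: project the corners of a predicted 3D box into the image, take the enclosing rectangle, and penalize its distance to the detected 2D box. The exact loss used by MonoEM is not given in this summary, so the L1 form, the helper names, and the box parameterization below (geometric center in the camera frame, dimensions as height, width, length, yaw about the camera y axis) are assumptions.

```python
import numpy as np

def box3d_corners(center, dims, yaw):
    """Return the 8 corners (8, 3) of a 3D box in the camera frame.

    Assumed convention: center is the geometric center, dims = (h, w, l),
    yaw rotates about the camera y axis (y points down).
    """
    h, w, l = dims
    xs = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    ys = np.array([h, h, h, h, -h, -h, -h, -h]) / 2.0
    zs = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot @ np.vstack([xs, ys, zs])).T + np.asarray(center, dtype=float)

def project_to_box2d(corners, K):
    """Project corners with intrinsics K and return (u_min, v_min, u_max, v_max).

    Assumes every corner lies in front of the camera (z > 0).
    """
    uvw = (K @ corners.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])

def alignment_l1(box2d_detected, center, dims, yaw, K):
    """Mean absolute difference between the detected 2D box and the projected 3D box."""
    projected = project_to_box2d(box3d_corners(center, dims, yaw), K)
    return float(np.abs(projected - np.asarray(box2d_detected, dtype=float)).mean())

# Hypothetical intrinsics and box, chosen only to make the numbers easy to check.
K_demo = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
demo_box2d = project_to_box2d(box3d_corners((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0), K_demo)
demo_loss = alignment_l1(demo_box2d + 1.0, (0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0, K_demo)
```

A term of this kind is zero when the projected 3D box exactly encloses the same image region as the 2D detection and grows as the two drift apart, which is one way to tie 3D predictions to reliable 2D evidence.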

Why it matters

Enables reliable, low-cost autonomous perception in adverse weather without requiring LiDAR or stereo cameras.

Abstract

Monocular 3D object detection has received growing recognition in contemporary research due to its reduced hardware complexity and lower deployment cost compared to multi-sensor-based approaches. Prior research has primarily addressed ideal environmental settings, neglecting the influence of diverse weather scenarios, including rain, snow, and fog, that significantly hinder detection reliability. To enhance robustness under inclement weather conditions, we introduce MonoEM, a monocular 3D object detection framework that leverages object-level image representations and equirectangular maps. Starting from 2D detection results, MonoEM derives equirectangular maps through an equirectangular object-level reconstruction. Furthermore, MonoEM suppresses inclement weather noise in object-level images through image restoration. Subsequently, MonoEM fuses the reconstructed equirectangular map with the restored image and performs 3D bounding box prediction using a visual-range fusion detector. The integration of 2D-3D box alignment loss between 2D and 3D bounding boxes improves the geometric alignment and 3D object detection accuracy. Experimental results across various inclement weather conditions validate the notable accuracy and robustness of MonoEM compared to existing monocular 3D baselines. The source code is provided at https://github.com/yeonwoo29/MonoEM.

Index terms

Object Detection, Segmentation and Categorization; Computer Vision for Automation; AI-Based Methods