ICRA 2026

ModalPatch: A Plug-And-Play Module for Robust Multi-Modal 3D Object Detection under Modality Drop

Shuangzhi Li, Lei Ma, Xingyu Li

AI summary

ModalPatch is a lightweight, plug-and-play module that uses historical feature memory and uncertainty-aware fusion to maintain robust 3D object detection performance even during simultaneous LiDAR and camera drops.

Multi-modal 3D detection, modality drop, plug-and-play module, temporal feature prediction, uncertainty-guided fusion, autonomous driving

Problem

Multi-modal 3D detectors degrade severely when sensor inputs are temporarily missing, especially during simultaneous modality drops. Existing solutions either assume at least one modality remains available or require costly architectural redesigns and full model retraining.

Approach

ModalPatch plugs into existing detectors without retraining. It maintains a short-term history memory to predict missing features, and it applies an uncertainty-guided cross-modality fusion strategy that dynamically suppresses unreliable signals and reinforces trustworthy ones.
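
As a rough illustration of the history-memory idea, the sketch below keeps a short buffer of recent features for one modality and fills in a dropped frame from that buffer. It is not the authors' implementation: the names HistoryMemory and patch_feature are hypothetical, features are plain lists of floats, and a simple linear extrapolation stands in for the learned predictor. Writing the predicted feature back into the buffer is also an assumption, made so that consecutive drops can still be bridged.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Feature = List[float]


class HistoryMemory:
    """Short-term buffer of recent features for one modality (hypothetical helper)."""

    def __init__(self, max_len: int = 4) -> None:
        # deque with maxlen discards the oldest frame automatically.
        self.buffer: Deque[Feature] = deque(maxlen=max_len)

    def update(self, feature: Feature) -> None:
        self.buffer.append(list(feature))

    def predict(self) -> Optional[Feature]:
        """Guess the next feature from history.

        Assumption: linear extrapolation from the last two frames, used here
        only as a stand-in for a learned temporal predictor.
        """
        if not self.buffer:
            return None  # nothing to extrapolate from
        if len(self.buffer) == 1:
            return list(self.buffer[-1])  # hold the last known feature
        prev, last = self.buffer[-2], self.buffer[-1]
        return [2.0 * b - a for a, b in zip(prev, last)]


def patch_feature(
    memory: HistoryMemory, feature: Optional[Feature]
) -> Tuple[Optional[Feature], bool]:
    """Return (feature, was_predicted); feature=None marks a dropped frame."""
    if feature is not None:
        memory.update(feature)
        return feature, False
    predicted = memory.predict()
    if predicted is not None:
        # Assumption: store the prediction so later drops can build on it.
        memory.update(predicted)
    return predicted, True
```

A detector wrapper would call patch_feature once per modality and per frame, passing None whenever that sensor's input is missing, and would forward the returned feature to the unchanged downstream layers.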

Key results

First plug-and-play solution for arbitrary modality drop without retraining
History-based temporal transformer predicts missing features using past frames
Uncertainty-guided fusion suppresses biased signals and reinforces reliable cross-modal features
Consistently boosts mAP and NDS across four SOTA detectors under 10%, 30%, and 50% drop rates
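
To make the drop-rate setting concrete, the sketch below generates a per-frame availability mask for LiDAR and camera at a given drop rate. The summary does not describe the paper's exact evaluation protocol, so everything here is an assumption: the function name simulate_modality_drop is hypothetical, and each modality is dropped independently per frame, which means simultaneous drops arise by chance.

```python
import random
from typing import List, Tuple


def simulate_modality_drop(
    num_frames: int, drop_rate: float, seed: int = 0
) -> List[Tuple[bool, bool]]:
    """Return one (lidar_available, camera_available) pair per frame.

    Assumption: each modality is dropped independently with probability
    drop_rate; the paper's actual protocol may differ.
    """
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible masks
    mask = []
    for _ in range(num_frames):
        lidar_ok = rng.random() >= drop_rate
        camera_ok = rng.random() >= drop_rate
        mask.append((lidar_ok, camera_ok))
    return mask


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        mask = simulate_modality_drop(1000, rate)
        both_dropped = sum(1 for lidar, cam in mask if not lidar and not cam)
        print(f"drop rate {rate:.0%}: {both_dropped} frames with both sensors missing")
```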

Why it matters

Enables autonomous driving and robotics systems to maintain reliable perception during transient sensor failures without costly retraining or architectural changes.

Abstract

Multi-modal 3D object detection is pivotal for autonomous driving, integrating complementary sensors like LiDAR and cameras. However, its real-world reliability is challenged by transient data interruptions and missing data, in which modalities can momentarily drop due to hardware glitches, adverse weather, or occlusions. This poses a critical risk, especially during a simultaneous modality drop, where the vehicle is momentarily blind. To address this problem, we introduce ModalPatch, the first plug-and-play module designed to enable robust detection under arbitrary modality-drop scenarios. Without requiring architectural changes or retraining, ModalPatch can be seamlessly integrated into diverse detection frameworks. Technically, ModalPatch leverages the temporal nature of sensor data for perceptual continuity, using a history-based module to predict and compensate for transiently unavailable features. To improve the fidelity of the predicted features, we further introduce an uncertainty-guided cross-modality fusion strategy that dynamically estimates the reliability of compensated features, suppressing biased signals while reinforcing informative ones. Extensive experiments show that ModalPatch consistently enhances both robustness and accuracy of state-of-the-art 3D object detectors under diverse modality-drop conditions. Code will be available at https://github.com/Castiel-Lee/MM3Det_MD.
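
The idea of weighting features by their estimated reliability can be illustrated with classical inverse-variance weighting, shown below. This is a swapped-in textbook technique, not the fusion strategy from the paper. It assumes that both feature vectors already live in a shared space (for example, a common bird's-eye-view grid) and that a per-element variance estimate is available for each. The function name inverse_variance_fuse is hypothetical.

```python
from typing import List


def inverse_variance_fuse(
    feat_a: List[float],
    var_a: List[float],
    feat_b: List[float],
    var_b: List[float],
    eps: float = 1e-6,
) -> List[float]:
    """Fuse two aligned feature vectors, trusting low-variance elements more.

    Assumption: variances are non-negative uncertainty estimates; eps avoids
    division by zero when an element is reported as perfectly certain.
    """
    if not (len(feat_a) == len(var_a) == len(feat_b) == len(var_b)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for fa, va, fb, vb in zip(feat_a, var_a, feat_b, var_b):
        wa = 1.0 / (va + eps)  # high uncertainty gives a small weight
        wb = 1.0 / (vb + eps)
        fused.append((wa * fa + wb * fb) / (wa + wb))
    return fused
```

With equal variances the result is the plain average. When one input, such as a feature predicted from history, carries a much larger variance, the fused value stays close to the more reliable input.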

Index terms

Object Detection, Segmentation and Categorization; Sensor Fusion; Computer Vision for Automation