ICRA 2023

MVFusion: Multi-View 3D Object Detection with Semantic-Aligned Radar and Camera Fusion

Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Robin Wang, Jian Pu

Abstract

Multi-view radar-camera fused 3D object detection provides a farther detection range and more helpful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods offer a variety of designs for fusing radar information with camera data. However, these approaches usually adopt straightforward concatenation of multi-modal features, which ignores the semantic alignment of radar features and fails to capture sufficient correlations across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantic-aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via the semantic-aligned radar encoder (SARE) to produce image-guided radar features. Then, we propose the radar-guided fusion transformer (RGFT), which fuses the radar and image features and strengthens the correlation between the two modalities at a global scope via the cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We will release our code and trained networks upon publication.
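
To illustrate the cross-attention mechanism that the abstract contrasts with plain concatenation, the following is a minimal sketch of generic single-head scaled dot-product cross-attention. It is not the RGFT implementation: the abstract does not specify the architecture, so the choice of image tokens as queries and radar tokens as keys and values, the absence of learned projections, and the names `cross_attention`, `image_tokens`, and `radar_tokens` are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention, no learned projections.

    queries: N_q vectors of dimension d (assumed here: image tokens)
    keys, values: N_k vectors of dimension d (assumed here: radar tokens)
    Returns N_q vectors, each a weighted average of the value vectors.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output dimension at a time.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Toy inputs with made-up values: 3 image tokens, 2 radar tokens, dimension 4.
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
radar_tokens = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]

fused_tokens = cross_attention(image_tokens, radar_tokens, radar_tokens)
```

Unlike concatenation, which pairs features by position only, each output token here is a mixture of all radar tokens weighted by their similarity to the query, which is one way to read "correlation from the global scope".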

Index terms

Deep Learning for Visual Perception; Computer Vision for Transportation; Computer Vision for Manufacturing