ICRA 2026

MRASfM: Multi-Camera Reconstruction and Aggregation through Structure-From-Motion in Driving Scenes

Lingfeng Xuan, Chang Nie, Yiqing Xu, Yanzi Miao, Hesheng Wang

AI summary

MRASfM enables robust, efficient, and scalable multi-camera SfM for driving scenes by leveraging rigid camera-set constraints and semantic-aided road filtering, achieving state-of-the-art accuracy on nuScenes.

Multi-camera SfM, driving scene reconstruction, bundle adjustment, road surface filtering, multi-scene aggregation, autonomous mapping

Problem

Traditional Structure-from-Motion applied to multi-camera driving data suffers from three problems: unreliable pose estimation caused by occlusions and dynamic objects, excessive outliers in the reconstructed road surface, and low computational efficiency because every camera pose is optimized individually.

Approach

The framework treats the multi-camera array as a single rigid unit during registration and bundle adjustment, fits a plane model to filter outliers from the triangulated road surface, and aggregates fragmented scenes in a coarse-to-fine manner using GNSS and visual overlap.
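
The plane-based road filtering step can be illustrated with a minimal sketch. It assumes a basic RANSAC plane fit over triangulated road points; the function names, the 0.05 distance threshold, and the iteration count are hypothetical choices for illustration and are not taken from the MRASfM code.

```python
import numpy as np

def fit_plane_ransac(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane n.x + d = 0 to Nx3 points with a basic RANSAC loop.

    Illustrative only: threshold and iteration count are assumed values.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        # Three distinct points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # skip degenerate (collinear) samples
            continue
        normal = normal / norm
        d = -normal @ sample[0]
        # Points closer than the threshold to the plane count as inliers.
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

def filter_road_points(points, threshold=0.05):
    """Keep only the points that lie near the fitted road plane."""
    plane, inliers = fit_plane_ransac(points, threshold)
    return points[inliers], plane
```

In this sketch, points far from the dominant plane are discarded as outliers; a real pipeline would likely fit planes per local road segment, since roads are only locally planar.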

Key results

Robust pose estimation under partial occlusion via rigid camera-set constraints
Significantly reduced optimization variables in bundle adjustment for faster reconstruction
Coarse-to-fine multi-scene aggregation without requiring shared images between segments
State-of-the-art accuracy on nuScenes with 0.124 absolute pose error
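
To make the rigid camera-set idea concrete, here is a minimal sketch of how one rig pose plus fixed extrinsics yields every camera pose, and how that changes the number of pose parameters in bundle adjustment. It assumes the camera-to-rig extrinsics are known and held fixed; all function names are hypothetical and the yaw-only pose builder is a simplification for illustration.

```python
import numpy as np

def make_pose(yaw, translation):
    """Build a 4x4 rigid transform from a yaw angle (radians) and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def camera_poses_from_rig(T_world_rig, T_rig_cams):
    """Derive each camera's world pose from one rig pose and fixed extrinsics."""
    return [T_world_rig @ T_rig_cam for T_rig_cam in T_rig_cams]

def pose_parameter_count(num_frames, num_cameras, rigid_rig):
    """Count 6-DoF pose parameters in BA (extrinsics assumed known and fixed)."""
    if rigid_rig:
        return 6 * num_frames  # one pose per rig frame
    return 6 * num_frames * num_cameras  # one pose per individual image

# Example: a six-camera rig over 100 frames.
independent = pose_parameter_count(100, 6, rigid_rig=False)
as_rig = pose_parameter_count(100, 6, rigid_rig=True)
```

Under these assumptions the pose parameters drop by a factor equal to the number of cameras. The same composition also explains the robustness claim: a camera whose view is occluded still receives a pose through the rig, as long as other cameras in the set observe enough features.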

Why it matters

Enables high-precision, large-scale 3D mapping and HD map construction for autonomous driving by making multi-camera SfM practical and efficient in real-world conditions.

Abstract

Structure from Motion (SfM) estimates camera poses and reconstructs point clouds, forming a foundation for various tasks. However, applying SfM to driving scenes captured by multi-camera systems presents significant difficulties, including unreliable pose estimation, excessive outliers in road surface reconstruction, and low reconstruction efficiency. To address these limitations, we propose a Multi-camera Reconstruction and Aggregation Structure-from-Motion (MRASfM) framework specifically designed for driving scenes. MRASfM enhances the reliability of camera pose estimation by leveraging the fixed spatial relationships within the multi-camera system during the registration process. To improve the quality of road surface reconstruction, our framework employs a plane model to effectively remove erroneous points from the triangulated road surface. Moreover, treating the multi-camera set as a single unit in Bundle Adjustment (BA) helps reduce optimization variables to boost efficiency. In addition, MRASfM achieves multi-scene aggregation through scene association and assembly modules in a coarse-to-fine fashion. We deployed multi-camera systems on actual vehicles to validate the generalizability of MRASfM across various scenes and its robustness in challenging conditions through real-world applications. Furthermore, large-scale validation results on public datasets show the state-of-the-art performance of MRASfM, achieving 0.124 absolute pose error on the nuScenes dataset. The code is available at https://github.com/IRMVLab/MRASfM.

Index terms

Localization, Mapping, Autonomous Vehicle Navigation