ICRA 2026

PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization

Siyan Dong, Zijun Wang, Lulu Cai, Yi Ma, Yanchao Yang

AI summary

Combining a metric-aware camera pose regression network with randomized optimization enables real-time, robust, and accurate dense 3D reconstruction even under highly unstable camera motions.

Dense reconstruction, RGB-D SLAM, Camera pose regression, Randomized optimization, Real-time tracking, Robotics

Problem

Current RGB-D SLAM systems fail during large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods are accurate but need a good initial pose estimate, which large motions do not provide, while learning-based approaches are robust but lack the accuracy needed for dense reconstruction.

Approach

The method uses a neural network to predict metric-aware relative poses from consecutive RGB-D frames, providing a robust initialization that is then refined by a randomized optimization algorithm aligning depth data to a TSDF scene representation.
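The two-stage idea above (a coarse pose guess, then randomized refinement against the scene geometry) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its released code: the scene is four spheres with an analytic signed distance function standing in for the TSDF, a small random perturbation of the true pose stands in for the network's prediction, and the refinement is a generic accept-if-better random search with a shrinking step size. The names `scene_sdf`, `alignment_cost`, and `randomized_refine` are hypothetical.

```python
import numpy as np

def rotation_from_vector(w):
    """Rodrigues formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def make_pose(w, t):
    """Build a 4x4 camera-to-world transform from axis-angle w and translation t."""
    T = np.eye(4)
    T[:3, :3] = rotation_from_vector(w)
    T[:3, 3] = t
    return T

# Toy scene (assumption): four non-overlapping spheres replace the TSDF volume.
CENTERS = np.array([[0.0, 0.0, 2.0], [1.0, 0.5, 3.0], [-0.8, 0.6, 2.5], [0.2, -0.9, 3.2]])
RADII = np.array([0.5, 0.3, 0.4, 0.35])

def scene_sdf(points):
    """Signed distance from each world point (N, 3) to the nearest sphere surface."""
    d = np.linalg.norm(points[:, None, :] - CENTERS[None, :, :], axis=2) - RADII
    return d.min(axis=1)

def alignment_cost(T, points_cam):
    """Mean absolute SDF of the camera-frame points after mapping them to world."""
    world = points_cam @ T[:3, :3].T + T[:3, 3]
    return np.abs(scene_sdf(world)).mean()

def randomized_refine(T_init, points_cam, rng, iters=60, samples=64,
                      rot_sigma=0.05, trans_sigma=0.05):
    """Generic random search: sample perturbations, keep any that lower the cost."""
    best_T, best_cost = T_init, alignment_cost(T_init, points_cam)
    for _ in range(iters):
        improved = False
        for _ in range(samples):
            delta = make_pose(rng.normal(0.0, rot_sigma, 3),
                              rng.normal(0.0, trans_sigma, 3))
            candidate = delta @ best_T
            cost = alignment_cost(candidate, points_cam)
            if cost < best_cost:
                best_T, best_cost, improved = candidate, cost, True
        if not improved:
            # No sample helped at this scale, so search a smaller neighbourhood.
            rot_sigma *= 0.5
            trans_sigma *= 0.5
    return best_T, best_cost

rng = np.random.default_rng(0)
T_true = make_pose(np.array([0.2, -0.1, 0.3]), np.array([0.3, -0.2, 0.1]))

# Sample surface points in the world frame, then express them in the camera frame.
dirs = rng.normal(size=(400, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
idx = rng.integers(0, len(CENTERS), size=400)
world_pts = CENTERS[idx] + RADII[idx, None] * dirs
points_cam = (world_pts - T_true[:3, 3]) @ T_true[:3, :3]

# Stand-in for the regressed pose: the true pose with a small random error.
T_init = make_pose(rng.normal(0.0, 0.05, 3), rng.normal(0.0, 0.05, 3)) @ T_true
init_cost = alignment_cost(T_init, points_cam)
T_refined, final_cost = randomized_refine(T_init, points_cam, rng)
print(f"cost before refinement: {init_cost:.5f}, after: {final_cost:.5f}")
```

Because a candidate is accepted only when it lowers the cost, the refined pose can never align worse than the initial guess. That also shows why the quality of the initialization matters: this kind of local search only explores a neighbourhood of its starting point.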

Key results

Achieves state-of-the-art tracking accuracy on fast-motion and camera-shake benchmarks
Maintains comparable reconstruction accuracy on stable motion sequences
Operates in real-time without requiring bundle adjustment or loop closure
Successfully reconstructs scenes under large viewpoint changes and rapid in-place rotations

Why it matters

Enables reliable real-time 3D mapping for autonomous robots operating in uncontrolled, dynamic environments where camera tracking typically fails.

Abstract

Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but fail with poor initialization during large motions, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge through a combination of learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showcasing that combining simple and principled techniques can achieve both robustness for unstable motions and accuracy for dense reconstruction. Code released: https://github.com/siyandong/PROFusion.

Index terms

SLAM, Localization, Mapping