PROFusion: Robust and Accurate Dense Reconstruction via Camera Pose Regression and Optimization
Siyan Dong, Zijun Wang, Lulu Cai, Yi Ma, Yanchao Yang
AI summary
Problem
Current RGB-D SLAM systems fail during large viewpoint changes, fast motions, or sudden shaking: classical optimization-based methods need a good initial pose estimate, which large motions do not provide, while learning-based approaches lack the accuracy that dense reconstruction requires.
Approach
The method uses a neural network to predict metric-aware relative poses from consecutive RGB-D frames, providing a robust initialization that is then refined by a randomized optimization algorithm aligning depth data to a TSDF scene representation.
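As a minimal sketch of this initialize-then-refine idea, the Python toy below refines a coarse translation guess by random search against a signed distance function. Everything in it is an assumption made for illustration: the analytic sphere stands in for a TSDF lookup, the translation-only pose stands in for a full 6-DoF pose, the hard-coded initial guess stands in for the network prediction, and the shrinking random search is a simple stand-in rather than the paper's randomized optimization algorithm.

```python
import math
import random


def sphere_sdf(p, radius=1.0):
    """Signed distance from p to an origin-centred sphere (toy stand-in for a TSDF lookup)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def alignment_cost(points, t):
    """Mean absolute signed distance of the points after shifting them by translation t."""
    total = 0.0
    for p in points:
        total += abs(sphere_sdf((p[0] + t[0], p[1] + t[1], p[2] + t[2])))
    return total / len(points)


def refine_translation(points, t_init, iters=60, samples=32, sigma=0.05, seed=0):
    """Random-search refinement: sample perturbations around the best pose so far,
    keep any candidate that lowers the cost, and halve the search radius after a
    round with no improvement. Hypothetical helper, not the paper's algorithm."""
    rng = random.Random(seed)
    best_t = tuple(t_init)
    best_cost = alignment_cost(points, best_t)
    for _ in range(iters):
        improved = False
        for _ in range(samples):
            cand = tuple(c + rng.gauss(0.0, sigma) for c in best_t)
            cost = alignment_cost(points, cand)
            if cost < best_cost:
                best_t, best_cost = cand, cost
                improved = True
        if not improved:
            sigma *= 0.5  # narrow the search once the current radius stops helping
    return best_t, best_cost


def make_points(offset, n=200, seed=1):
    """Synthetic 'depth' points: samples on the unit sphere, displaced by offset."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        pts.append(tuple(x / norm + o for x, o in zip(v, offset)))
    return pts


if __name__ == "__main__":
    points = make_points((0.3, -0.2, 0.1))
    t_init = (-0.25, 0.15, -0.05)  # coarse guess, standing in for a regressed pose
    t_ref, cost = refine_translation(points, t_init)
    print("initial cost:", alignment_cost(points, t_init))
    print("refined cost:", cost, "at", t_ref)
```

The point of the toy is the division of labour: the search only has to explore a small neighbourhood because it starts from a guess that is already close, which is the role the summary assigns to the regression network.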
Key results
- Achieves state-of-the-art tracking accuracy on fast-motion and camera-shake benchmarks
- Maintains comparable reconstruction accuracy on stable motion sequences
- Operates in real-time without requiring bundle adjustment or loop closure
- Successfully reconstructs scenes under large viewpoint changes and rapid in-place rotations
Why it matters
Enables reliable real-time 3D mapping for autonomous robots operating in uncontrolled, dynamic environments where camera tracking typically fails.
Abstract
Real-time dense scene reconstruction during unstable camera motions is crucial for robotics, yet current RGB-D SLAM systems fail when cameras experience large viewpoint changes, fast motions, or sudden shaking. Classical optimization-based methods deliver high accuracy but fail with poor initialization during large motions, while learning-based approaches provide robustness but lack sufficient accuracy for dense reconstruction. We address this challenge through a combination of learning-based initialization with optimization-based refinement. Our method employs a camera pose regression network to predict metric-aware relative poses from consecutive RGB-D frames, which serve as reliable starting points for a randomized optimization algorithm that further aligns depth images with the scene geometry. Extensive experiments demonstrate promising results: our approach outperforms the best competitor on challenging benchmarks, while maintaining comparable accuracy on stable motion sequences. The system operates in real-time, showcasing that combining simple and principled techniques can achieve both robustness for unstable motions and accuracy for dense reconstruction. Code released: https://github.com/siyandong/PROFusion.
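To illustrate how relative poses between consecutive frames can be turned into a camera trajectory, here is a small Python sketch that chains 4x4 homogeneous transforms. The convention (each relative pose maps the current frame into the previous one, so the global pose is the running product) and the helper names `make_pose` and `accumulate` are assumptions for this example; the abstract does not specify them.

```python
import math


def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw, tx, ty, tz):
    """Homogeneous transform: rotation by yaw about the z-axis, then translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def accumulate(relative_poses):
    """Chain relative poses into global poses, starting from the identity.
    Assumed convention: T_world_k = T_world_(k-1) * T_(k-1)_k."""
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    trajectory = [identity]
    for rel in relative_poses:
        trajectory.append(matmul4(trajectory[-1], rel))
    return trajectory


if __name__ == "__main__":
    # Two identical steps: move 1 m along the local x-axis and turn 90 degrees.
    step = make_pose(math.pi / 2, 1.0, 0.0, 0.0)
    for pose in accumulate([step, step]):
        print([round(pose[r][3], 3) for r in range(3)])
```

Because each global pose is a product of all earlier relative poses, an error in one step propagates to every later frame, which is one reason a per-frame refinement against the scene geometry is useful.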