The More the Better? Confidence-Driven Residual Weighting and Depth Fusion for Multi-RGB-D Inertial Odometry
Seungsang Yun, Jaeho Shin, Jaekwang Cha, Ayoung Kim
AI summary
Problem
Simply adding more cameras to a visual odometry system widens the field of view but also introduces degraded or misaligned views, which cause outliers and computational bottlenecks that undermine both accuracy and real-time performance.
Approach
The framework adaptively down-weights unreliable camera residuals based on photometric quality and motion alignment, while an early motion-guided selection step filters out non-informative pixels before optimization.
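The confidence-driven weighting can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two scores are hypothetical choices made for illustration: a clipped-pixel penalty stands in for photometric quality, and a Cauchy-style decay on the median residual under the predicted motion stands in for motion alignment. The names `CameraObservation`, `camera_weights`, and `weighted_cost`, and the `scale` and Huber `k` thresholds, are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class CameraObservation:
    # Photometric residuals of this camera's selected points, evaluated at the
    # motion-predicted pose (hypothetical input format).
    residuals: list[float]
    # Fraction of pixels that are clipped (saturated or nearly black).
    clipped_fraction: float


def photometric_score(obs: CameraObservation) -> float:
    """Hypothetical quality term: more clipped pixels give a lower score."""
    return min(1.0, max(0.0, 1.0 - obs.clipped_fraction))


def alignment_score(obs: CameraObservation, scale: float = 10.0) -> float:
    """Hypothetical alignment term: a camera whose residuals are large under the
    predicted motion is treated as misaligned (Cauchy-style decay on the median)."""
    if not obs.residuals:
        return 0.0
    abs_res = sorted(abs(r) for r in obs.residuals)
    median = abs_res[len(abs_res) // 2]  # upper median for even counts
    return 1.0 / (1.0 + (median / scale) ** 2)


def camera_weights(observations: list[CameraObservation]) -> list[float]:
    """Multiply both terms, then normalize so the weights sum to the camera count."""
    raw = [photometric_score(o) * alignment_score(o) for o in observations]
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(observations)
    return [len(observations) * w / total for w in raw]


def huber(r: float, k: float = 5.0) -> float:
    """Per-residual robust loss (a standard Huber kernel)."""
    a = abs(r)
    return 0.5 * r * r if a <= k else k * (a - 0.5 * k)


def weighted_cost(observations: list[CameraObservation]) -> float:
    """Joint cost: each camera's robust residual sum scaled by its confidence weight."""
    weights = camera_weights(observations)
    return sum(
        w * sum(huber(r) for r in o.residuals)
        for w, o in zip(weights, observations)
    )


if __name__ == "__main__":
    cams = [
        CameraObservation(residuals=[1.0, -2.0, 1.5], clipped_fraction=0.02),
        # Glare: many clipped pixels and residuals far from the predicted motion.
        CameraObservation(residuals=[40.0, -35.0, 50.0], clipped_fraction=0.60),
    ]
    print(camera_weights(cams))
    print(weighted_cost(cams))
```

Consider one well-exposed camera and one glare-affected camera whose residuals disagree with the predicted motion. The second camera's weight drops close to zero, and the weights still sum to the camera count. The degraded view therefore contributes little to the joint cost without being discarded outright.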
Key results
- Confidence-driven weighting dynamically scales camera residuals based on photometric reliability and motion alignment
- Motion-guided point selection prunes non-informative pixels early, with its efficiency gain validated mathematically
- Achieves real-time 27 Hz tracking with four cameras across saturation, occlusion, low-light, and glare conditions
- Releases the first publicly available multi-RGB-D inertial odometry dataset
Why it matters
Provides a scalable, robust navigation solution for autonomous robots operating with multi-camera rigs in challenging real-world environments.
Abstract
Multi-camera systems hold considerable promise for enhancing visual odometry by expanding the field of view, yet simply adding more cameras does not guarantee higher accuracy. Because increasing the number of cameras also raises the likelihood of degraded or misaligned views, appropriate handling is essential to prevent severe outliers and corrupted global pose estimates. Previous methods discard points in back-end optimization based on residuals, which has been a bottleneck for real-time performance since erroneous measurements are inevitably incorporated into the main pipeline before removal. In response, we propose a direct Multi-RGB-D Inertial Odometry framework driven by confidence-based weighting, which adaptively down-weights unreliable cameras based on photometric quality and viewpoint alignment. To manage the heavy data load typical of multi-camera setups, we also incorporate a motion-guided selection strategy, filtering out non-informative points before costly alignment. This early pruning reduces computation yet retains critical constraints for odometry. By combining these techniques, our system achieves robust, scale-consistent pose estimation in real time, even with four cameras, as validated through challenging indoor-outdoor experiments involving saturation, occlusions, low-light conditions, and severe glare. We publicly release our multi-RGB-D-inertial dataset at https://github.com/seungsang07/multi-rgbd-inertial-dataset.
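The motion-guided selection step can likewise be sketched under stated assumptions. One plausible reading, not confirmed by the abstract, works as follows. Each candidate pixel is reprojected with its RGB-D depth under the IMU-predicted relative motion `(R, t)`. Only points whose image gradient has a strong component along the predicted flow are kept, since a gradient orthogonal to the flow constrains that motion poorly. The `Intrinsics` and `Candidate` types, `predicted_flow`, `select_points`, `min_score`, `max_points`, and the zero-flow fallback to gradient magnitude are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float
    fy: float
    cx: float
    cy: float


@dataclass
class Candidate:
    u: float
    v: float
    depth: float  # metres from the RGB-D sensor; <= 0 marks a missing depth
    gx: float     # horizontal intensity gradient at (u, v)
    gy: float     # vertical intensity gradient at (u, v)


def predicted_flow(p, K, R, t):
    """Back-project with depth, apply the predicted motion X' = R X + t, and
    reproject; returns the 2D pixel displacement, or None if X' is behind the camera."""
    x = (p.u - K.cx) / K.fx * p.depth
    y = (p.v - K.cy) / K.fy * p.depth
    z = p.depth
    X = [R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3)]
    if X[2] <= 1e-6:
        return None
    u2 = K.fx * X[0] / X[2] + K.cx
    v2 = K.fy * X[1] / X[2] + K.cy
    return (u2 - p.u, v2 - p.v)


def select_points(cands, K, R, t, min_score=2.0, max_points=None):
    """Keep candidates whose gradient is strong along the predicted flow,
    ordered from most to least informative."""
    scored = []
    for p in cands:
        if p.depth <= 0.0:
            continue  # no depth, so the point cannot be reprojected
        flow = predicted_flow(p, K, R, t)
        if flow is None:
            continue
        norm = math.hypot(flow[0], flow[1])
        if norm < 1e-9:
            # No predicted motion at this pixel: fall back to gradient magnitude.
            score = math.hypot(p.gx, p.gy)
        else:
            # Magnitude of the gradient component along the flow direction.
            score = abs(p.gx * flow[0] + p.gy * flow[1]) / norm
        if score >= min_score:
            scored.append((score, p))
    scored.sort(key=lambda s: s[0], reverse=True)
    if max_points is not None:
        scored = scored[:max_points]
    return [p for _, p in scored]


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.1, 0.0, 0.0]  # predicted sideways translation -> horizontal flow
    cands = [
        Candidate(320.0, 240.0, 2.0, gx=10.0, gy=0.0),  # gradient along the flow
        Candidate(100.0, 100.0, 2.0, gx=0.0, gy=10.0),  # gradient orthogonal to it
        Candidate(50.0, 50.0, 0.0, gx=10.0, gy=10.0),   # missing depth
    ]
    print(select_points(cands, K, R, t))
```

In this toy setup only the first candidate survives. The second has its gradient orthogonal to the horizontal flow, and the third has no valid depth. Because the filtering happens before any photometric alignment, rejected points never enter the optimization, which is the cost saving the abstract attributes to early pruning.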