CAR-Stereo: Confidence-Aware Adaptive Disparity Refinement for Real-Time Stereo Matching
Chanill Park, Janghyun Kim, Minseong Kweon, and Jinsun Park
AI summary
Problem
Contemporary stereo matching networks struggle to balance real-time efficiency with high accuracy for fine-grained structures, often losing spatial precision through downsampling or incurring heavy computational costs from uniform full-range searches.
Approach
The method builds a full-resolution residual cost volume around an initial disparity estimate and applies a confidence-guided adaptive masking strategy to selectively filter redundant refinement candidates per pixel.
Key results
- Full-resolution residual cost volume construction preserves fine structural details
- Confidence-driven adaptive range masking eliminates redundant disparity candidates
- State-of-the-art real-time performance on KITTI 2012 and Scene Flow benchmarks
- Superior perception of thin structures and sharp object boundaries compared to efficiency-oriented models
Why it matters
Provides computationally efficient yet accurate depth perception for safety-critical autonomous driving and robotics systems.
Abstract
In this paper, we propose a novel real-time disparity refinement method that enables precise structure perception. We construct a compact full-resolution cost volume from residuals around the initial disparity and adaptively eliminate redundant information on a per-pixel basis by leveraging confidence. The core idea of our method comprises residual cost volume construction and an adaptive range masking strategy. The residual cost volume is constructed from refinement candidates around the initial disparity, based on the assumption that the ground-truth disparity is near the initial disparity. Compared to the conventional cost volume constructed over the entire set of disparity candidates, our approach achieves computational efficiency and maintains precise structural information by operating at full resolution. Moreover, we propose an adaptive range masking strategy that filters refinement candidates for each pixel by leveraging confidence values. This approach effectively eliminates redundant information present in cost volumes that are composed of uniformly sampled refinement candidates. Experimental results on the Scene Flow and KITTI 2012 benchmarks demonstrate that our method achieves real-time performance and sets a new state-of-the-art among real-time stereo matching algorithms.
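The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes raw pixel intensities with an absolute-difference cost in place of learned features, a hypothetical linear rule mapping confidence to a per-pixel search radius, and a simple argmin instead of a refinement network. All function names and the `radius` parameter are illustrative.

```python
import numpy as np

def residual_cost_volume(left, right, init_disp, radius=2):
    """Build a full-resolution cost volume over residual candidates only.

    left, right: (H, W) float images; init_disp: (H, W) initial disparity.
    Returns integer offsets (K,) and costs (K, H, W), with K = 2 * radius + 1.
    Assumption: absolute intensity difference stands in for a learned cost.
    """
    H, W = left.shape
    offsets = np.arange(-radius, radius + 1)
    xs = np.arange(W)[None, :]
    rows = np.arange(H)[:, None]
    cost = np.empty((offsets.size, H, W), dtype=np.float64)
    for k, r in enumerate(offsets):
        cand = init_disp + r  # candidate disparity near the initial estimate
        src_x = np.clip(np.rint(xs - cand).astype(int), 0, W - 1)
        cost[k] = np.abs(left - right[rows, src_x])
    return offsets, cost

def adaptive_range_mask(offsets, confidence, radius):
    """Keep fewer candidates where confidence is high.

    Assumption (hypothetical rule): allowed radius = ceil((1 - conf) * radius).
    Offset 0 is always kept, so every pixel has at least one candidate.
    """
    allowed = np.ceil((1.0 - confidence) * radius)  # (H, W)
    return np.abs(offsets)[:, None, None] <= allowed[None]

def refine(left, right, init_disp, confidence, radius=2):
    """Pick the lowest-cost unmasked residual candidate per pixel."""
    offsets, cost = residual_cost_volume(left, right, init_disp, radius)
    mask = adaptive_range_mask(offsets, confidence, radius)
    masked = np.where(mask, cost, np.inf)  # masked candidates can never win
    best = np.argmin(masked, axis=0)
    return init_disp + offsets[best]

# Usage: a synthetic pair with true disparity 3 and an initial guess of 2.
rng = np.random.default_rng(0)
right_img = rng.random((4, 16))
left_img = np.roll(right_img, 3, axis=1)
init = np.full((4, 16), 2.0)
refined = refine(left_img, right_img, init, confidence=np.zeros((4, 16)))
```

With `radius=2` the volume holds 5 candidates per pixel regardless of the maximum disparity, which is where the saving over a full-range search comes from. Where confidence is 1, the mask keeps only the zero offset and the initial disparity passes through unchanged.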