ICRA 2026

CAR-Stereo: Confidence-Aware Adaptive Disparity Refinement for Real-Time Stereo Matching

Chanill Park, Janghyun Kim, Minseong Kweon, and Jinsun Park

AI summary

Adaptively filtering disparity refinement candidates based on pixel confidence achieves state-of-the-art real-time accuracy while preserving fine structural details.

Stereo matching, Real-time depth estimation, Adaptive disparity refinement, Confidence-aware masking, Autonomous driving perception, Residual cost volume

Problem

Contemporary stereo matching networks struggle to balance real-time efficiency with high accuracy for fine-grained structures, often losing spatial precision through downsampling or incurring heavy computational costs from uniform full-range searches.

Approach

The method builds a full-resolution residual cost volume around an initial disparity estimate and applies a confidence-guided adaptive masking strategy to selectively filter redundant refinement candidates per pixel.
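
The residual cost volume idea can be sketched on a single scanline. This is a minimal illustration and not the authors' implementation: it assumes integer residual offsets, a hypothetical `radius` parameter, and absolute intensity difference as the matching cost (the summary does not specify the features or cost function used in the paper). The function names `residual_cost_volume` and `refine` are invented for this example.

```python
def residual_cost_volume(left, right, init_disp, radius=2):
    """Build cost[x][k] for candidate disparity init_disp[x] + offsets[k].

    Assumption: cost is the absolute intensity difference between
    left[x] and right[x - d]; this stands in for whatever matching
    cost the actual method uses.
    """
    offsets = list(range(-radius, radius + 1))
    width = len(left)
    volume = []
    for x in range(width):
        costs = []
        for r in offsets:
            d = init_disp[x] + r
            xr = x - d  # matching column in the right scanline
            if d < 0 or xr < 0 or xr >= width:
                costs.append(float("inf"))  # candidate is invalid here
            else:
                costs.append(abs(left[x] - right[xr]))
        volume.append(costs)
    return offsets, volume


def refine(init_disp, offsets, volume):
    """Pick the lowest-cost residual per pixel and add it to the initial disparity."""
    refined = []
    for d0, costs in zip(init_disp, volume):
        best = min(range(len(costs)), key=costs.__getitem__)
        refined.append(d0 + offsets[best])
    return refined


# Toy scanline: the left row is the right row shifted by a true disparity of 2.
right = [10, 20, 30, 40, 50, 60, 70, 80]
left = [0, 0, 10, 20, 30, 40, 50, 60]   # first two columns have no match
init_disp = [2, 2, 1, 3, 2, 1, 3, 2]    # noisy initial estimate

offsets, volume = residual_cost_volume(left, right, init_disp, radius=2)
refined = refine(init_disp, offsets, volume)
print(refined[2:])  # columns with a valid match
```

Each pixel evaluates only `2 * radius + 1` candidates instead of the full disparity range, which is what makes a full-resolution volume affordable in this sketch.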

Key results

Full-resolution residual cost volume construction preserves fine structural details
Confidence-driven adaptive range masking eliminates redundant disparity candidates
State-of-the-art real-time performance on KITTI 2012 and Scene Flow benchmarks
Superior perception of thin structures and sharp object boundaries compared to efficiency-oriented models

Why it matters

Provides a computationally efficient yet highly accurate depth perception solution for safety-critical autonomous driving and robotics systems.

Abstract

In this paper, we propose a novel real-time disparity refinement method that enables precise structure perception. We construct a compact full-resolution cost volume from residuals around the initial disparity and adaptively eliminate redundant information on a per-pixel basis by leveraging confidence. The core of our method comprises residual cost volume construction and an adaptive range masking strategy. The residual cost volume is constructed from refinement candidates around the initial disparity, based on the assumption that the ground-truth disparity lies near the initial disparity. Compared to a conventional cost volume constructed over the entire set of disparity candidates, our approach is computationally efficient and preserves precise structural information by operating at full resolution. Moreover, we propose an adaptive range masking strategy that filters refinement candidates for each pixel by leveraging confidence values. This strategy effectively eliminates the redundant information present in cost volumes composed of uniformly sampled refinement candidates. Experimental results on the Scene Flow and KITTI 2012 benchmarks demonstrate that our method achieves real-time performance and sets a new state of the art among real-time stereo matching algorithms.
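
The adaptive range masking described in the abstract could look roughly like the sketch below. The abstract does not give the rule that maps confidence to a search range, so the linear mapping here is purely an assumption for illustration: a pixel with confidence 1 keeps only the zero residual, and a pixel with confidence 0 keeps the full range. The name `adaptive_range_mask` and the `min_radius` parameter are hypothetical.

```python
import math


def adaptive_range_mask(confidence, offsets, min_radius=0):
    """Return one boolean mask per pixel over the residual offsets.

    Assumption (not from the paper): the allowed radius shrinks
    linearly as confidence grows, radius = ceil((1 - c) * max_radius).
    """
    max_radius = max(abs(r) for r in offsets)
    masks = []
    for c in confidence:
        c = min(max(c, 0.0), 1.0)  # clamp confidence to [0, 1]
        radius = max(min_radius, math.ceil((1.0 - c) * max_radius))
        masks.append([abs(r) <= radius for r in offsets])
    return masks


def apply_mask(costs, mask):
    """Exclude masked-out candidates by assigning them infinite cost."""
    return [c if keep else float("inf") for c, keep in zip(costs, mask)]


offsets = [-2, -1, 0, 1, 2]
confidence = [1.0, 0.5, 0.0]  # confident, uncertain, very uncertain pixel
masks = adaptive_range_mask(confidence, offsets)
kept = [sum(m) for m in masks]  # number of surviving candidates per pixel
print(kept)
```

In this sketch, confident pixels spend almost no candidates while uncertain pixels retain the full residual range, which mirrors the per-pixel filtering of uniformly sampled candidates that the abstract describes.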

Index terms

Deep Learning for Visual Perception, Computer Vision for Transportation, Visual Learning