ICRA 2026

Consistency-Driven Confidence Estimation for Stereo Matching

Shuheng Lu, Zaiwang Gu, Xudong Jiang, Jun Cheng

AI summary

Injecting training dynamics via cross-epoch consistency alignment yields better-calibrated confidence estimates and reduces overconfidence in stereo matching's most error-prone regions.

stereo matching, confidence estimation, epistemic uncertainty, evidential deep learning, training dynamics, consistency regularization

Problem

Existing stereo confidence estimation methods primarily model data noise while ignoring epistemic uncertainty from unstable training dynamics, causing models to be overconfident near occlusions, textureless areas, and reflective surfaces. This miscalibration poses safety risks for depth perception in real-world applications.

Approach

The authors propose tracking pixel-wise prediction stability across training epochs to generate a consistency supervision signal, which is then used to align an evidential uncertainty model's confidence scores with actual training dynamics. This consistency-guided framework is integrated into the MonSter architecture without requiring ground-truth uncertainty labels.
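As a rough illustration of the cross-epoch tracking idea, the sketch below accumulates a per-pixel consistency score from disparity predictions at consecutive epochs. It is not the authors' algorithm: the function name `update_consistency`, the stability tolerance `tol`, and the exponential-moving-average accumulation with `momentum` are all assumptions chosen to keep the example small.

```python
import numpy as np

def update_consistency(consistency, prev_disp, curr_disp, tol=1.0, momentum=0.9):
    """Accumulate a per-pixel consistency score across epochs (hypothetical rule).

    A pixel counts as stable for this epoch when its predicted disparity
    changed by less than `tol` pixels since the previous epoch.
    """
    stable = (np.abs(curr_disp - prev_disp) < tol).astype(np.float32)
    # Exponential moving average: an assumed accumulation rule, not the paper's.
    return momentum * consistency + (1.0 - momentum) * stable

# Toy run: the left half of the image is stable, the right half fluctuates.
rng = np.random.default_rng(0)
h, w = 4, 6
consistency = np.zeros((h, w), dtype=np.float32)
prev = rng.uniform(0, 64, size=(h, w)).astype(np.float32)

for epoch in range(5):
    curr = prev.copy()
    # Simulate unstable predictions on the right half only.
    curr[:, w // 2:] += rng.normal(0, 5, size=(h, w - w // 2)).astype(np.float32)
    consistency = update_consistency(consistency, prev, curr)
    prev = curr

left_mean = float(consistency[:, : w // 2].mean())
right_mean = float(consistency[:, w // 2:].mean())
```

In this toy run the stable half accumulates a higher score than the fluctuating half, which is the kind of label-free signal the summary describes as supervision for the confidence model.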

Key results

Epoch-wise consistency accumulation framework injecting training dynamics into stereo confidence estimation
Consistency-ranked evidential discrepancy loss aligning predicted uncertainty with training-derived consistency signals
Full-resolution cross-epoch alignment enabling reliable supervision for dense regression without ground-truth labels
State-of-the-art confidence estimation performance across KITTI 2012, KITTI 2015, and Middlebury benchmarks
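
The ranking idea behind the second item can be sketched with a generic pairwise hinge loss: for a pair of pixels, the one with higher training consistency should receive lower predicted uncertainty. This is a stand-in for illustration, not the paper's consistency-ranked evidential discrepancy loss; `pairwise_ranking_loss`, the `margin` value, and the way pairs are chosen are assumptions.

```python
import numpy as np

def pairwise_ranking_loss(uncertainty, consistency, idx_a, idx_b, margin=0.1):
    """Hinge loss over pixel pairs (hypothetical stand-in for the paper's loss).

    For each pair (a, b), the more consistent pixel should have the lower
    predicted uncertainty by at least `margin`.
    """
    u = np.asarray(uncertainty, dtype=np.float64).ravel()
    c = np.asarray(consistency, dtype=np.float64).ravel()
    # sign = +1 when a is more consistent than b, so we want u[b] > u[a].
    sign = np.sign(c[idx_a] - c[idx_b])
    loss = np.maximum(0.0, margin - sign * (u[idx_b] - u[idx_a]))
    # Pairs with equal consistency carry no ordering information.
    mask = sign != 0
    return float(loss[mask].mean()) if mask.any() else 0.0

consistency = np.array([0.9, 0.1, 0.5, 0.5])
idx_a = np.array([0, 0, 2])
idx_b = np.array([1, 2, 3])

# Uncertainty ordered opposite to consistency: no penalty.
aligned = pairwise_ranking_loss(np.array([0.1, 0.9, 0.5, 0.5]), consistency, idx_a, idx_b)
# Uncertainty ordered the same way as consistency: penalized.
misaligned = pairwise_ranking_loss(np.array([0.9, 0.1, 0.5, 0.5]), consistency, idx_a, idx_b)
```

Because the supervision is ordinal, only the relative ordering of consistency values matters here, which is one way to read the "without ground-truth labels" claim: no absolute uncertainty target is needed.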

Why it matters

Provides calibrated reliability metrics for depth perception, directly benefiting safety-critical applications like autonomous driving and robotic navigation where overconfident errors are most dangerous.

Abstract

Confidence estimation for stereo matching is crucial for enhancing the reliability and accuracy of depth perception in real-world applications. Despite effectively capturing aleatoric uncertainty through probabilistic modeling and statistical aggregation, current regression-based confidence estimation methods neglect uncertainty arising from unstable training dynamics, resulting in over-confident predictions near occlusion boundaries, textureless regions, and reflective surfaces where errors are most severe. We propose a novel epoch-wise consistency accumulation algorithm that explicitly incorporates training dynamics into confidence estimation. Specifically, we design a full-image cross-epoch alignment mechanism to dynamically quantify pixel-wise training consistency between consecutive epochs, thereby significantly enhancing the estimation of confidence. We further propose a consistency-ranked evidential discrepancy loss, which aligns evidential uncertainty estimates with consistency-derived ordinal supervision, aiming to improve the correlation between confidence scores and actual prediction errors. Our approach is incorporated into MonSter, an advanced stereo baseline, achieving SOTA performance in confidence estimation across KITTI 2012, KITTI 2015 and Middlebury benchmarks.

Index terms

Deep Learning for Visual Perception; Computational Geometry; RGB-D Perception