ICRA 2026

OASIS-DC: Generalizable Depth Completion Via Output-Level Alignment of Sparse-Integrated Monocular Pseudo Depth

Jaehyeon Cho, Jhonghyun An

AI summary

Coupling a frozen monocular depth prior with sparse LiDAR anchors via lightweight residual refinement enables accurate, metric depth completion under severe data scarcity.

Depth completion, few-shot learning, monocular depth estimation, sparse LiDAR fusion, foundation models, metric depth

Problem

Monocular foundation models output relative rather than metric depth, while traditional completion methods require large labeled datasets and curated validation data, which hinders deployment in dynamic environments.

Approach

The method aligns a frozen monocular depth estimator’s output with sparse LiDAR measurements using a non-learned Poisson formulation to create a calibrated pseudo-depth prior, which a lightweight network then corrects via localized residual refinement.
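The alignment step can be illustrated with a minimal Poisson-style fusion sketch. This is not the authors' formulation, which the summary does not spell out. It assumes a small image, a monocular map `mono` whose scale is already roughly metric (so its gradients are trustworthy), and a hypothetical anchor weight `lam`. The function name `poisson_align` is illustrative. The sketch solves for a depth map whose neighbor differences match those of `mono` while staying close to the sparse measurements.

```python
import numpy as np

def poisson_align(mono, sparse, mask, lam=10.0):
    """Fuse a monocular depth map with sparse metric anchors (illustrative only).

    Minimizes
        sum over neighbor pairs ((D[p] - D[q]) - (mono[p] - mono[q]))^2
        + lam * sum over anchors (D[p] - sparse[p])^2
    with a dense least-squares solve, so it is only practical for small images.
    `lam` is a hypothetical weight, not a value taken from the paper.
    """
    h, w = mono.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    m = mono.ravel()

    # Gradient equations for horizontal and vertical neighbor pairs.
    p = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    q = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    G = np.zeros((p.size, n))
    G[np.arange(p.size), p] = 1.0
    G[np.arange(p.size), q] = -1.0
    g = m[p] - m[q]

    # Anchor equations at pixels that carry a sparse measurement.
    a = idx[mask]
    wt = np.sqrt(lam)
    A = np.zeros((a.size, n))
    A[np.arange(a.size), a] = wt
    b = wt * sparse[mask]

    # Stack both sets of equations and solve in the least-squares sense.
    sol, *_ = np.linalg.lstsq(np.vstack([G, A]), np.concatenate([g, b]), rcond=None)
    return sol.reshape(h, w)
```

At realistic resolutions the same system would normally be assembled as a sparse matrix and handed to a sparse or iterative solver. In the source's pipeline, the output of this stage serves as the calibrated pseudo-depth prior that the residual network then corrects.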

Key results

Non-learned Poisson fusion aligns frozen monocular depth outputs with sparse LiDAR anchors
Lightweight residual network corrects local errors while preserving global metric scale
Achieves top-tier few-shot accuracy on KITTI-DC and NYUv2 benchmarks
Sustains stable depth and sharp edges under strict, deployment-oriented data scarcity

Why it matters

It offers a computationally efficient, deployment-ready solution for robotics and autonomous driving systems operating under real-world label scarcity.

Abstract

Recent monocular foundation models excel at zero-shot depth estimation, yet their outputs are inherently relative rather than metric, limiting direct use in robotics and autonomous driving. We leverage the fact that relative depth preserves global layout and boundaries: by calibrating it with sparse range measurements, we transform it into a pseudo metric depth prior. Building on this prior, we design a refinement network that follows the prior where reliable and deviates where necessary, enabling accurate metric predictions from very few labeled samples. The resulting system is particularly effective when curated validation data are unavailable, sustaining stable scale and sharp edges across few-shot regimes. These findings suggest that coupling foundation priors with sparse anchors is a practical route to robust, deployment-ready depth completion under real-world label scarcity.
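As a point of reference for what "calibrating with sparse range measurements" can mean in its simplest form, the sketch below fits a single global scale and shift by least squares. This is a common baseline and is not claimed to be the method of the paper, which the summary describes as a Poisson formulation. The function names `fit_scale_shift` and `calibrate` are hypothetical. Some foundation models predict affine-invariant inverse depth, in which case the same fit would be applied to inverse depth instead.

```python
import numpy as np

def fit_scale_shift(rel, sparse, mask):
    """Least-squares fit of (s, t) so that s * rel + t matches sparse depth where mask is True."""
    x = rel[mask]
    y = sparse[mask]
    # Each valid pixel contributes one equation: s * x + t = y.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

def calibrate(rel, sparse, mask):
    """Apply the fitted global scale and shift to the whole relative depth map."""
    s, t = fit_scale_shift(rel, sparse, mask)
    return s * rel + t
```

A global fit of this kind fixes the overall scale but cannot correct spatially varying errors, which is the gap that a local alignment and a residual refinement stage are meant to close.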

Index terms

Deep Learning for Visual Perception, Sensor Fusion, Deep Learning Methods