ICRA 2026

EndoDDC: Learning Sparse to Dense Reconstruction for Endoscopic Robotic Navigation Via Diffusion Depth Completion

Yinheng LIN, Yiming Huang, Beilei Cui, Long Bai, Huxin Gao, Hongliang Ren, Jiewen Lai

AI summary

EndoDDC leverages a gradient-conditioned diffusion model to accurately complete sparse depth maps in endoscopic environments, outperforming state-of-the-art methods in accuracy and robustness.

Endoscopic depth completion, Diffusion models, Sparse-to-dense reconstruction, Surgical robotics, Depth gradient fusion, 3D navigation

Problem

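Accurate depth estimation for endoscopic surgical robots is hindered on two fronts. Fine-tuning pretrained models depends on endoscopic datasets with precise depth annotations, which are scarce. Self-supervised methods avoid that dependence, but they degrade under weak textures, light reflections, and variable lighting, producing sparse reconstructions with invalid depth values.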

Approach

The method fuses RGB images and sparse depth inputs through a multi-scale feature extractor and iterative depth gradient fusion module, then refines the coarse depth map using a conditional diffusion model to generate dense, geometrically coherent outputs.
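
To make the notion of depth gradient features concrete, the following is a minimal sketch of one way to take finite differences on a sparse depth map, assuming a gradient is only meaningful where both neighbouring pixels hold a valid measurement. The function name `sparse_depth_gradients`, the forward-difference scheme, and the validity masks are illustrative assumptions; this summary does not specify how EndoDDC actually computes its gradient features.

```python
import numpy as np


def sparse_depth_gradients(depth, valid):
    """Forward differences of a sparse depth map (hypothetical helper).

    A gradient is kept only where both pixels involved are valid;
    everywhere else it is set to zero and flagged as invalid.
    Returns (gx, gy, mask_x, mask_y).
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = np.asarray(valid, dtype=bool)

    gx = np.zeros_like(depth)
    gy = np.zeros_like(depth)
    mask_x = np.zeros_like(valid)
    mask_y = np.zeros_like(valid)

    # Horizontal difference: depth[y, x + 1] - depth[y, x]
    mask_x[:, :-1] = valid[:, 1:] & valid[:, :-1]
    gx[:, :-1] = np.where(mask_x[:, :-1], depth[:, 1:] - depth[:, :-1], 0.0)

    # Vertical difference: depth[y + 1, x] - depth[y, x]
    mask_y[:-1, :] = valid[1:, :] & valid[:-1, :]
    gy[:-1, :] = np.where(mask_y[:-1, :], depth[1:, :] - depth[:-1, :], 0.0)

    return gx, gy, mask_x, mask_y


# Toy 3x3 sparse depth map; zeros mark pixels without a measurement.
depth = np.array([[1.0, 1.5, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0]])
gx, gy, mask_x, mask_y = sparse_depth_gradients(depth, depth > 0)
```

In a learned pipeline, maps like `gx` and `gy` together with their masks would plausibly be stacked with image features and passed to the fusion module; that wiring is not shown here because the summary does not describe it.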

Key results

Proposes EndoDDC pipeline for sparse-to-dense endoscopic depth completion
Introduces ConvGRU-based depth gradient fusion for iterative geometric guidance
Develops gradient-conditioned diffusion model to resolve local depth ambiguities
Achieves state-of-the-art accuracy and robustness on C3VD and StereoMIS datasets
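
The gradient-conditioned diffusion refinement listed above can be illustrated with a generic DDPM-style reverse sampling loop. This is a sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the step count, starting from pure noise, and the `predict_noise(x, t, coarse_depth, gradient_features)` callback standing in for a trained network are all hypothetical choices made for illustration.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumed choice) and derived quantities."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def refine_depth(coarse_depth, gradient_features, predict_noise,
                 num_steps=50, seed=0):
    """Run DDPM ancestral sampling conditioned on coarse depth and gradients.

    `predict_noise` is a placeholder for a trained noise-prediction
    network; it receives the current sample, the step index, and the
    two conditioning inputs, and returns an array shaped like the sample.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    x = rng.standard_normal(coarse_depth.shape)  # start from Gaussian noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(x, t, coarse_depth, gradient_features)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add noise on every step except the last one.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x
```

With a real model, `predict_noise` would be a network trained to denoise depth maps given the conditioning inputs. A quick sanity check is to plug in an oracle that knows the clean target: the loop should then return that target up to floating-point error.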

Why it matters

It offers a more reliable depth perception approach for weakly textured, reflective endoscopic scenes, which can improve spatial awareness and safe instrument guidance for endoscopic surgical robots.

Abstract

Accurate depth estimation plays a critical role in the navigation of endoscopic surgical robots, forming the foundation for 3D reconstruction and safe instrument guidance. Fine-tuning pretrained models heavily relies on endoscopic surgical datasets with precise depth annotations. While existing self-supervised depth estimation techniques eliminate the need for accurate depth annotations, their performance degrades in environments with weak textures and variable lighting, leading to sparse reconstruction with invalid depth estimation. Depth completion using sparse depth maps can mitigate these issues and improve accuracy. Despite the advances in depth completion techniques in general fields, their application in endoscopy remains limited. To overcome these limitations, we propose EndoDDC, an endoscopy depth completion method that integrates images, sparse depth information with depth gradient features, and optimizes depth maps through a diffusion model, addressing the issues of weak texture and light reflection in endoscopic environments. Extensive experiments on two publicly available endoscopy datasets show that our approach outperforms state-of-the-art models in both depth accuracy and robustness. This demonstrates the potential of our method to reduce visual errors in complex endoscopic environments. Our code will be released at https://github.com/yinheng-lin/EndoDDC.
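
For readers who want to check depth accuracy claims of this kind on their own data, the sketch below computes three error measures that are widely used in depth estimation: absolute relative error, RMSE, and the fraction of pixels within a 1.25 ratio threshold. The abstract does not state which metrics the paper reports, so the selection here, the function name `depth_metrics`, and the convention that ground truth is valid where it is positive are assumptions.

```python
import numpy as np


def depth_metrics(pred, gt, valid=None):
    """Common depth error measures over valid ground-truth pixels.

    Assumes predictions are strictly positive on the valid pixels,
    since the threshold accuracy divides by the prediction.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # assumed convention: zero means "no ground truth"

    p, g = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(p - g) / g)        # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))       # root mean squared error
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)              # threshold accuracy
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Toy example: one pixel has no ground truth and is ignored.
gt = np.array([[1.0, 2.0], [0.0, 4.0]])
pred = np.array([[1.0, 3.0], [5.0, 4.0]])
scores = depth_metrics(pred, gt)
```

Lower `abs_rel` and `rmse` and higher `delta1` indicate better agreement with the ground truth.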

Index terms

Computer Vision for Medical Robotics; Surgical Robotics: Laparoscopy; Vision-Based Navigation