ICRA 2026

Depth Completion in Unseen Field Robotics Environments Using Extremely Sparse Depth Measurements

Marco Job, Thomas Stastny, Eleni Kelasidi, Roland Siegwart, Michael Pantic

AI summary

A real-time depth completion model trained on synthetic data successfully predicts dense metric depth from extremely sparse measurements across diverse, unseen field robotics environments.

Depth completion, field robotics, monocular depth estimation, synthetic data generation, sparse depth, real-time perception

Problem

Monocular depth estimation is unreliable in unstructured field environments because it lacks reliable scale cues and struggles with ambiguous or low-texture scenes, while existing depth completion methods either require denser sensor data or generalize poorly to unseen real-world scenarios.

Approach

The method extends a state-of-the-art monocular depth network with a fourth input channel for sparse depth and trains it on photorealistic synthetic data rendered from textured 3D meshes of real-world scenes.
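One common way to add such a fourth channel can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the network's first layer is a convolution or patch embedding with weights shaped (C_out, 3, k, k), that pixels without a measurement hold 0, and that the new depth weights start at zero. The function names `make_rgbd_input` and `expand_first_layer` are hypothetical.

```python
import numpy as np

def make_rgbd_input(rgb, sparse_depth):
    """Stack an RGB image (3, H, W) and a sparse depth map (H, W) into (4, H, W).

    Assumption: pixels without a depth measurement hold 0 in sparse_depth.
    """
    if rgb.shape[0] != 3 or rgb.shape[1:] != sparse_depth.shape:
        raise ValueError("expected rgb (3, H, W) and sparse_depth (H, W)")
    return np.concatenate([rgb, sparse_depth[None]], axis=0)

def expand_first_layer(weight_rgb):
    """Extend first-layer weights from (C_out, 3, k, k) to (C_out, 4, k, k).

    Assumption: the new depth-channel weights are initialized to zero, so the
    expanded layer initially gives the same response as the RGB-only layer.
    """
    c_out, c_in, kh, kw = weight_rgb.shape
    if c_in != 3:
        raise ValueError("expected 3 input channels")
    extra = np.zeros((c_out, 1, kh, kw), dtype=weight_rgb.dtype)
    return np.concatenate([weight_rgb, extra], axis=1)

rng = np.random.default_rng(0)
rgb = rng.random((3, 8, 8), dtype=np.float32)
sparse = np.zeros((8, 8), dtype=np.float32)
sparse[2, 5] = 4.2  # a single metric depth measurement, in meters
x = make_rgbd_input(rgb, sparse)

w3 = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)  # stand-in for pretrained weights
w4 = expand_first_layer(w3)
```

With zero-initialized depth weights, fine-tuning starts from the behavior of the pretrained RGB network and learns to use the sparse depth channel gradually. Other initializations are possible, and the summary does not state which one the paper uses.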

Key results

53 ms end-to-end latency on embedded Jetson hardware
Competitive dense depth accuracy across five unseen real-world field robotics datasets
Novel synthetic dataset generation pipeline using SfM meshes and novel viewpoint synthesis
Open release of four synthetic training datasets, code, and pre-trained models

Why it matters

Allows autonomous field robots to obtain dense, metric 3D perception in real time from low-cost cameras and very sparse depth measurements, including in environments not seen during training.

Abstract

Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operations. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited due to the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that trains on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic datasets for training purposes. This dataset generation approach utilizes textured 3D meshes from Structure from Motion and photorealistic rendering with novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on an Nvidia Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.
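When training on rendered data, sparse sensor input is typically simulated by keeping only a handful of pixels from the dense ground-truth depth. The sketch below shows one simple way to do this with uniform random sampling. The sampling pattern, the point count, and the use of 0 as the "no measurement" value are assumptions made for illustration; the abstract does not specify them, and `sample_sparse_depth` is a hypothetical helper.

```python
import numpy as np

def sample_sparse_depth(dense_depth, num_points, rng):
    """Keep num_points randomly chosen valid pixels of a dense depth map.

    All other pixels are set to 0, which here marks "no measurement".
    Pixels with depth <= 0 are treated as invalid and never sampled.
    """
    valid = np.flatnonzero(dense_depth > 0)
    n = min(num_points, valid.size)
    keep = rng.choice(valid, size=n, replace=False)
    sparse = np.zeros_like(dense_depth)
    sparse.flat[keep] = dense_depth.flat[keep]
    return sparse

rng = np.random.default_rng(42)
# Stand-in for a rendered dense depth map, in meters.
dense = rng.uniform(0.5, 20.0, size=(48, 64)).astype(np.float32)
sparse = sample_sparse_depth(dense, num_points=10, rng=rng)

# Fraction of pixels that carry a depth measurement.
density = np.count_nonzero(sparse) / sparse.size
```

In this toy setting, 10 points on a 48 x 64 image cover about 0.3% of the pixels. A real sensor would produce structured rather than uniformly random samples, so the sampling function would normally be matched to the target sensor.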

Index terms

Field Robots; Data Sets for Robotic Vision; Deep Learning for Visual Perception