ICRA 2026

Efficient Construction of Implicit Surface Models from a Single Image for Motion Generation

Wei-Teng Chu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi

AI summary

FINS reconstructs high-fidelity 3D surfaces and signed distance fields from a single image in seconds, enabling real-time robotic motion generation.

Neural implicit surfaces, Signed distance fields, Single-view reconstruction, Robot motion generation, Hash grid encoding, Foundation models

Problem

Existing neural implicit surface methods require dense multi-view images and lengthy training times, making them impractical for real-time robotics applications with sparse observations.

Approach

FINS uses pre-trained 3D foundation models to generate point cloud supervision from a single image, paired with a multi-resolution hash grid encoder and a staged hybrid optimizer for rapid convergence.
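To make the encoder component concrete, here is a minimal sketch of a multi-resolution hash grid encoding in the style popularized by Instant-NGP. It is not the FINS implementation: the class name HashGridEncoder, the level count, resolutions, table size, and feature width are all assumptions chosen for illustration, and the feature tables are random rather than trained.

```python
import math
import random

# Primes used by the Instant-NGP style spatial hash (the first coordinate uses 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a fixed-size feature table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


class HashGridEncoder:
    """Illustrative multi-resolution hash grid; every hyperparameter is assumed."""

    def __init__(self, n_levels=4, base_res=4, growth=2.0,
                 table_size=2 ** 12, n_features=2, seed=0):
        rng = random.Random(seed)
        # Grid resolution grows geometrically from coarse to fine.
        self.resolutions = [int(base_res * growth ** l) for l in range(n_levels)]
        self.table_size = table_size
        self.n_features = n_features
        # One table of small random feature vectors per level
        # (these would be trainable parameters in a real model).
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)]
             for _ in range(table_size)]
            for _ in self.resolutions
        ]

    def encode(self, x, y, z):
        """Encode a point in [0, 1]^3 into one concatenated feature vector."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            # Continuous grid coordinates and the lower corner of the cell.
            gx, gy, gz = x * res, y * res, z * res
            x0, y0, z0 = math.floor(gx), math.floor(gy), math.floor(gz)
            fx, fy, fz = gx - x0, gy - y0, gz - z0
            feat = [0.0] * self.n_features
            # Trilinear interpolation over the 8 corners of the cell.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((fx if dx else 1 - fx) *
                             (fy if dy else 1 - fy) *
                             (fz if dz else 1 - fz))
                        slot = spatial_hash(x0 + dx, y0 + dy, z0 + dz,
                                            self.table_size)
                        for k in range(self.n_features):
                            feat[k] += w * table[slot][k]
            out.extend(feat)
        return out


encoder = HashGridEncoder()
features = encoder.encode(0.3, 0.7, 0.5)
print(len(features))  # n_levels * n_features = 8
```

In a full model, the concatenated features would be passed to small MLP heads (geometry and color, per the abstract), and the table entries would be optimized jointly with those heads.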

Key results

Achieves high-precision SDF training from a single image in ~10 seconds
Leverages 3D foundation models for effective single-view supervision
Outperforms state-of-the-art baselines in convergence speed and reconstruction accuracy
Demonstrates successful robot surface following and scalability across benchmarks
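As a rough illustration of why a signed distance field is convenient for surface following, the sketch below traces a path along the zero level set by alternating a tangent step with a re-projection. It is not the controller used in the paper: an analytic sphere stands in for a learned SDF, gradients are taken by finite differences, and the function names sphere_sdf, sdf_gradient, project_to_surface, and follow_surface are hypothetical.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic stand-in for a learned SDF: negative inside, positive outside."""
    return math.dist(p, center) - radius


def sdf_gradient(sdf, p, eps=1e-5):
    """Central-difference gradient of the SDF at point p."""
    grad = []
    for i in range(3):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2 * eps))
    return grad


def project_to_surface(sdf, p, iters=5):
    """Pull p onto the zero level set by stepping along the unit gradient."""
    p = list(p)
    for _ in range(iters):
        d = sdf(p)
        g = sdf_gradient(sdf, p)
        norm = math.sqrt(sum(c * c for c in g)) or 1.0
        p = [pi - d * gi / norm for pi, gi in zip(p, g)]
    return p


def follow_surface(sdf, start, direction, step=0.05, n_steps=20):
    """Trace a path on the surface: tangent step, then re-projection."""
    p = project_to_surface(sdf, start)
    path = [p]
    for _ in range(n_steps):
        g = sdf_gradient(sdf, p)
        norm = math.sqrt(sum(c * c for c in g)) or 1.0
        n = [c / norm for c in g]
        # Remove the normal component so the motion stays tangent to the surface.
        dot = sum(a * b for a, b in zip(direction, n))
        t = [a - dot * b for a, b in zip(direction, n)]
        p = project_to_surface(sdf, [pi + step * ti for pi, ti in zip(p, t)])
        path.append(p)
    return path


path = follow_surface(sphere_sdf, start=(1.5, 0.0, 0.0), direction=(0.0, 1.0, 0.0))
print(max(abs(sphere_sdf(p)) for p in path))  # residual distance to the surface
```

The same two queries, distance and gradient, also support obstacle avoidance: the distance value gives clearance, and the gradient gives the direction in which clearance increases fastest.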

Why it matters

Enables real-time, sparse-view 3D reconstruction for downstream robotics tasks like obstacle avoidance, path planning, and surface inspection.

Abstract

Implicit representations have been widely applied in robotics for obstacle avoidance and path planning. In this paper, we explore the problem of constructing an implicit distance representation from a single image. Past methods for implicit surface reconstruction, such as NeuS and its variants, generally require a large set of multi-view images as input and long training times. In this work, we propose Fast Image-to-Neural Surface (FINS), a lightweight framework that can reconstruct high-fidelity surfaces and SDF fields based on a single image or a small set of images. FINS integrates a multi-resolution hash grid encoder with lightweight geometry and color heads, making the training via an approximate second-order optimizer highly efficient and capable of converging within a few seconds. Additionally, we achieve the construction of a neural surface requiring only a single RGB image, by leveraging pre-trained foundation models to estimate the geometry inherent in the image. Our experiments demonstrate that under the same conditions, our method outperforms state-of-the-art baselines in both convergence speed and accuracy on surface reconstruction and SDF field estimation. Moreover, we demonstrate the applicability of FINS for robot surface following tasks and show its scalability to a variety of benchmark datasets. Code is publicly available at https://github.com/waynechu1109/FINS

Index terms

Deep Learning for Visual Perception