ICRA 2026

UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene

Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki

AI summary

UniFField lets robots predict visual, semantic, and geometric features, together with their uncertainties, in any new scene without per-scene training, supporting robust decision-making during active exploration.

Neural feature fields, uncertainty quantification, 3D scene understanding, robotic perception, zero-shot generalization, active exploration

Problem

Existing 3D neural feature fields are typically scene-specific and lack the ability to model prediction uncertainty, which hinders robust robotic perception and decision-making in unstructured or partially observable environments.

Approach

The authors introduce UniFField, a generalizable voxel-based neural feature field that incrementally aggregates RGB-D observations to jointly predict visual, semantic, and geometric properties, and that quantifies the uncertainty of each prediction using input-derived indicators and a heteroscedastic loss.
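To make the term "heteroscedastic loss" concrete, here is a minimal sketch of one common form, the Gaussian negative log-likelihood with a predicted per-sample log-variance. It is an illustration under stated assumptions, not the loss used in the paper: the summary does not give the exact formulation, and the function names and the toy numbers are hypothetical.

```python
import math


def heteroscedastic_nll(target, mean, log_var):
    """Gaussian negative log-likelihood for one scalar prediction.

    The model outputs both a mean and a log-variance. The constant
    0.5 * log(2 * pi) term is dropped because it does not affect training.
    Assumption: this is a generic textbook form, not UniFField's exact loss.
    """
    squared_error = (target - mean) ** 2
    return 0.5 * (math.exp(-log_var) * squared_error + log_var)


def mean_heteroscedastic_nll(targets, means, log_vars):
    """Average the per-sample loss over a batch given as plain sequences."""
    losses = [
        heteroscedastic_nll(t, m, s)
        for t, m, s in zip(targets, means, log_vars)
    ]
    return sum(losses) / len(losses)


# Same prediction error (1.0) under three different predicted variances.
confident = heteroscedastic_nll(1.0, 0.0, math.log(0.01))  # variance too small
matched = heteroscedastic_nll(1.0, 0.0, 0.0)               # variance equals squared error
vague = heteroscedastic_nll(1.0, 0.0, math.log(100.0))     # variance too large
```

For a fixed error, the loss is lowest when the predicted variance matches the squared error, so `matched` is smaller than both `confident` and `vague`. This is the mechanism that pushes a predicted uncertainty to track the actual prediction error.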

Key results

Zero-shot generalization to unseen scenes without per-scene optimization
Uncertainty estimates that closely align with actual model prediction errors across all modalities
Incremental voxel-based architecture enabling real-time scene updates during robot exploration
Successful uncertainty-aware active object search demonstrated on a mobile manipulator robot
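The incremental, voxel-based aggregation mentioned in the results can be sketched as a running per-voxel update. This is a simplified stand-in, not the authors' architecture: UniFField uses a learned model, whereas the sketch below only keeps a running mean and sample variance of the features that fall into each voxel (Welford's algorithm). The class names, the voxel size, and the use of variance as an uncertainty proxy are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoxelStats:
    """Running statistics of the feature vectors observed in one voxel."""
    count: int = 0
    mean: list = field(default_factory=list)
    m2: list = field(default_factory=list)  # running sum of squared deviations

    def update(self, feature):
        # Welford's online update: one pass, no stored history.
        if self.count == 0:
            self.mean = [0.0] * len(feature)
            self.m2 = [0.0] * len(feature)
        self.count += 1
        for i, x in enumerate(feature):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        # Sample variance; undefined until a voxel has two observations.
        if self.count < 2:
            return None
        return [m / (self.count - 1) for m in self.m2]


class VoxelFeatureMap:
    """Hypothetical sparse voxel grid keyed by integer voxel indices."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size  # metres; illustrative value
        self.voxels = {}

    def key(self, point):
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def integrate(self, points, features):
        # Fold one frame of back-projected 3D points and features into the map.
        for point, feature in zip(points, features):
            self.voxels.setdefault(self.key(point), VoxelStats()).update(feature)


grid = VoxelFeatureMap(voxel_size=0.1)
# First frame: two points that land in the same voxel.
grid.integrate([(0.01, 0.02, 0.03), (0.04, 0.05, 0.06)], [[1.0, 0.0], [3.0, 0.0]])
# Second frame: one more observation of that voxel.
grid.integrate([(0.02, 0.02, 0.02)], [[2.0, 0.0]])
stats = grid.voxels[(0, 0, 0)]
```

Each new frame updates only the voxels it touches, which is what makes the map incremental. In this toy version, a low observation count or a high variance would mark a voxel as uncertain and therefore worth another look during exploration.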

Why it matters

It provides an uncertainty-aware 3D perception foundation that lets robots act more reliably in unknown, partially observed environments, where scene-specific feature fields without uncertainty estimates fall short.

Abstract

Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for successful execution of robotic tasks, especially in unstructured and complex environments. Additionally, to make robust decisions, the robot must be able to evaluate the reliability of the information it perceives. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero-shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, simultaneously updating its uncertainty estimates. We evaluate our uncertainty estimates and find that they accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage spatial and semantic feature predictions and their respective uncertainties for an active object search task using a mobile manipulator robot, demonstrating the capability for robust decision-making.

Research funded by the EU Horizon program under grant no. 101120823, project MANiBOT. Support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR) of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), funded by federal and Bavarian authorities and the German Research Foundation (DFG) – 440719683.
All authors are with the Computer Science Department, Technische Universität Darmstadt, Germany: {christian.maurer, snehal.jauhri, sophie.lueth}@tu-darmstadt.de, georgia.chalvatzaki@tu-darmstadt.de

Index terms

Deep Learning for Visual Perception, Computer Vision for Automation, RGB-D Perception