ICRA 2026

PersONAL: Towards a Comprehensive Benchmark for Personalized Embodied Agents

Filippo Ziliotto, Jelin Raphael Akkara, Alessandro Daniele, Lamberto Ballan, Luciano Serafini, Tommaso Campari

AI summary

Current state-of-the-art embodied AI agents still show a substantial performance gap compared to humans in personalized, user-centric navigation and object grounding tasks.

Embodied AI, Personalized Agents, Object Navigation, Object Grounding, Benchmark, HM3D

Problem

Embodied AI agents struggle to interpret and act on user-specific preferences and object ownership in realistic environments, as existing benchmarks lack rigorous personalization and rely on static or image-based cues.

Approach

The authors introduce PersONAL, a benchmark with over 2,000 episodes across 30+ photorealistic homes, where agents must use textual scene descriptions and ownership metadata to navigate to or ground user-specific objects.

Key results

Released a dataset of 2,000+ high-quality episodes across 30+ HM3D homes with three difficulty levels
Defined two evaluation modes: active navigation in unseen environments and object grounding in mapped scenes
Demonstrated a substantial performance gap between state-of-the-art zero-shot baselines and human-level performance
Improved caption quality and lexical diversity over prior benchmarks like GOAT-Bench

Why it matters

It provides a crucial evaluation framework for developing real-world assistive robots that can understand and act on personalized human preferences in domestic settings.

Abstract

Recent advances in Embodied AI have enabled agents to perform increasingly complex tasks and adapt to diverse environments. However, deploying such agents in realistic human-centered scenarios, such as domestic households, remains challenging, particularly due to the difficulty of modeling individual human preferences and behaviors. In this work, we introduce PersONAL (PERSonalized Object Navigation And Localization), a comprehensive benchmark designed to study personalization in Embodied AI. Agents must identify, retrieve, and navigate to objects associated with specific users, responding to natural-language queries such as "find Lily’s backpack". PersONAL comprises over 2,000 high-quality episodes across 30+ photorealistic homes from the HM3D dataset. Each episode includes a natural-language scene description with explicit associations between objects and their owners, requiring agents to reason over user-specific semantics. The benchmark supports two evaluation modes: (1) active navigation in unseen environments, and (2) object grounding in previously mapped scenes. Experiments with state-of-the-art baselines reveal a substantial gap to human performance, highlighting the need for embodied agents capable of perceiving, reasoning over, and memorizing personalized information, and paving the way towards real-world assistive robots. Code and dataset available at: github.io/PersONAL

Index terms

Human-Centered Robotics, Autonomous Agents, Data Sets for Robotic Vision