ICRA 2026

NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

Haolin Yang, Yuxing Long, Zhuoyuan Yu, Zihan Yang, Minghan Wang, Jiapeng Xu, Yihan Wang, Ziyan Yu, Wenzhe Cai, Lei Kang, Hao Dong

AI summary

Current navigation agents and multimodal LLMs lack robust spatial intelligence. The proposed SNav model outperforms existing navigation agents on NavSpace and in real robot tests, establishing a strong baseline for spatially aware embodied navigation.

Spatial intelligence, instruction navigation, benchmark, embodied AI, multimodal LLMs, SNav

Problem

Prior benchmarks focus on semantic understanding but overlook systematic evaluation of spatial perception and reasoning in instruction-following navigation, leaving the spatial capabilities of agents unclear.

Approach

The authors introduce NavSpace, a benchmark with 1,228 trajectory-instruction pairs across six spatial intelligence categories, and propose SNav, a navigation large model fine-tuned with spatially intelligent instructions to enhance spatial perception and reasoning.
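
To make the benchmark structure concrete, here is a minimal sketch of how trajectory-instruction pairs grouped by category might be scored per category. Everything in it is an assumption for illustration: the `Episode` schema, the placeholder category names, and the use of a success rate as the metric are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    """One trajectory-instruction pair (hypothetical schema, not the paper's)."""
    category: str     # one of the six spatial intelligence categories
    instruction: str  # natural-language navigation instruction
    success: bool     # whether the agent completed the episode (assumed metric)


def success_rate_by_category(episodes):
    """Return {category: fraction of episodes in that category marked successful}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for ep in episodes:
        totals[ep.category] += 1
        wins[ep.category] += int(ep.success)
    return {cat: wins[cat] / totals[cat] for cat in totals}


# Placeholder data: category names and instructions are invented for illustration.
episodes = [
    Episode("category_a", "placeholder instruction 1", True),
    Episode("category_a", "placeholder instruction 2", False),
    Episode("category_b", "placeholder instruction 3", True),
]
rates = success_rate_by_category(episodes)
```

Reporting results per category rather than as a single aggregate would show which kinds of spatial instructions an agent handles poorly, which is the sort of breakdown a six-category benchmark allows.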

Key results

First benchmark evaluating spatial intelligence across six navigation categories
Comprehensive evaluation of 22 agents revealing MLLM limitations in spatial tasks
Development of SNav, a spatially intelligent navigation model that outperforms existing agents
Strong baseline performance on NavSpace and real robot tests

Why it matters

Provides a critical evaluation framework and improved model for developing spatially aware robots and AI agents in real-world indoor environments.

Abstract

Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook systematically evaluating navigation agents’ spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory–instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and real robot tests, establishing a strong baseline for future work.

Index terms

Vision-Based Navigation