ICRA 2026

X-Nav: Learning End-To-End Cross-Embodiment Navigation for Mobile Robots

Haitong Wang, Aaron Hao Tan, Angus Fung, Goldie Nejat

AI summary

A single unified navigation policy trained on randomly generated robots achieves zero-shot deployment across diverse wheeled and quadrupedal platforms without embodiment-specific tuning.
cross-embodiment navigation, end-to-end learning, deep reinforcement learning, policy distillation, transformer networks, zero-shot transfer

Problem

Existing navigation methods rely on embodiment-specific kinematics, dynamics, or controllers, preventing policies trained on one robot from generalizing to others. This limits scalability and requires costly manual tuning or separate planners for each new platform.

Approach

X-Nav employs a two-stage framework that first trains multiple expert policies via deep reinforcement learning on randomly generated robot embodiments, then distills their knowledge into a single transformer-based policy that maps visual and proprioceptive inputs directly to low-level control commands.
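
The two-stage idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the embodiment parameter `max_speed`, the hand-written `expert_policy` (a stand-in for experts that the paper trains with deep reinforcement learning), and the linear student (a stand-in for the transformer-based policy). It shows only the data flow: experts with privileged embodiment knowledge produce actions, and a single student that never sees that knowledge is fit to imitate them.

```python
import math
import random


def sample_embodiment(rng):
    """Draw a hypothetical embodiment parameter (name and range are invented)."""
    return {"max_speed": rng.uniform(0.5, 2.0)}


def expert_policy(goal, embodiment):
    """Stand-in for a trained expert; it sees the privileged embodiment parameter."""
    distance = math.hypot(goal[0], goal[1])
    speed = embodiment["max_speed"] * math.tanh(distance)
    turn = math.atan2(goal[1], goal[0])
    return [speed, turn]


def collect_demonstrations(n_embodiments, steps, seed=0):
    """Stage 1 (toy): record (observation, expert action) pairs on random embodiments."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_embodiments):
        embodiment = sample_embodiment(rng)
        for _ in range(steps):
            goal = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
            # The student observation is the goal offset only: no privileged input.
            data.append((goal, expert_policy(goal, embodiment)))
    return data


def student_policy(params, goal):
    """Linear student: action = W @ goal + b."""
    W, b = params
    return [sum(w * g for w, g in zip(row, goal)) + bias for row, bias in zip(W, b)]


def distill(data, epochs=50, lr=0.05):
    """Stage 2 (toy): fit one student to all experts by SGD on squared error."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for _ in range(epochs):
        for goal, target in data:
            pred = student_policy((W, b), goal)
            for i in range(2):
                err = pred[i] - target[i]
                for j in range(2):
                    W[i][j] -= lr * err * goal[j]
                b[i] -= lr * err
    return W, b


def mse(params, data):
    """Mean squared imitation error of the student against expert actions."""
    total = 0.0
    for goal, target in data:
        pred = student_policy(params, goal)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(data)


train = collect_demonstrations(n_embodiments=8, steps=50, seed=0)
held_out = collect_demonstrations(n_embodiments=4, steps=50, seed=1)  # unseen embodiments
student = distill(train)
print("held-out imitation error:", mse(student, held_out))
```

Evaluating on embodiments drawn with a different seed mirrors, in a very loose sense, the zero-shot setting described in the summary; the numbers this toy prints say nothing about X-Nav's actual performance.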

Key results

  • Zero-shot transfer to unseen wheeled and quadrupedal robot embodiments
  • Successful navigation in photorealistic simulated and real-world environments
  • Performance improves as the number of randomly generated training embodiments increases
  • Ablation study supports the design choices of X-Nav, including the Nav-ACT distillation stage

Why it matters

Enables developers and researchers to deploy a single navigation model across diverse mobile robot platforms without costly embodiment-specific tuning or retraining.

Abstract

Existing navigation methods are primarily designed for specific robot embodiments, limiting their generalizability across diverse robot platforms. In this paper, we introduce X-Nav, a novel framework for end-to-end cross-embodiment navigation where a single unified policy can be deployed across various embodiments for both wheeled and quadrupedal robots. X-Nav consists of two learning stages: 1) multiple expert policies are trained using deep reinforcement learning with privileged observations on a wide range of randomly generated robot embodiments; and 2) a single general policy is distilled from the expert policies via navigation action chunking with transformer (Nav-ACT). The general policy directly maps visual and proprioceptive observations to low-level control commands, enabling generalization to novel robot embodiments. Simulated experiments demonstrated that X-Nav achieved zero-shot transfer to both unseen embodiments and photorealistic environments. A scalability study showed that the performance of X-Nav improves when trained with an increasing number of randomly generated embodiments. An ablation study confirmed the design choices of X-Nav. Furthermore, real-world experiments were conducted to validate the generalizability of X-Nav in real-world environments.
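
The abstract names action chunking but this page does not give the details of Nav-ACT, so the following is a sketch of action chunking in the generic sense only: instead of predicting one action per observation, the policy is trained to predict the next `chunk_size` actions at once. The helper name `make_chunks`, the padding rule (repeat the final action), and the mask layout are assumptions chosen for this example.

```python
def make_chunks(observations, actions, chunk_size):
    """Pair each observation with the next `chunk_size` expert actions.

    Chunks that run past the end of the trajectory are padded by repeating
    the final action (an assumption of this sketch); the mask holds 1 for
    real actions and 0 for padding so a loss could ignore padded entries.
    """
    if len(observations) != len(actions):
        raise ValueError("observations and actions must have equal length")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    pairs = []
    for t in range(len(actions)):
        chunk = list(actions[t:t + chunk_size])
        n_real = len(chunk)
        n_pad = chunk_size - n_real
        chunk += [actions[-1]] * n_pad
        mask = [1] * n_real + [0] * n_pad
        pairs.append((observations[t], chunk, mask))
    return pairs


# Toy trajectory: observation labels and (linear, angular) velocity commands.
obs = ["o0", "o1", "o2", "o3"]
acts = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.0), (0.0, 0.0)]
for observation, chunk, mask in make_chunks(obs, acts, chunk_size=3):
    print(observation, chunk, mask)
```

Each resulting triple is one supervised training example for a chunk-predicting student: the input is the observation at step t, and the target is the short sequence of expert actions that followed it.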

Index terms

Vision-Based Navigation, Sensorimotor Learning, AI-Enabled Robotics
