ICRA 2026

Robust Robot Navigation through Failure-Aversion Learning

Zhifeng Yu, Xuyang Li, Jianwu Fang, Guangliang Li, Jianru Xue

AI summary

DFPS-Nav achieves up to 29.5% higher success rates in static environments (up to 27% in dynamic ones) by systematically extracting recovery behaviors from failures and prioritizing high-quality successes.

Robot navigation, reinforcement learning, failure-aversion learning, self-imitation learning, sample efficiency, autonomous agents

Problem

On-policy reinforcement learning for robot navigation suffers from severe sample inefficiency and discards valuable learning signals contained in failed trajectories. Existing failure-learning methods analyze failures point-wise, ignoring the rich sequential context necessary for robust recovery.

Approach

The authors propose DFPS-Nav, which processes experiences asymmetrically: for successful trajectories, Prioritized Self-Imitation Learning (PSIL) applies trajectory-level quality prioritization; for failed trajectories, Failure-Aversion Learning (FAL) applies segment-wise trend analysis to identify critical recovery and avoidance actions.
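
The segment-wise trend analysis applied to failures can be illustrated with a minimal sketch. It assumes a per-step progress signal (here, the negative distance to the goal), a fixed segment length, a least-squares slope as the "trend", and a rule that the final segment of a failed episode is always marked failure-inducing. All of these, and the names `label_failed_trajectory` and `per_step_weights`, are hypothetical choices for illustration, not the paper's actual FAL formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int      # first step index (inclusive)
    end: int        # last step index (exclusive)
    slope: float    # least-squares trend of the progress signal
    weight: float   # +1 imitate (recovery), -1 avoid (failure-inducing), 0 neutral


def linear_slope(values: List[float]) -> float:
    """Least-squares slope of `values` against their step index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def label_failed_trajectory(progress: List[float], seg_len: int = 4,
                            eps: float = 1e-3) -> List[Segment]:
    """Hypothetical segment-wise credit assignment for one failed episode.

    `progress` is a per-step signal where higher is better (assumed here to
    be the negative distance to the goal).
    """
    segments = []
    for start in range(0, len(progress), seg_len):
        chunk = progress[start:start + seg_len]
        slope = linear_slope(chunk)
        if slope > eps:
            weight = 1.0    # improving trend: treat as a recovery behavior
        elif slope < -eps:
            weight = -1.0   # worsening trend: treat as failure-inducing
        else:
            weight = 0.0    # flat trend: no credit either way
        segments.append(Segment(start, start + len(chunk), slope, weight))
    if segments:
        # Assumption: the segment that ends in the failure is always penalized.
        segments[-1].weight = -1.0
    return segments


def per_step_weights(segments: List[Segment]) -> List[float]:
    """Broadcast each segment's weight to every step it covers."""
    return [s.weight for s in segments for _ in range(s.start, s.end)]


distances = [5.0, 4.5, 4.0, 3.5,   # approaches the goal
             3.5, 4.0, 4.5, 5.0,   # drifts away from the goal
             5.0, 4.8, 4.6, 4.4,   # recovers some progress
             4.4, 4.4]             # stalls, then collides
segs = label_failed_trajectory([-d for d in distances])
print([s.weight for s in segs])  # [1.0, -1.0, 1.0, -1.0]
```

A purely point-wise scheme would typically inspect individual steps, such as the last action before the collision. Looking at trends over segments is what lets the mid-episode recovery (steps 8 to 11 here) earn positive credit even though the episode as a whole failed.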

Key results

Up to 29.5% higher success rate in static environments
Up to 27% higher success rate in dynamic environments
Successful real-world deployment on a physical mobile robot
Improved path efficiency, collision avoidance, and training convergence

Why it matters

Enables more data-efficient and robust autonomous navigation for real-world robots by transforming failed attempts into actionable learning signals.

Abstract

Autonomous navigation in complex dynamic environments remains a fundamental challenge in robotics, and many reinforcement learning (RL) algorithms, especially on-policy ones, have demonstrated promising results. However, their inherent sample inefficiency is still a fundamental problem to be solved. Methods that integrate off-policy approaches into on-policy frameworks have been proposed to improve sample efficiency by imitating the agent's past exemplary experiences while discarding less optimal ones. However, these methods overlook the valuable insights embedded in failures. Although some research has begun to explore learning from failures, it usually operates at a point-by-point level, ignoring the rich sequential context inherent in the trajectory. In this paper, we introduce DFPS-Nav, a training framework that uses Failure-Aversion Learning (FAL) to perform segmented, trend-based credit assignment, identifying both failure-inducing actions and valuable recovery behaviors within failed trajectories. We further improve the imitation of successes by adopting Prioritized Self-Imitation Learning (PSIL), which scores trajectories and prioritizes high-quality behaviors so that successful behaviors are reliably reproduced. Extensive simulation and real-world experiments demonstrate that, by using both FAL and PSIL to extract and refine information from the sequential context within trajectories, DFPS-Nav achieves up to 29.5% and 27% higher success rates in static and dynamic environments, respectively, than the strong baseline method, and it is successfully applied in the real world. This work underscores how systematically deconstructing failures while prioritizing successes leads to more efficient and robust autonomous navigation.
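
The idea of scoring successful trajectories and prioritizing high-quality ones for self-imitation can be sketched under several stated assumptions. The quality score (total reward minus a path-length penalty), the rank-based sampling priority (borrowed from rank-based prioritized experience replay), and the clipped advantage max(R - V, 0) (from the original self-imitation learning formulation) are stand-ins chosen for illustration. The paper's actual PSIL scoring may differ, and `Episode`, `quality_score`, `sample_episodes` and the other names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    rewards: List[float]   # per-step rewards of a successful episode
    path_length: float     # distance travelled (hypothetical field)


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go R_t = r_t + gamma * R_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def quality_score(ep: Episode, length_penalty: float = 0.1) -> float:
    """Hypothetical score: more reward and shorter paths rank higher."""
    return sum(ep.rewards) - length_penalty * ep.path_length


def rank_priorities(scores: List[float], alpha: float = 0.7) -> List[float]:
    """Sampling probabilities proportional to (1 / rank) ** alpha."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    prios = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        prios[i] = (1.0 / rank) ** alpha
    total = sum(prios)
    return [p / total for p in prios]


def sample_episodes(episodes: List[Episode], k: int, alpha: float = 0.7,
                    rng: Optional[random.Random] = None) -> List[Episode]:
    """Draw k episodes (with replacement), favouring high-quality ones."""
    rng = rng or random.Random(0)
    probs = rank_priorities([quality_score(e) for e in episodes], alpha)
    return rng.choices(episodes, weights=probs, k=k)


def sil_advantages(returns: List[float], values: List[float]) -> List[float]:
    """Only reinforce actions whose return beat the critic: (R - V)_+."""
    return [max(r - v, 0.0) for r, v in zip(returns, values)]


# Two successful episodes with equal reward; the detour is penalized.
direct = Episode([0.0, 0.0, 1.0], path_length=5.0)
detour = Episode([0.0, 0.0, 1.0], path_length=15.0)
probs = rank_priorities([quality_score(direct), quality_score(detour)], alpha=1.0)
print(probs)  # the direct episode gets twice the sampling probability
```

In a full training loop, each sampled step's clipped advantage would typically weight a -log pi(a_t | s_t) imitation term added to the on-policy loss; ranking rather than raw scores keeps the priorities well defined even when quality scores are negative.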

Index terms

Reinforcement Learning; Learning from Experience; Integrated Planning and Control