Robust Robot Navigation through Failure-Aversion Learning
Zhifeng Yu, Xuyang Li, Jianwu Fang, Guangliang Li, Jianru Xue
AI summary
Problem
On-policy reinforcement learning for robot navigation suffers from severe sample inefficiency and discards valuable learning signals contained in failed trajectories. Existing failure-learning methods analyze failures point-wise, ignoring the rich sequential context necessary for robust recovery.
Approach
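The authors propose DFPS-Nav, a training framework that processes experiences asymmetrically. Successful trajectories receive trajectory-level quality prioritization (Prioritized Self-Imitation Learning, PSIL), while failed trajectories receive segment-wise trend analysis (Failure-Aversion Learning, FAL) that identifies failure-inducing actions and valuable recovery behaviors.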
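The success side of this asymmetric processing can be illustrated with a minimal sketch. It is not the authors' implementation: the `Trajectory` type, the `quality_score` heuristic (mean reward per step), and the shifted-score sampling weights are all assumptions made for illustration, since the summary does not specify how trajectories are scored or prioritized.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    rewards: List[float]  # per-step rewards collected during the episode
    success: bool         # True if the robot reached the goal


def quality_score(traj: Trajectory) -> float:
    """Hypothetical trajectory-level score: mean reward per step."""
    return sum(traj.rewards) / max(len(traj.rewards), 1)


def split_by_outcome(
    trajs: List[Trajectory],
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Route successes and failures to separate processing paths."""
    successes = [t for t in trajs if t.success]
    failures = [t for t in trajs if not t.success]
    return successes, failures


def sample_prioritized(
    successes: List[Trajectory], k: int, rng: random.Random
) -> List[Trajectory]:
    """Sample k successes with probability proportional to shifted score.

    Scores are shifted so the lowest-scoring trajectory keeps a small,
    non-zero weight (an assumption, not the paper's prioritization rule).
    """
    scores = [quality_score(t) for t in successes]
    low = min(scores)
    weights = [s - low + 1e-6 for s in scores]
    return rng.choices(successes, weights=weights, k=k)
```

In this sketch, failed trajectories returned by `split_by_outcome` would be passed to a separate failure-analysis routine rather than discarded.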
Key results
- Up to 29.5% higher success rate than the strong baseline in static environments
- Up to 27% higher success rate than the strong baseline in dynamic environments
- Successful real-world deployment on a physical mobile robot
- Improved path efficiency, collision avoidance, and training convergence
Why it matters
Enables more data-efficient and robust autonomous navigation for real-world robots by transforming failed attempts into actionable learning signals.
Abstract
Autonomous navigation in complex dynamic environments remains a fundamental challenge in robotics, and many reinforcement learning (RL) algorithms have demonstrated promising results, especially the on-policy ones. However, the inherent sample efficiency issue is still a fundamental problem to be solved. Methods integrating off-policy approaches into on-policy frameworks have been proposed to improve the sample efficiency by focusing on imitating the agent’s past exemplary experiences while discarding less optimal ones. However, these methods overlook the valuable insights embedded within failures. Although some research has begun to explore learning from failures, it is usually done at a point-by-point level, ignoring the rich sequence context inherent in the trajectory. In this paper, we introduce DFPS-Nav, a training framework that utilizes Failure-Aversion Learning (FAL) to perform segmented, trend-based credit assignment, identifying both failure-inducing actions and valuable recovery behaviors within failed trajectories. We further improve successful imitation by adopting Prioritized Self-Imitation Learning (PSIL), which scores trajectories and prioritizes high-quality behaviors so that successful behaviors are reliably reproduced. Extensive simulation and real-world experiments demonstrate that, by using both FAL and PSIL to extract and refine information from the sequential context within trajectories, DFPS-Nav achieves up to 29.5% and 27% higher success rates in static and dynamic environments compared to the strong baseline method and is successfully applied in the real world. This work underscores how systematically deconstructing failures while prioritizing successes leads to more efficient and robust autonomous navigation.
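To make the idea of segmented, trend-based credit assignment concrete, here is a minimal sketch under stated assumptions. The abstract does not define the segmentation rule or the trend measure, so the fixed segment length, the per-step `signal` (for example, a hypothetical safety or progress measure), the `eps` threshold, and the label names are all illustrative choices rather than the FAL algorithm itself.

```python
from typing import List


def label_failed_trajectory(
    signal: List[float], seg_len: int, eps: float = 1e-3
) -> List[str]:
    """Label every step of a failed trajectory by its segment's trend.

    signal:  hypothetical per-step quality measure (higher is better)
    seg_len: assumed fixed segment length
    eps:     dead band below which a trend counts as neutral

    The trend of a segment is its net change (last value minus first).
    Rising segments are tagged "recover" (behaviour worth keeping),
    falling segments "avoid" (failure-inducing behaviour).
    """
    labels: List[str] = []
    for start in range(0, len(signal), seg_len):
        seg = signal[start:start + seg_len]
        trend = seg[-1] - seg[0]
        if trend > eps:
            tag = "recover"
        elif trend < -eps:
            tag = "avoid"
        else:
            tag = "neutral"
        # Every step in the segment shares the segment-level label.
        labels.extend([tag] * len(seg))
    return labels
```

Unlike a point-by-point analysis, each step here inherits its label from the surrounding segment, so an action is judged by where the trajectory was heading around it and not by its immediate outcome alone.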