Behavior Cloning-Enhanced Deep Reinforcement Learning for Robot Navigation Via Dynamic Reward and Adaptive Replanning
Jianning Chi, Fusheng Li, Wenjun Zhang, Yongming Yang
AI summary
Problem
Existing BC-enhanced DRL methods suffer from two problems: fixed imitation constraints that suppress late-stage exploration, and conflicts between goal seeking and obstacle avoidance that cause the robot to stagnate.
Approach
ADRL dynamically fuses TD3 with BC using a value-driven imitation scheduling mechanism, a phase-aligned dynamic reward function, and a lightweight adaptive replanning mechanism to smoothly transition from imitation to exploration.
Key results
- Value-driven stage-aware imitation scheduling mechanism
- Phase-aligned dynamic weight composite reward function
- Lightweight adaptive replanning mechanism for stagnation resolution
- Superior convergence speed, success rate, and robustness over TD3+BC, PPO, and SAC
Why it matters
Enables efficient and reliable autonomous navigation for mobile robots by resolving the stabilization-exploration dilemma inherent in fixed BC-enhanced DRL.
Abstract
Deep reinforcement learning (DRL) is a core technology for mobile robot navigation in diverse environments, yet existing Behavior Cloning (BC)-enhanced DRL methods suffer two critical challenges: fixed imitation constraints suppress autonomous exploration in late training stages despite stabilizing early learning, and goal-obstacle avoidance task conflicts impede robust action selection during navigation. To address these issues, this paper proposes an Adaptive Strategy Deep Reinforcement Learning (ADRL) method, which reformulates BC as a progressively released transitional constraint and builds a stage-aware transition framework for robot navigation. Specifically, ADRL dynamically fuses Twin Delayed Deep Deterministic Policy Gradient (TD3) with BC via a value-driven imitation scheduling mechanism, which adaptively modulates the expert-online data mixing ratio and BC regularization strength based on critic feedback to accelerate convergence and realize a smooth shift from imitation-dominant to exploration-driven learning. A phase-aligned dynamic weight composite reward function is designed, which embeds motion constraints and stage-aware priority adjustment to mitigate reward sparsity and align learning objectives with policy maturity. Additionally, a lightweight adaptive replanning mechanism is developed as an evaluation stabilizer, which generates obstacle-avoiding waypoints by obstacle density when the robot stagnates, resolving goal-obstacle avoidance conflicts without altering the transition-centric learning objective. Multi-scenario experimental results demonstrate that ADRL outperforms state-of-the-art methods in training convergence speed, navigation success rate and robustness under identical training budgets. This method provides a principled integration strategy for imitation and reinforcement learning in robot navigation, and lays a solid foundation for building efficient and reliable autonomous navigation systems.
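The abstract does not give the scheduling rule itself, so the following is only a minimal sketch of how a value-driven imitation schedule could look. It assumes a linear decay of the BC weight as the critic's mean Q estimate approaches a reference value, and an expert-data share that tracks that weight. The names `bc_weight`, `expert_fraction`, `actor_loss`, and every parameter (`q_target`, `w_max`, `w_min`, `f_max`, `f_min`) are hypothetical and are not taken from the paper.

```python
def bc_weight(q_mean, q_target, w_max=1.0, w_min=0.05):
    """Assumed schedule: shrink the BC regularization weight linearly as the
    critic's mean Q estimate approaches a reference value q_target."""
    if q_target <= 0:
        raise ValueError("q_target must be positive")
    progress = min(max(q_mean / q_target, 0.0), 1.0)  # clamp to [0, 1]
    return w_max - (w_max - w_min) * progress


def expert_fraction(weight, w_max=1.0, f_max=0.8, f_min=0.1):
    """Assumed coupling: the expert share of each training batch follows
    the BC weight, so both are released together."""
    return f_min + (f_max - f_min) * (weight / w_max)


def actor_loss(q_value, policy_action, expert_action, weight):
    """TD3+BC-style actor objective for one sample: maximize the critic value
    while penalizing squared distance to the expert action."""
    mse = sum((p - e) ** 2 for p, e in zip(policy_action, expert_action))
    mse /= len(policy_action)
    return -q_value + weight * mse


if __name__ == "__main__":
    # Early training (low Q) keeps imitation strong; late training releases it.
    for q in (0.0, 5.0, 10.0):
        w = bc_weight(q, q_target=10.0)
        print(f"mean Q={q:5.1f}  bc_weight={w:.3f}  expert_fraction={expert_fraction(w):.3f}")
```

Under these assumptions, a low critic value early in training leaves the policy imitation-dominant, and the constraint loosens as the value estimate rises. The actual ADRL schedule may use a different signal or functional form.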