ICRA 2026

Behavior Cloning-Enhanced Deep Reinforcement Learning for Robot Navigation Via Dynamic Reward and Adaptive Replanning

Jianning Chi, Fusheng Li, Wenjun Zhang, Yongming Yang

AI summary

ADRL dynamically balances expert imitation and autonomous exploration based on training progress, significantly accelerating convergence and improving navigation robustness over fixed BC-enhanced DRL methods.

Deep Reinforcement Learning, Behavior Cloning, Robot Navigation, Adaptive Replanning, Dynamic Reward, Policy Transition

Problem

Existing BC-enhanced DRL methods suffer from two problems: fixed imitation constraints that suppress late-stage exploration, and conflicts between goal reaching and obstacle avoidance that cause the robot to stagnate.

Approach

ADRL dynamically fuses TD3 with BC using a value-driven imitation scheduling mechanism, a phase-aligned dynamic reward function, and a lightweight adaptive replanning mechanism to smoothly transition from imitation to exploration.
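The value-driven scheduling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the schedule, so the linear mapping from a critic value estimate to a BC weight and an expert mixing ratio, the bounds `q_low` and `q_high`, and the function names `imitation_schedule` and `actor_loss` are all assumptions made for illustration.

```python
def imitation_schedule(q_mean, q_low, q_high,
                       bc_max=1.0, bc_min=0.0,
                       ratio_max=0.8, ratio_min=0.1):
    """Map a critic value estimate to (bc_weight, expert_ratio).

    Hypothetical schedule: as the mean Q-value rises from q_low to q_high,
    both the BC regularization weight and the share of expert samples in
    each batch shrink linearly. The bounds are assumed, not from the paper.
    """
    if q_high <= q_low:
        raise ValueError("q_high must be greater than q_low")
    # Normalized training progress, clipped to [0, 1].
    progress = min(1.0, max(0.0, (q_mean - q_low) / (q_high - q_low)))
    bc_weight = bc_max - progress * (bc_max - bc_min)
    expert_ratio = ratio_max - progress * (ratio_max - ratio_min)
    return bc_weight, expert_ratio


def actor_loss(q_value, policy_action, expert_action, bc_weight):
    """TD3+BC-style actor objective for one sample (assumed form).

    Maximizes the critic value while penalizing the mean squared distance
    between the policy action and the expert action, scaled by bc_weight.
    """
    bc_term = sum((p - e) ** 2 for p, e in zip(policy_action, expert_action))
    bc_term /= len(policy_action)
    return -q_value + bc_weight * bc_term
```

With a schedule of this kind, early training (low critic values) keeps the policy close to the expert, and the constraint is released as the critic's estimates improve.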

Key results

Value-driven stage-aware imitation scheduling mechanism
Phase-aligned dynamic weight composite reward function
Lightweight adaptive replanning mechanism for stagnation resolution
Superior convergence speed, success rate, and robustness over TD3+BC, PPO, and SAC
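As a rough illustration of what a phase-aligned dynamic weight composite reward could look like, the sketch below combines a goal-progress term, an obstacle-proximity penalty, and a motion penalty, with weights that shift with a training phase in [0, 1]. The terms, the weight values, and the name `composite_reward` are hypothetical; the paper's actual reward is not given in this summary.

```python
def composite_reward(dist_prev, dist_now, min_obstacle_dist,
                     angular_velocity, phase, safe_dist=0.5):
    """Composite navigation reward with phase-dependent weights (assumed form).

    phase: 0.0 = early, imitation-dominant training; 1.0 = late,
    exploration-driven training. All weights below are illustrative.
    """
    phase = min(1.0, max(0.0, phase))
    w_goal = 1.0 + phase        # goal progress gains priority over time
    w_obstacle = 2.0 - phase    # safety is weighted most heavily early on
    w_motion = 0.1              # constant penalty on sharp turning

    goal_term = dist_prev - dist_now                          # > 0 when approaching the goal
    obstacle_term = -max(0.0, safe_dist - min_obstacle_dist)  # < 0 inside the safety margin
    motion_term = -abs(angular_velocity)                      # motion constraint
    return w_goal * goal_term + w_obstacle * obstacle_term + w_motion * motion_term
```

The dense goal-progress term is one common way to reduce reward sparsity; shifting the weights with the phase is the part meant to mirror the "stage-aware priority adjustment" described in the abstract.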

Why it matters

Enables efficient and reliable autonomous navigation for mobile robots by resolving the stabilization-exploration dilemma inherent in fixed BC-enhanced DRL.

Abstract

Deep reinforcement learning (DRL) is a core technology for mobile robot navigation in diverse environments, yet existing Behavior Cloning (BC)-enhanced DRL methods face two critical challenges: fixed imitation constraints suppress autonomous exploration in late training stages despite stabilizing early learning, and goal-obstacle avoidance task conflicts impede robust action selection during navigation. To address these issues, this paper proposes an Adaptive Strategy Deep Reinforcement Learning (ADRL) method, which reformulates BC as a progressively released transitional constraint and builds a stage-aware transition framework for robot navigation. Specifically, ADRL dynamically fuses Twin Delayed Deep Deterministic Policy Gradient (TD3) with BC via a value-driven imitation scheduling mechanism, which adaptively modulates the expert-online data mixing ratio and BC regularization strength based on critic feedback to accelerate convergence and realize a smooth shift from imitation-dominant to exploration-driven learning. A phase-aligned dynamic weight composite reward function is designed, which embeds motion constraints and stage-aware priority adjustment to mitigate reward sparsity and align learning objectives with policy maturity. Additionally, a lightweight adaptive replanning mechanism is developed as an evaluation stabilizer, which generates obstacle-avoiding waypoints according to obstacle density when the robot stagnates, resolving goal-obstacle avoidance conflicts without altering the transition-centric learning objective. Multi-scenario experimental results demonstrate that ADRL outperforms state-of-the-art methods in training convergence speed, navigation success rate, and robustness under identical training budgets. This method provides a principled integration strategy for imitation and reinforcement learning in robot navigation, and lays a solid foundation for building efficient and reliable autonomous navigation systems.
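The replanning step described in the abstract (detect stagnation, then place a waypoint where obstacles are sparse) might be sketched as follows. This is an illustrative guess, not the paper's algorithm: the displacement-window stagnation test, the sector-wise density measure over a forward-facing range scan, and the names `is_stagnant` and `pick_waypoint` are all assumptions.

```python
import math


def is_stagnant(positions, window=20, min_travel=0.05):
    """Return True if the robot moved less than min_travel (m) over the
    last `window` recorded (x, y) positions. Thresholds are assumed."""
    if len(positions) < window:
        return False
    x0, y0 = positions[-window]
    x1, y1 = positions[-1]
    return math.hypot(x1 - x0, y1 - y0) < min_travel


def pick_waypoint(robot_xy, robot_yaw, scan_ranges,
                  fov=math.pi, n_sectors=6, near=1.0, step=0.8):
    """Choose a waypoint in the scan sector with the lowest obstacle density.

    scan_ranges: range readings ordered from -fov/2 to +fov/2 relative to
    the robot heading. Density is the fraction of beams in a sector that
    are shorter than `near`. Ties go to the first such sector.
    """
    per_sector = len(scan_ranges) // n_sectors
    if per_sector < 1:
        raise ValueError("need at least one beam per sector")
    best_sector, best_density = 0, float("inf")
    for s in range(n_sectors):
        beams = scan_ranges[s * per_sector:(s + 1) * per_sector]
        density = sum(1 for r in beams if r < near) / len(beams)
        if density < best_density:
            best_sector, best_density = s, density
    # Heading of the center of the chosen sector, in the world frame.
    angle = robot_yaw - fov / 2 + (best_sector + 0.5) * fov / n_sectors
    return (robot_xy[0] + step * math.cos(angle),
            robot_xy[1] + step * math.sin(angle))
```

In a scheme like this, the waypoint only serves as a temporary sub-goal for the existing policy, so the learned objective itself is left unchanged.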

Index terms

Foundations of Automation; Planning, Scheduling and Coordination; Discrete Event Dynamic Automation Systems