Learning Behaviours for Decentralised Multi-Robot Collision Avoidance in Constrained Pathways Using Curriculum Reinforcement Learning
Md Mostafizur Rahman Komol, Brendan Tidd, Will Browne, Frederic Maire, Jason Williams, David Howard
AI summary
Problem
Decentralized multi-robot navigation in narrow, communication-limited bottlenecks often leads to collisions because traditional methods ignore dynamic agent interactions and end-to-end reinforcement learning converges too slowly.
Approach
The method uses curriculum reinforcement learning to gradually train robots, starting with simple pre-programmed yielding rules and progressively narrowing the gap width to learn optimal decentralized collision-avoidance behaviors.
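The gap-narrowing stage described above can be sketched as a small scheduler. This is an illustrative sketch only, not the authors' implementation: the class name GapCurriculum, the widths in metres, the step size, the promotion threshold, and the evaluation window are all assumed values chosen for the example, and the bootstrap stage with pre-programmed yielding is omitted.

```python
from dataclasses import dataclass


@dataclass
class GapCurriculum:
    """Hypothetical schedule: narrow the gap once the policy is reliable."""

    start_width: float = 3.0   # metres (assumed, not from the paper)
    final_width: float = 1.0   # metres (assumed)
    step: float = 0.5          # width reduction per promotion (assumed)
    promote_at: float = 0.9    # success rate needed to narrow the gap (assumed)
    window: int = 20           # episodes per evaluation window (assumed)

    def __post_init__(self):
        self.width = self.start_width
        self._outcomes = []

    def record(self, success: bool) -> float:
        """Log one episode outcome and return the gap width for the next episode."""
        self._outcomes.append(success)
        if len(self._outcomes) >= self.window:
            rate = sum(self._outcomes) / len(self._outcomes)
            # Narrow the gap only when the recent success rate is high enough.
            if rate >= self.promote_at and self.width > self.final_width:
                self.width = max(self.final_width, self.width - self.step)
            # Start a fresh evaluation window either way.
            self._outcomes.clear()
        return self.width
```

In a training loop, record would be called once per episode with that episode's outcome, and the returned width would configure the next environment reset. Gating promotion on a windowed success rate is one common design choice; a fixed episode budget per stage would be an equally plausible alternative.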
Key results
- 99% success rate in high-fidelity simulation without inter-agent communication
- 73% success rate in simulations with noisy sensors and 60% in field-robot tests
- Outperformed end-to-end RL, Hybrid A*, and rule-based benchmarks
- Generated unanticipated cooperative interaction behaviors beyond initial programming
Why it matters
Enables reliable autonomous multi-robot coordination in communication-denied, high-risk environments like search-and-rescue and mining, reducing reliance on manual programming and complex communication infrastructure.
Abstract
Mobile robot teams often require decentralised autonomous navigation through narrow gaps in limited communication environments (e.g., underground search-and-rescue operations). Existing navigation approaches exhibit suboptimal performance for avoiding multi-robot collisions in such bottlenecks due to an inability to address the dynamic nature of the robots. Initial work utilising reinforcement learning has demonstrated success in navigating a single robot through narrow gaps. However, when training agents to produce give-way behaviour for navigating through constrained gaps, end-to-end reinforcement learning using simple rewards suffers from slow convergence due to the increased search space of viable policies. This paper introduces a novel curriculum reinforcement learning framework, incorporating a multi-robot bootstrap curriculum with preprogrammed behaviour to guide initial policy formation, subsequently refined by a gap curriculum that progressively reduces training complexity towards an optimal policy. This framework learns multi-robot interaction behaviours, which are impractical to program manually. Our model achieves a 99% success rate in give-way behaviour generation without inter-agent communications in high-fidelity simulations. The success rate dropped to 73% in simulations incorporating noisy sensors, and to 60% in field-robot tests, substantiating our model's practical viability despite sensor noise and real-world uncertainties. The simple benchmark methods lack efficiency in basic interaction behaviours.