ICRA 2026

Learning Behaviours for Decentralised Multi-Robot Collision Avoidance in Constrained Pathways Using Curriculum Reinforcement Learning

Md Mostafizur Rahman Komol, Brendan Tidd, Will Browne, Frederic Maire, Jason Williams, David Howard

AI summary

A curriculum reinforcement learning framework trains decentralized robots to give way and pass through narrow gaps without inter-agent communication, reaching a 99% success rate in high-fidelity simulation, 73% in simulation with noisy sensors, and 60% in field-robot tests.

Multi-robot systems, reinforcement learning, collision avoidance, curriculum learning, decentralized navigation, field robotics

Problem

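Decentralized multi-robot navigation through narrow, communication-limited bottlenecks often leads to collisions: existing navigation approaches do not account for the dynamic behaviour of the other robots, and end-to-end reinforcement learning with simple rewards converges slowly because give-way behaviour enlarges the search space of viable policies.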

Approach

The method trains robots in two curriculum stages. A multi-robot bootstrap curriculum uses preprogrammed give-way behaviour to guide initial policy formation, and a gap curriculum then refines that policy by progressively narrowing the gap width, yielding decentralized collision-avoidance behaviours.
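
The staged idea can be illustrated with a small scheduler. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `CurriculumState`, the function `advance`, the fading of scripted guidance, and every threshold, width, and step size below are hypothetical values chosen for illustration. The page does not give the paper's actual schedule or reward design.

```python
from dataclasses import dataclass


@dataclass
class CurriculumState:
    """Hypothetical training state; none of these fields come from the paper."""
    stage: str = "bootstrap"     # "bootstrap" first, then "gap"
    gap_width_m: float = 3.0     # assumed starting gap width in metres
    guidance_prob: float = 1.0   # assumed chance of following the scripted give-way rule


def advance(state, success_rate, *, promote_at=0.8, min_gap_m=1.0,
            gap_step_m=0.25, guidance_decay=0.5):
    """Return the next curriculum state given the recent success rate.

    All keyword defaults are illustrative assumptions.
    """
    if success_rate < promote_at:
        # Policy is not yet reliable: keep the current difficulty.
        return state
    if state.stage == "bootstrap":
        # Fade out the preprogrammed behaviour before moving on.
        new_prob = state.guidance_prob * guidance_decay
        if new_prob < 0.05:
            return CurriculumState("gap", state.gap_width_m, 0.0)
        return CurriculumState("bootstrap", state.gap_width_m, new_prob)
    # Gap stage: narrow the gap one step, never below the minimum width.
    new_gap = max(min_gap_m, state.gap_width_m - gap_step_m)
    return CurriculumState("gap", new_gap, 0.0)


if __name__ == "__main__":
    state = CurriculumState()
    # Stand-in success rates; a real loop would measure these from rollouts.
    for rate in [0.5, 0.9, 0.9, 0.9, 0.9, 0.9, 0.85, 0.7, 0.95]:
        state = advance(state, rate)
        print(state)
```

In a real training loop, `success_rate` would be measured over a recent batch of episodes, and the state would configure the simulator before the next batch. Gating each step on a success threshold is one common design choice; a fixed step-count schedule is another.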

Key results

99% success rate in high-fidelity simulation without inter-agent communication
73% success in noisy sensor simulations and 60% in real-world field tests
Outperformed end-to-end RL, Hybrid A*, and rule-based benchmarks
Generated unanticipated cooperative interaction behaviors beyond initial programming

Why it matters

Enables reliable autonomous multi-robot coordination in communication-denied, high-risk environments like search-and-rescue and mining, reducing reliance on manual programming and complex communication infrastructure.

Abstract

Mobile robot teams often require decentralised autonomous navigation through narrow gaps in limited communication environments (e.g., underground search-and-rescue operations). Existing navigation approaches exhibit suboptimal performance for avoiding multi-robot collisions in such bottlenecks due to an inability to address the dynamic nature of the robots. Initial work utilising reinforcement learning has demonstrated success in navigating a single robot through narrow gaps. However, when training agents to produce give-way behaviour for navigating through constrained gaps, end-to-end reinforcement learning using simple rewards suffers from slow convergence due to the increased search space of viable policies. This paper introduces a novel curriculum reinforcement learning framework, incorporating a multi-robot bootstrap curriculum with preprogrammed behaviour to guide initial policy formation, subsequently refined by a gap curriculum that progressively reduces training complexity towards an optimal policy. This framework learns multi-robot interaction behaviours, which are impractical to program manually. Our model achieves a 99% success-rate in give-way behaviour generation without inter-agent communications in high-fidelity simulations. The success-rate reduced to 73% in simulations incorporating noisy sensors, and 60% in field-robot tests, substantiating our model’s practical viability despite sensor noise and real-world uncertainties. The simple benchmark methods lack efficiency in basic interaction behaviours.

Index terms

Field Robots; Search and Rescue Robots; Reinforcement Learning