ICRA 2026

Scaling Multi-Agent Reinforcement Learning for Underwater Acoustic Tracking Via Autonomous Vehicles

Matteo Gallici, Ivan Masmitja, Mario Martin

AI summary

A GPU-accelerated training pipeline combined with transformer-based policies enables scalable, fleet-size-invariant multi-agent reinforcement learning for precise underwater acoustic tracking.

Multi-Agent Reinforcement Learning; Underwater Acoustic Tracking; GPU-Accelerated Simulation; Transformer Policies; Curriculum Learning; Autonomous Vehicles

Problem

Scaling multi-agent reinforcement learning (MARL) for underwater tracking is hindered by two things: MARL is sample-inefficient, and high-fidelity simulators become prohibitively slow in multi-vehicle scenarios. Existing methods also fail to generalize across varying fleet sizes and numbers of targets.

Approach

The authors introduce a GPU-vectorized environment for rapid training alongside a high-fidelity simulator (Gazebo) for evaluation. They pair it with a Transformer-based architecture (TransfMAPPO) that learns policies invariant to the number of agents and targets, which in turn enables curriculum learning on progressively larger fleets and more complex scenarios.
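To make the invariance claim concrete, here is a minimal sketch of the general mechanism that lets an attention layer accept any number of entities. It is not the authors' TransfMAPPO implementation: the function `attention_pool`, the sizes `FEAT` and `HID`, and the random weights are all hypothetical, chosen only for illustration. The point is that the parameter shapes depend on the per-entity feature size, never on how many agents or targets are observed.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, entities, Wk, Wv):
    """Single-head attention of one query over a variable-size entity set.

    Hypothetical helper: the output length equals the number of rows in Wv,
    regardless of how many entities are passed in.
    """
    keys = [matvec(Wk, e) for e in entities]
    values = [matvec(Wv, e) for e in entities]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


rng = random.Random(0)
FEAT, HID = 4, 3  # assumed per-entity feature size and hidden size

# The same fixed-size weights are reused for every fleet size.
Wk = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(HID)]
Wv = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(HID)]
query = [rng.uniform(-1, 1) for _ in range(HID)]


def random_entities(n):
    """Generate n random entity feature vectors (stand-ins for agents/targets)."""
    return [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(n)]


small_fleet = random_entities(2)
large_fleet = random_entities(5)

out_small = attention_pool(query, small_fleet, Wk, Wv)
out_large = attention_pool(query, large_fleet, Wk, Wv)

print(len(out_small), len(out_large))  # prints: 3 3
```

Because the weighted sum runs over an unordered set, reordering the entities leaves the pooled output unchanged up to floating-point rounding. That property is what makes it plausible to train on small fleets first and reuse the same weights on larger ones, as a curriculum would.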

Key results

Achieves up to 30,000× training speedup over Gazebo
Policies remain invariant to fleet size and target count
Tracks 5 fast-moving targets with only 5 vehicles
Maintains tracking errors below 5 m in high-fidelity simulation

Why it matters

Provides a scalable, computationally efficient framework for training cooperative autonomous vehicle fleets, accelerating progress in marine monitoring and underwater research.

Abstract

Autonomous vehicles (AVs) offer a cost-effective solution for scientific missions such as underwater tracking. Reinforcement learning (RL) has emerged as a powerful method for controlling AVs, but scaling to fleets (essential for multi-target tracking or rapidly moving targets) is challenging. Multi-Agent RL (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo’s LRAUV provide up to 100× faster-than-real-time single-robot simulations, they offer little speedup in multi-vehicle scenarios, making MARL training impractical. Yet, high-fidelity simulation is crucial to test complex policies and close the sim-to-real gap. To address these limitations, we develop a GPU-accelerated environment that achieves up to 30,000× speedup over Gazebo while preserving its dynamics. This enables fast, end-to-end GPU training and seamless transfer to Gazebo for evaluation. We also introduce a Transformer-based architecture (TransfMAPPO) that learns policies invariant to fleet size and number of targets, enabling curriculum learning to train larger fleets on increasingly complex scenarios. After large-scale GPU training, we perform extensive evaluations in Gazebo, showing our method maintains tracking errors below 5 m even with multiple fast-moving targets.

Index terms

Marine Robotics; Path Planning for Multiple Mobile Robots or Agents; Reinforcement Learning