ICRA 2026

Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

Nicolo Dal Fabbro, Milad Mesbahi, Mendes Renato, João Sousa, George J. Pappas

AI summary

A multi-agent reinforcement learning framework with energy-aware speed control and intermittent communication enables accurate, long-duration mapping of dynamic river plumes, outperforming baselines and scaling endurance with fleet size.

River plume monitoring; Multi-agent reinforcement learning; AUV coordination; Gaussian process regression; Energy-aware navigation; Coastal robotics

Problem

Long-term monitoring of rapidly evolving coastal river plumes is hindered by strong ocean currents, limited AUV energy, and constrained underwater communication, making traditional fixed or short-term strategies ineffective.

Approach

The authors propose a centralized multi-agent reinforcement learning system that uses Gaussian process regression to estimate salinity fields and a multi-head Q-network to intermittently command AUV direction and speed, optimizing the trade-off between mapping accuracy and energy efficiency.
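The salinity-estimation step can be illustrated with a minimal Gaussian process regression sketch. It is not the authors' implementation: the squared-exponential kernel, the length scales, the noise level, and the sample values are all assumptions chosen for illustration, and `rbf_kernel` and `gpr_predict` are hypothetical helper names.

```python
import numpy as np

def rbf_kernel(A, B, length_scales, variance=1.0):
    """Squared-exponential kernel over (x, y, t) inputs, one length scale per dimension."""
    A = np.asarray(A, dtype=float) / length_scales
    B = np.asarray(B, dtype=float) / length_scales
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist)

def gpr_predict(X_train, y_train, X_query, length_scales, noise_var=1e-2):
    """Posterior mean and variance of a GP fitted to mean-centered observations."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scales) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_query, X_train, length_scales)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v ** 2).sum(axis=0)               # prior variance is 1.0 (assumed)
    return mean, var

# Hypothetical AUV samples: columns are (x [km], y [km], t [h]); values are salinity.
X_train = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.5],
                    [2.0, 1.0, 1.0],
                    [3.0, 1.0, 1.5]])
y_train = np.array([20.0, 26.0, 31.0, 35.0])
length_scales = np.array([1.5, 1.5, 2.0])          # assumed, not from the paper

# Query one sampled location and one location far from every sample.
X_query = np.array([[0.0, 0.0, 0.0],
                    [10.0, 10.0, 10.0]])
mean, var = gpr_predict(X_train, y_train, X_query, length_scales)
```

The posterior variance is small at a sampled location and returns to the prior far from all samples, which is the kind of uncertainty signal a coordinator could use when deciding where to send vehicles next.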

Key results

Outperforms single- and multi-agent baselines in mapping accuracy
In some instances, more than doubles mission endurance while maintaining or improving accuracy when the fleet is doubled (e.g., from 3 to 6 AUVs)
Generalizes across unseen seasonal regimes and multiple years
Achieves energy-efficient long-term monitoring through adaptive speed control and minimal communication

Why it matters

Enables reliable, long-duration coastal monitoring for environmental management and pollution tracking in dynamic estuarine environments.

Abstract

We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that increasing the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, doubling the number of AUVs more than doubles endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize to unseen seasonal regimes across different months and years, showing promise for data-driven long-term monitoring of dynamic plume environments.
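One way to picture a multi-head Q-network controller that regulates direction and speed for each AUV is the sketch below. The abstract does not describe the network layout, so everything here is an assumption: the shared ReLU trunk, the separate direction and speed heads per agent, the sizes (3 agents, 8 headings, 3 speed levels), and the untrained random weights. `init_params`, `q_values`, and `greedy_commands` are hypothetical names.

```python
import numpy as np

# Assumed sizes for illustration only; none of these come from the paper.
N_AGENTS, N_DIRECTIONS, N_SPEEDS = 3, 8, 3
STATE_DIM, HIDDEN = 16, 32

def init_params(rng):
    """Random (untrained) weights: one shared trunk, two heads per agent."""
    return {
        "W_trunk": rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN)),
        "b_trunk": np.zeros(HIDDEN),
        "W_dir": rng.normal(0.0, 0.1, (N_AGENTS, HIDDEN, N_DIRECTIONS)),
        "W_spd": rng.normal(0.0, 0.1, (N_AGENTS, HIDDEN, N_SPEEDS)),
    }

def q_values(params, state):
    """Return per-agent Q-values for each direction and each speed level."""
    h = np.maximum(0.0, state @ params["W_trunk"] + params["b_trunk"])  # shared ReLU trunk
    q_dir = np.einsum("h,ahd->ad", h, params["W_dir"])   # shape (agents, directions)
    q_spd = np.einsum("h,ahs->as", h, params["W_spd"])   # shape (agents, speeds)
    return q_dir, q_spd

def greedy_commands(params, state):
    """Pick the highest-valued direction and speed index for every agent."""
    q_dir, q_spd = q_values(params, state)
    return q_dir.argmax(axis=1), q_spd.argmax(axis=1)

rng = np.random.default_rng(0)
params = init_params(rng)
state = rng.normal(size=STATE_DIM)       # stand-in for the coordinator's state features
directions, speeds = greedy_commands(params, state)
```

Splitting the output into per-agent heads keeps the number of outputs linear in the fleet size rather than enumerating every joint direction and speed combination; whether the authors factor their network this way is not stated in the abstract.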

Index terms

Control Architectures and Programming; Energy and Environment-Aware Automation; Marine Robotics