ICRA 2026

DREAM: Domain-Aware Reasoning for Efficient Autonomous Underwater Monitoring

Zhenqi Wu, Abhinav Modi, Angelos Mavrogiannis, Kaustubh Joshi, Nikhil Chopra, Yiannis Aloimonos, Nare Karapetyan, Ioannis Rekleitis, Xiaomin Lin

AI summary

DREAM is a VLM-guided autonomy framework that integrates an occupancy map with chain-of-thought reasoning. It outperforms the vanilla VLM and UIVNAV baselines in underwater exploration efficiency and coverage.

Vision Language Models, Autonomous Underwater Vehicles, Chain-of-Thought Reasoning, Occupancy Mapping, Benthic Monitoring, Robotic Exploration

Problem

Existing underwater robotic systems lack the real-time reasoning, persistent spatial memory, and adaptive decision-making required for safe, efficient, long-term autonomous benthic monitoring.

Approach

The framework couples a Vision Language Model with chain-of-thought reasoning and an incrementally updated occupancy map to guide high-level exploration planning, which is executed by a low-level robotic controller.
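To make the role of the occupancy map concrete, here is a minimal sketch of an incrementally updated grid that is serialized into a reasoning-style prompt. It is an illustration only, not the authors' implementation: the class `OccupancyMap`, the function `build_prompt`, the square sensor footprint, the cell symbols, and the prompt wording are all assumptions made for this example. The VLM call and the low-level controller are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical cell states for this sketch (not taken from the paper).
UNKNOWN, FREE, OBJECT = 0, 1, 2


@dataclass
class OccupancyMap:
    rows: int
    cols: int
    cells: list = field(init=False)

    def __post_init__(self):
        # Every cell starts unobserved.
        self.cells = [[UNKNOWN] * self.cols for _ in range(self.rows)]

    def update(self, row, col, radius, detections=()):
        """Mark the square footprint around (row, col) as observed,
        then mark any detected object cells."""
        for r in range(max(0, row - radius), min(self.rows, row + radius + 1)):
            for c in range(max(0, col - radius), min(self.cols, col + radius + 1)):
                if self.cells[r][c] == UNKNOWN:
                    self.cells[r][c] = FREE
        for r, c in detections:
            if 0 <= r < self.rows and 0 <= c < self.cols:
                self.cells[r][c] = OBJECT

    def explored_fraction(self):
        """Fraction of cells that have been observed at least once."""
        seen = sum(cell != UNKNOWN for row in self.cells for cell in row)
        return seen / (self.rows * self.cols)

    def to_text(self):
        """Render the grid as text so it can be placed in a prompt."""
        symbols = {UNKNOWN: "?", FREE: ".", OBJECT: "O"}
        return "\n".join("".join(symbols[c] for c in row) for row in self.cells)


def build_prompt(grid, pose):
    """Assemble an assumed chain-of-thought style planning prompt."""
    return (
        f"Robot cell: {pose}\n"
        f"Map ('?' unexplored, '.' explored, 'O' object):\n{grid.to_text()}\n"
        f"Explored: {grid.explored_fraction():.0%}\n"
        "Think step by step about which unexplored region to visit next, "
        "then output one high-level action."
    )


grid = OccupancyMap(rows=4, cols=4)
grid.update(1, 1, radius=1, detections=[(0, 0)])
print(build_prompt(grid, (1, 1)))
```

Because the map persists across steps, each new prompt carries a summary of what has already been seen, which is the memory and overview role the occupancy map plays in the framework.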

Key results

98.3% oyster coverage with 31.5% fewer steps than the UIVNAV baseline
100% shipwreck coverage with 27.5% fewer steps than vanilla VLM
Real-world deployment on a BlueROV demonstrating feasible reef surveying
Open-sourced synthetic environments, dataset, and codebase for underwater monitoring
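The percentages above can be read as ratios of simple counts. The sketch below shows one plausible way to compute them; the function names and the exact metric definitions are assumptions for illustration, and the counts in the usage lines are hypothetical values chosen only to show the arithmetic, not data from the paper.

```python
def coverage_percent(found, total):
    """Percentage of target objects observed, out of all targets present."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * found / total


def step_reduction_percent(method_steps, baseline_steps):
    """Relative reduction in steps compared with a baseline."""
    if baseline_steps <= 0:
        raise ValueError("baseline_steps must be positive")
    return 100.0 * (baseline_steps - method_steps) / baseline_steps


# Hypothetical counts, not taken from the paper.
print(round(coverage_percent(found=59, total=60), 1))
print(round(step_reduction_percent(method_steps=137, baseline_steps=200), 1))
```

Under these assumed definitions, a method that needs 137 steps where a baseline needs 200 uses 31.5% fewer steps.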

Why it matters

Enables safer, cost-effective long-term ocean ecosystem monitoring and benthic mapping, benefiting marine scientists and autonomous robotics researchers.

Abstract

The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This motivates the development of long-term monitoring systems. However, human labor is costly and long-duration underwater work is highly hazardous, thus favoring robotic solutions as a safer and more efficient option. Yet deploying such robots for persistent, wide-area benthic monitoring demands real-time, environment-aware decision-making without human intervention, a capability that existing systems still lack. To this end, we present DREAM, a Vision Language Model (VLM)-guided autonomy framework for long-term underwater exploration and monitoring. It autonomously explores the seafloor, detects and localizes objects of interest such as oyster clusters, and builds a spatial coverage map of their distribution. DREAM couples (i) a reasoning-augmented prompt that guides VLM planning with (ii) an occupancy map providing memory and overview, and (iii) a low-level controller to realize actions. Our framework outperforms all baselines across both tasks: in oyster monitoring, it uses 23.0% fewer steps and covers 8.9% more oysters than the vanilla VLM, and completes the task 31.5% faster than UIVNAV. In shipwreck exploration, it achieves 100% coverage versus 60.2% for the vanilla model with 27.5% fewer steps. All code and prompts can be found at https://github.com/zhenqi72/DREAM.

Index terms

Marine Robotics, Perception-Action Coupling, Environment Monitoring and Management