Autonomous Search for Sparsely Distributed Visual Phenomena through Environmental Context Modeling
Eric Chen, Travis Manderson, Nare Karapetyan, Peter Edmunds, Nicholas Roy, Yogesh Girdhar
AI summary
Problem
Autonomous underwater vehicles struggle to efficiently locate sparsely distributed target species on coral reefs because traditional fixed-path surveys waste battery life, and rare target detections provide no directional guidance for adaptive planning.
Approach
The method uses a single labeled image to initialize a one-shot DINOv2 detector that identifies both the target species and its surrounding habitat features, then leverages the denser context signal to guide adaptive AUV movement.
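As a rough illustration of one-shot detection from patch embeddings, the sketch below scores each query patch by cosine similarity to a prototype averaged from the labeled patches of a single support image. It is a minimal sketch, not the authors' implementation: the random vectors stand in for DINOv2 patch embeddings, and the names `one_shot_scores`, `support_mask`, and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def one_shot_scores(query_patches, support_patches, support_mask):
    """Cosine similarity of each query patch to the labeled-patch prototype.

    query_patches:   (Nq, D) patch embeddings of a new image
    support_patches: (Ns, D) patch embeddings of the single labeled image
    support_mask:    (Ns,) boolean, True where a support patch is labeled as target
    """
    prototype = l2_normalize(support_patches[support_mask].mean(axis=0))
    return l2_normalize(query_patches) @ prototype

# Toy stand-ins for patch embeddings (hypothetical data, not real DINOv2 output).
rng = np.random.default_rng(0)
dim = 8
target_dir = np.zeros(dim)
target_dir[0] = 1.0
background_dir = np.zeros(dim)
background_dir[1] = 1.0

support = np.stack([target_dir, target_dir, background_dir, background_dir])
support = support + 0.01 * rng.normal(size=support.shape)
mask = np.array([True, True, False, False])

query = np.stack([target_dir, background_dir])
query = query + 0.01 * rng.normal(size=query.shape)

scores = one_shot_scores(query, support, mask)
detections = scores > 0.5  # assumed threshold, chosen only for this toy example
```

The same scoring could be applied a second time with a mask over the habitat patches surrounding the target, which would give a separate context score per patch.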
Key results
- One-shot detection of three diverse coral species from a single labeled image using patch-level DINOv2 embeddings
- Online characterization of visual environmental context that provides a denser, smoother exploration signal than target detections alone
- Context-guided exploration samples up to 75% of target corals in roughly half the time of exhaustive lawnmower coverage
- Outperforms baselines using only target detections or manually defined substrata segmentation for adaptive surveying
Why it matters
Enables marine biologists and AUV operators to conduct faster, more efficient reef surveys with limited battery life, improving coral monitoring and conservation efforts.
Abstract
Autonomous underwater vehicles (AUVs) are increasingly used to survey coral reefs, yet efficiently locating specific coral species of interest remains difficult: target species are often sparsely distributed across the reef, and an AUV with limited battery life cannot afford to search everywhere. When detections of the target itself are too sparse to provide directional guidance, the robot benefits from an additional signal to decide where to look next. We propose using the visual environmental context – the habitat features that tend to co-occur with a target species – as that signal. Because context features are spatially denser and often vary more smoothly than target detections, we hypothesize that a reward function targeted at broader environmental context will enable adaptive planners to make better decisions on where to go next, even in regions where no target has yet been observed. Starting from a single labeled image, our method uses patch-level DINOv2 embeddings to perform one-shot detections of both the target species and its surrounding context online. We validate our approach using real imagery collected by an AUV at two reef sites in St. John, U.S. Virgin Islands, simulating the robot’s motion offline. Our results demonstrate that one-shot detection combined with adaptive context modeling enables efficient autonomous surveying, sampling up to 75% of the target in roughly half the time required by exhaustive coverage when the target is sparsely distributed, and outperforming search strategies that only use target detections.
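To illustrate why a denser context signal can steer a planner where target detections alone give no gradient, the toy sketch below moves a robot greedily over a small grid. It is only a sketch under stated assumptions: the abstract does not specify the planner or the reward, so the additive reward, the `weight` parameter, the greedy 4-neighbor rule, and the hand-written maps are all hypothetical, and the reward map is assumed to be known in advance rather than estimated online.

```python
import numpy as np

def context_reward(target_score, context_score, weight=0.5):
    """Assumed reward: target detections plus a weighted context term."""
    return target_score + weight * context_score

def greedy_next_cell(position, reward_map, visited):
    """Pick the unvisited 4-neighbor with the highest reward, or None if stuck."""
    rows, cols = reward_map.shape
    r, c = position
    candidates = [
        (r + dr, c + dc)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
        if 0 <= r + dr < rows and 0 <= c + dc < cols
        and (r + dr, c + dc) not in visited
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda rc: reward_map[rc])

# Sparse target: a single detection in the far corner (toy data).
target_map = np.zeros((3, 3))
target_map[2, 2] = 1.0

# Context varies smoothly and increases toward the target (toy data).
context_map = np.array([
    [0.0, 0.2, 0.3],
    [0.1, 0.5, 0.7],
    [0.2, 0.6, 1.0],
])

reward_map = context_reward(target_map, context_map)

# Walk greedily from the corner opposite the target.
position = (0, 0)
path = [position]
visited = {position}
while target_map[position] == 0:
    position = greedy_next_cell(position, reward_map, visited)
    if position is None:
        break
    path.append(position)
    visited.add(position)
```

With `target_map` alone, every neighbor of the start cell has zero reward, so the greedy rule has nothing to distinguish them; adding the context term gives each step a preferred direction.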