Knowledge Optical to Sonar (KnOTS): Towards the Transfer of Knowledge of Underwater Object Detection from Optical to Forward-Looking Sonar Imagery
Caroline Keenan, Ella Wawrzynek, David Whelihan, Ivy Mahncke, John Leonard, Madeline Miller
AI summary
Problem
Underwater object detection for autonomous vehicles is hindered by the scarcity of labeled sonar data and the environmental limitations of optical cameras. Manual annotation of sonar imagery is labor-intensive and requires specialized expertise, creating a bottleneck for model development.
Approach
The method co-mounts an optical camera and forward-looking sonar on an AUV to capture synchronized imagery. It trains a vision model on optical images, extracts object azimuth boundaries, and maps them to preprocessed sonar data using connected component analysis to automatically generate bounding box labels.
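The label-transfer step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a pinhole camera whose optical axis is aligned with the sonar's center beam, beams spaced linearly in azimuth, and a sonar image already thresholded into a binary mask (rows are range bins, columns are beams). All function names, and the choice to keep the largest connected component, are hypothetical.

```python
import math
from collections import deque


def pixel_to_azimuth(u, image_width, hfov_deg):
    """Azimuth (degrees) of pixel column u under an assumed pinhole model.

    0 degrees corresponds to the image center column.
    """
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((u - image_width / 2) / focal_px))


def azimuth_to_beam(az_deg, sonar_fov_deg, n_beams):
    """Map an azimuth to a sonar beam (column) index, assuming linear spacing."""
    frac = (az_deg + sonar_fov_deg / 2) / sonar_fov_deg
    return min(n_beams - 1, max(0, round(frac * (n_beams - 1))))


def largest_component_box(mask, col_lo, col_hi):
    """Bounding box of the largest 4-connected blob within a column span.

    mask is a list of rows of 0/1 values. Returns
    (row_min, col_min, row_max, col_max), or None if the span is empty.
    """
    n_rows = len(mask)
    seen = set()
    best = []
    for r in range(n_rows):
        for c in range(col_lo, col_hi + 1):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill restricted to the azimuth span.
            seen.add((r, c))
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and col_lo <= nx <= col_hi
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    if not best:
        return None
    rows = [p[0] for p in best]
    cols = [p[1] for p in best]
    return (min(rows), min(cols), max(rows), max(cols))
```

In use, the left and right edges of an optical bounding box would each pass through `pixel_to_azimuth` and `azimuth_to_beam` to give `col_lo` and `col_hi`, and the returned box would serve as the candidate sonar label. A real system would also need the extrinsic calibration between the two sensors, which this sketch ignores.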
Key results
- 0.985 mAP50 on optical images with minimal training data
- Automatic generation of sonar bounding boxes without manual labeling
- Successful YOLOv11 training on automatically labeled sonar imagery
- Real-time processing at 12 image pairs per second on an AUV's embedded computer
Why it matters
Provides a scalable, annotation-free pipeline for training robust underwater object detectors, accelerating AUV navigation and search capabilities.
Abstract
We develop an approach to detect objects in forward-looking sonar (FLS) images using corresponding optical images and without the need for expert manual labeling of sonar images. Sonar sensing is more robust to disadvantageous underwater environmental conditions than optical sensing, but the scarcity of labeled sonar data leads to decreased performance of methods which rely on an abundance of training data. We aim to transfer insights from data-rich applications such as object detection in optical imaging to the data-scarce area of object detection in sonar images. Our approach involves recording of contemporaneous images from commercially available sensors viable for use aboard unmanned underwater vehicles. We collect new optical and sonar data in a shallow, clear-water environment and employ existing object detection techniques for optical images. We leverage the commonality of the sensors’ fields of view and our algorithmic processing of the sonar image to transfer knowledge of object bounding boxes to sonar images to create a dataset. Through this transfer, we enable training of a model that detects objects in unseen sonar images and does not require optical images as input at test time.