QO-Net: Query Optimization Underwater Object Detection Network
Jiandong Tian, Hongyang Sun, Baojie Fan, Hongxin Xu
Abstract
Underwater object detection has attracted increasing interest because of its wide application in underwater tasks. However, owing to the degradation of underwater image quality and the lack of large-scale underwater object datasets, many underwater detectors suffer from low detection performance. To address these issues, we propose a novel underwater transformer detector with multi-scale feature enhancement and query optimization, named QO-Net, and construct a new underwater object detection dataset, called UODD. Specifically, a Conv-Trans Layer is developed as the basic unit of QO-Net. It learns multi-scale image feature representations through a CNN and simultaneously captures the dependencies among different positions in the sequence data through a Transformer, enabling QO-Net to capture long-range information in underwater image sequences. This combination enhances the representation of multi-scale features. QO-Net then applies a positional query enhancement strategy that optimizes the spatial prior of the positional queries, thereby speeding up the convergence of network training. UODD contains more than 20,000 underwater images for training and validation, covering a rich variety of underwater categories. Extensive experiments on the UODD, Brackish, and TrashCan datasets demonstrate that QO-Net performs favorably against state-of-the-art methods in terms of robustness and accuracy.
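To make the Conv-Trans idea concrete, the following is a minimal illustrative sketch, not the QO-Net implementation. It assumes a 1D input sequence, fixed averaging kernels in place of learned convolutions, single-head self-attention with identity projections, and a residual sum as the fusion step. All function names (`conv1d_same`, `self_attention`, `conv_trans_layer`) are hypothetical and chosen only for this example.

```python
import math

def conv1d_same(seq, kernel):
    """1D cross-correlation with zero padding; assumes an odd kernel length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(seq))]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    d = len(tokens[0])
    out, weights = [], []
    for q in tokens:
        # Scaled dot-product score between this position and every position.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out, weights

def conv_trans_layer(seq, kernels):
    """Hypothetical layer: multi-scale conv features, then attention, then a residual sum."""
    # Local branch: one feature channel per kernel size (the multi-scale part).
    channels = [conv1d_same(seq, k) for k in kernels]
    tokens = [list(col) for col in zip(*channels)]  # one token per position
    # Global branch: every position attends to every other position.
    attended, weights = self_attention(tokens)
    # Assumed fusion: add local and global features element-wise.
    fused = [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]
    return fused, weights

# Example: two scales (window sizes 3 and 5) over a short sequence.
seq = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
kernels = [[1.0 / 3.0] * 3, [0.2] * 5]
fused, weights = conv_trans_layer(seq, kernels)
```

Here `fused` holds one feature vector per position with one channel per kernel scale, and each row of `weights` is an attention distribution over all positions. This illustrates how a convolutional branch supplies local multi-scale features while attention relates distant positions.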