Lite-SVO: Towards a Lightweight Self-Supervised Semantic Visual Odometry Exploiting Multi-Feature Sharing Architecture
Wenhui Wei, Jiantao Li, Kaizhu Huang, Jiadong Li, Xin Liu, Yangfan Zhou
Abstract
Because it does not rely on ground-truth data for training, self-supervised semantic visual odometry (SVO) has recently gained considerable attention. Within self-supervised SVO, the inconsistency of feature representations between the semantic/depth tasks and the pose task presents a significant challenge, as it can disrupt cross-task feature representations and lead to notable performance degradation. Unfortunately, existing self-supervised SVO methods lack an effective solution to this obstacle, either because they overlook the issue or because they rely on an overly heavy architecture. To address this challenge, we propose Lite-SVO, a lightweight yet efficient multi-feature sharing architecture built within the Single-Stream framework. Lite-SVO is designed to strengthen self-supervised SVO and facilitate its deployment on edge devices without compromising accuracy or performance. The key innovation is the multi-feature sharing architecture, which fuses the semantic and depth maps to serve as pose features, thereby significantly reducing model complexity and increasing inference speed on edge devices. Building on this feature sharing framework, Lite-SVO further optimizes the shared feature representations to improve performance. Specifically, a cross-feature sharing module alleviates the impact of object boundaries on depth estimation, while a multi-feature sharing module extracts and fuses spatial features to enhance pose estimation. Experimental results demonstrate that our method is at least 84.46% faster than state-of-the-art Single-Stream approaches, while its pose accuracy is about 79.83% higher.
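The central idea, reusing semantic and depth outputs as pose features instead of running a separate pose encoder on raw images, can be illustrated with a toy sketch. The abstract does not specify the fusion operator, so this example assumes a simple scheme: each pixel's semantic label is one-hot encoded and concatenated with its depth value, the fused map is globally average-pooled into one vector per frame, and the vectors of two consecutive frames are concatenated as the input to a pose head. All function names here (`fuse_semantic_depth`, `global_average_pool`, `pose_input`) are hypothetical; an actual network would use learned convolutional layers rather than fixed pooling.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # H x W x C


def fuse_semantic_depth(semantic: List[List[int]], depth: List[List[float]],
                        num_classes: int) -> FeatureMap:
    """Hypothetical fusion: one-hot semantic label concatenated with depth per pixel."""
    if len(semantic) != len(depth) or any(len(s) != len(d) for s, d in zip(semantic, depth)):
        raise ValueError("semantic and depth maps must have the same shape")
    fused: FeatureMap = []
    for sem_row, depth_row in zip(semantic, depth):
        row = []
        for label, d in zip(sem_row, depth_row):
            if not 0 <= label < num_classes:
                raise ValueError(f"label {label} out of range")
            one_hot = [0.0] * num_classes
            one_hot[label] = 1.0
            row.append(one_hot + [float(d)])  # C = num_classes + 1 channels
        fused.append(row)
    return fused


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over all pixels, giving one vector per frame."""
    pixels = [p for row in feature_map for p in row]
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(len(pixels[0]))]


def pose_input(frame_a: Tuple[List[List[int]], List[List[float]]],
               frame_b: Tuple[List[List[int]], List[List[float]]],
               num_classes: int) -> List[float]:
    """Concatenate pooled fused features of two frames (assumed pose-head input)."""
    fa = global_average_pool(fuse_semantic_depth(*frame_a, num_classes))
    fb = global_average_pool(fuse_semantic_depth(*frame_b, num_classes))
    return fa + fb
```

For a 2x2 frame with labels `[[0, 1], [1, 1]]` and depths `[[2.0, 4.0], [6.0, 8.0]]`, the pooled vector holds the class frequencies followed by the mean depth, `[0.25, 0.75, 5.0]`. Because no image encoder is needed on the pose branch, the cost of producing these features is dominated by the semantic and depth networks that already run, which is the intuition behind the claimed complexity reduction.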