ICRA 2026

360DVO: Deep Visual Odometry for Monocular 360-Degree Camera

Xiaopeng Guo, Yinzhe XU, Huajian Huang, Sai-Kit Yeung

AI summary

360DVO sets a new state of the art for monocular 360-degree visual odometry, using learned features to handle projection distortion and challenging conditions. It improves robustness by 50% and accuracy by 37.5% over baselines such as 360VO and OpenVSLAM.

visual odometry, omnidirectional vision, deep learning, bundle adjustment, 360-degree camera, spherical CNN

Problem

Existing omnidirectional visual odometry systems struggle with projection distortion and lack robustness in challenging real-world scenarios like aggressive motion and varying illumination.

Approach

The authors introduce 360DVO, a deep learning framework that uses a distortion-aware spherical feature extractor to handle equirectangular distortion and an omnidirectional differentiable bundle adjustment module for efficient joint pose and depth optimization.
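To make the equirectangular distortion concrete, here is a minimal sketch of the standard equirectangular mapping between pixels and unit bearing vectors, plus the horizontal stretch it introduces toward the poles. This is not the paper's DAS-Feat extractor. The function names and the axis convention (x right, y down, z forward) are assumptions chosen for illustration.

```python
import numpy as np

def pixel_to_bearing(u, v, width, height):
    """Map equirectangular pixel (u, v) to a unit bearing vector.

    Assumed convention: x right, y down, z forward; the image centre
    looks along +z.
    """
    lon = (np.asarray(u, dtype=float) / width - 0.5) * 2.0 * np.pi  # longitude in [-pi, pi)
    lat = (0.5 - np.asarray(v, dtype=float) / height) * np.pi       # latitude in [-pi/2, pi/2]
    x = np.cos(lat) * np.sin(lon)
    y = -np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def bearing_to_pixel(bearing, width, height):
    """Inverse of pixel_to_bearing (the input need not be normalised)."""
    b = np.asarray(bearing, dtype=float)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    lon = np.arctan2(b[..., 0], b[..., 2])
    lat = np.arcsin(np.clip(-b[..., 1], -1.0, 1.0))
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return np.stack([u, v], axis=-1)

def horizontal_stretch(v, height):
    """Horizontal over-sampling factor 1/cos(latitude) of image row v."""
    lat = (0.5 - np.asarray(v, dtype=float) / height) * np.pi
    return 1.0 / np.cos(lat)
```

At the equator the stretch factor is 1, and at 60 degrees latitude it is 2, so a fixed-size image patch covers a different solid angle depending on its row. That row-dependent scaling is the kind of distortion a distortion-aware extractor has to cope with.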

Key results

First deep learning-based framework for monocular 360-degree visual odometry
SphereResNet network for distortion-resistant spherical feature extraction
Omnidirectional differentiable bundle adjustment (ODBA) for joint pose and depth optimization
New real-world benchmark dataset with 20 challenging sequences across diverse environments
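The joint pose and depth optimization mentioned above can be illustrated by a single reprojection residual on an equirectangular image. This is a hedged sketch, not the authors' ODBA module. The inverse-depth parameterization, the function names, and the axis convention (x right, y down, z forward) are assumptions. The one detail specific to 360-degree images is that the horizontal error must wrap around the image seam.

```python
import numpy as np

def project_equirect(p_cam, width, height):
    """Project a 3D point in camera coordinates to equirectangular pixels."""
    p = np.asarray(p_cam, dtype=float)
    lon = np.arctan2(p[0], p[2])
    lat = np.arcsin(np.clip(-p[1] / np.linalg.norm(p), -1.0, 1.0))
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return np.array([u, v])

def wrap_horizontal(du, width):
    """Wrap a horizontal pixel difference into [-width/2, width/2)."""
    return (du + width / 2.0) % width - width / 2.0

def reprojection_residual(R, t, bearing, inv_depth, observed_uv, width, height):
    """Pixel residual of one patch centre reprojected into a target frame.

    R, t        : hypothetical relative pose (source -> target), 3x3 and 3-vector
    bearing     : unit bearing of the patch in the source frame
    inv_depth   : inverse distance along that bearing (assumed parameterization)
    observed_uv : matched pixel location in the target frame
    """
    p_src = np.asarray(bearing, dtype=float) / inv_depth  # back-project to 3D
    p_tgt = R @ p_src + t                                 # move into target frame
    r = project_equirect(p_tgt, width, height) - np.asarray(observed_uv, dtype=float)
    r[0] = wrap_horizontal(r[0], width)                   # handle the left/right seam
    return r
```

A bundle adjustment layer would stack residuals like this over many patches and frames, then minimize them with respect to the poses and inverse depths. Making that solve differentiable is what lets gradients reach the feature network.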

Why it matters

Enables reliable ego-motion estimation for autonomous navigation and AR/VR systems using affordable 360-degree cameras in complex real-world conditions.

Abstract

Monocular omnidirectional visual odometry (OVO) systems leverage 360-degree cameras to overcome field-of-view limitations of perspective VO systems. However, existing methods, reliant on handcrafted features or photometric objectives, often lack robustness in challenging scenarios, such as aggressive motion and varying illumination. To address this, we present 360DVO, the first deep learning-based OVO framework. Our approach introduces a distortion-aware spherical feature extractor (DAS-Feat) that adaptively learns distortion-resistant features from 360-degree images. These sparse feature patches are then used to establish constraints for effective pose estimation within a novel omnidirectional differentiable bundle adjustment (ODBA) module. To facilitate evaluation in realistic settings, we also contribute a new real-world OVO benchmark. Extensive experiments on this benchmark and public synthetic datasets (TartanAir V2 and 360VO) demonstrate that 360DVO surpasses state-of-the-art baselines (including 360VO and OpenVSLAM), improving robustness by 50% and accuracy by 37.5%. Homepage: https://360dvo.hkustvgd.com

Index terms

SLAM, Omnidirectional Vision, Data Sets for SLAM