ICRA 2024

EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization

Zhendong Xiao, Changhao Chen, Yang Shan, Wu Wei

Abstract

Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving. It estimates the 3D position and orientation (6-DoF pose) of a camera from images. Unlike traditional methods such as SLAM, recent work uses deep learning for direct end-to-end pose estimation. We propose EffLoc, a novel efficient Vision Transformer for single-image camera relocalization. EffLoc’s hierarchical layout, memory-bound self-attention, and feed-forward layers boost memory efficiency and inter-channel communication. Our sequential group attention (SGA) module enhances computational efficiency by diversifying input features, reducing redundancy, and expanding model capacity. EffLoc excels in both efficiency and accuracy, outperforming prior methods such as AtLoc and MapNet. It performs well in large-scale outdoor car-driving scenarios while remaining simple and end-to-end trainable, and it eliminates the need for handcrafted loss functions.

Index terms

Localization, SLAM, Deep Learning Methods