FastViDAR: Real-Time Omnidirectional Depth Estimation Via Alternative Hierarchical Attention
Hangtian ZHAO, Xiang Chen, Yizhe Li, Qianhao Wang, Haibo Lu, Fei Gao
AI summary
Problem
Real-time omnidirectional depth estimation remains challenging for resource-constrained platforms due to the computational cost of transformer-based multi-view models and the calibration dependencies of classic fisheye stereo methods.
Approach
The system projects fisheye inputs to a unified equirectangular grid and processes them with an Alternative Hierarchical Attention (AHA) module that alternates local windowed attention with global summary attention to efficiently fuse cross-view features without explicit 3D cost volumes.
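To make the alternation concrete, here is a minimal NumPy sketch of one block that applies local windowed attention inside each view and then lets every token attend to pooled window summaries from all views. It is an illustration only, not the authors' implementation: the function names, the mean-pooled summaries, the identity query/key/value projections, and the residual connections are all assumptions introduced for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; leading axes broadcast.
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax(q @ np.swapaxes(k, -1, -2) * scale, axis=-1)
    return weights @ v

def local_window_attention(tokens, window):
    # tokens: (views, n_tokens, dim). Attention is restricted to
    # non-overlapping windows inside each view, so views do not mix here.
    views, n, dim = tokens.shape
    assert n % window == 0, "n_tokens must be divisible by window"
    w = tokens.reshape(views, n // window, window, dim)
    return attention(w, w, w).reshape(views, n, dim)

def global_summary_attention(tokens, window):
    # Assumption: each window is summarized by mean pooling. Every token
    # then attends to the summaries of all windows in all views, which is
    # where cross-view information is exchanged.
    views, n, dim = tokens.shape
    summaries = tokens.reshape(views, n // window, window, dim).mean(axis=2)
    summaries = summaries.reshape(1, -1, dim)  # (1, views * n / window, dim)
    return attention(tokens, summaries, summaries)

def aha_block(tokens, window):
    # Alternate the two attention types, each with a residual connection.
    tokens = tokens + local_window_attention(tokens, window)
    tokens = tokens + global_summary_attention(tokens, window)
    return tokens

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 16, 8))  # 4 views, 16 tokens, dim 8
    print(aha_block(feats, window=4).shape)
```

With window size w and n tokens per view, the local step scores n·w token pairs per view and the global step scores n·(V·n/w) pairs for V views, instead of the (V·n)² pairs that full attention over all views would need.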
Key results
- Novel Alternative Hierarchical Attention (AHA) mechanism for efficient cross-view fusion
- Equirectangular projection (ERP) fusion for seamless 360° depth estimation
- Competitive zero-shot accuracy on HM3D and 2D-3D-S benchmarks
- Real-time inference at up to 20 FPS on NVIDIA Orin NX embedded hardware
Why it matters
Provides a practical, low-cost alternative to LiDAR for real-time spatial perception in robotics and autonomous driving.
Abstract
In this paper, we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full 360° depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce an Alternative Hierarchical Attention (AHA) mechanism that efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel equirectangular projection (ERP) fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs using HM3D and 2D-3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: https://zhao-hangtian.github.io/FastViDAR/
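As a rough illustration of fusing depth in a shared equirectangular frame, the sketch below builds unit ray directions for an ERP grid and merges per-camera depth maps that have already been resampled onto that grid. The abstract does not specify the fusion rule, so the confidence-weighted average used here is an assumption, as are the function names and the axis convention (x right, y down, z forward).

```python
import numpy as np

def erp_rays(height, width):
    # Unit ray direction for each ERP pixel center.
    # Assumed convention: longitude spans [-pi, pi) across columns,
    # latitude runs from +pi/2 (top row) to -pi/2 (bottom row).
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = np.cos(lat) * np.sin(lon)
    y = -np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)  # (height, width, 3)

def fuse_erp_depth(depths, confidences, eps=1e-6):
    # depths, confidences: (views, height, width), already on the shared
    # ERP grid. A confidence of 0 marks pixels a camera does not observe.
    # Assumption: fusion is a confidence-weighted average per pixel.
    weight_sum = confidences.sum(axis=0)
    fused = (depths * confidences).sum(axis=0) / np.maximum(weight_sum, eps)
    # Pixels seen by no camera are left undefined.
    fused = np.where(weight_sum > eps, fused, np.nan)
    return fused, weight_sum

if __name__ == "__main__":
    depths = np.stack([np.full((2, 2), 2.0), np.full((2, 2), 4.0)])
    conf = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
    fused, _ = fuse_erp_depth(depths, conf)
    points = erp_rays(2, 2) * fused[..., None]  # back-project to 3D points
    print(fused, points.shape)
```

Multiplying the rays by the fused depth, as in the usage lines, gives a 3D point per ERP pixel when depth is interpreted as distance along the ray.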