DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-Time Optical Flow and Stereo Estimation
Tushar Anand, Maheswar Bora, Antitza Dantcheva, Abhijit Das
AI summary
Problem
Deep learning methods for optical flow and stereo estimation face a critical trade-off between accuracy and computational efficiency, while existing Mamba models lack explicit mechanisms for dense cross-image correspondence.
Approach
The authors introduce DenVisCoM, a hybrid architecture that fuses symmetric convolution branches with a joint Mamba sequence block and self/cross-attention to process left and right image patches simultaneously, enabling efficient long-range dependency modeling and precise dense matching.
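As a rough illustration of why a joint sequence block scales linearly, the sketch below runs a toy linear state-space recurrence over the concatenated left and right tokens in a single pass. It is not the DenVisCoM block: the function name `joint_scan`, the scalar tokens, and the fixed coefficients `a`, `b`, `c` are all assumptions made for illustration (Mamba itself uses input-dependent parameters and vector-valued states).

```python
def joint_scan(left_tokens, right_tokens, a=0.5, b=1.0, c=1.0):
    """Toy linear state-space scan over a concatenated left+right sequence.

    Recurrence: h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t.
    The fixed scalars a, b, c are illustrative assumptions; a real Mamba
    block makes these parameters depend on the input.
    """
    # Hypothetical joint ordering: all left tokens, then all right tokens.
    sequence = list(left_tokens) + list(right_tokens)

    h = 0.0
    outputs = []
    for x in sequence:  # one pass, so cost grows linearly with sequence length
        h = a * h + b * x
        outputs.append(c * h)

    # Split the outputs back into per-image halves.
    n_left = len(left_tokens)
    return outputs[:n_left], outputs[n_left:]


left_out, right_out = joint_scan([1.0, 0.0], [0.0, 0.0])
print(left_out, right_out)
```

In this toy setting the right-image outputs are non-zero even though the right tokens are all zero, because the hidden state carries information forward from the left image. A one-directional scan like this only passes information from left to right, which is one reason a design might pair it with cross-attention.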
Key results
- Lowest EPE (1.34) and Fl-all (2.52) on KITTI15 optical flow benchmark
- Real-time inference speed (~39.9 FPS) with memory comparable to leading methods
- Competitive Sintel Final unmatched error (10.67), outperforming Unimatch and FlowFormer
- Novel hybrid Mamba-Transformer block enabling simultaneous joint learning of image pairs without quadratic complexity
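For readers unfamiliar with the two KITTI metrics quoted above, the following minimal sketch computes the mean end-point error (EPE) and the flow outlier percentage under the KITTI 2015 convention, where a pixel counts as an outlier if its error exceeds both 3 px and 5% of the ground-truth flow magnitude. The function name `flow_metrics` and the list-of-tuples input format are assumptions for illustration, not the authors' evaluation code.

```python
import math


def flow_metrics(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Return (mean EPE in pixels, outlier percentage) for paired flow vectors.

    pred, gt: equal-length lists of (u, v) flow vectors at valid pixels.
    The input layout is a hypothetical choice for this sketch.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")

    total_error = 0.0
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        err = math.hypot(pu - gu, pv - gv)  # per-pixel end-point error
        mag = math.hypot(gu, gv)            # ground-truth flow magnitude
        total_error += err
        # KITTI 2015 convention: outlier if error > 3 px AND > 5% of magnitude.
        if err > abs_thresh and err > rel_thresh * mag:
            outliers += 1

    epe = total_error / len(gt)
    outlier_pct = 100.0 * outliers / len(gt)
    return epe, outlier_pct


epe, outlier_pct = flow_metrics([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
print(epe, outlier_pct)
```

Lower is better for both numbers: EPE measures the average displacement error, while the outlier percentage measures how often the estimate is badly wrong.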
Why it matters
Enables accurate, real-time dense perception for resource-constrained applications like autonomous driving and robotics by overcoming the accuracy-efficiency bottleneck of current vision models.
Abstract
In this work, we propose DenVisCoM, a novel Mamba block, together with a novel hybrid architecture specifically tailored for accurate, real-time estimation of optical flow and disparity. Given that such multi-view geometry and motion tasks are fundamentally related, we propose a unified architecture to tackle them jointly. Specifically, the proposed hybrid architecture is based on DenVisCoM and a Transformer-based attention block, and it addresses real-time inference, memory footprint, and accuracy at the same time for joint estimation of motion and 3D dense perception tasks. We extensively benchmark the trade-off between accuracy and real-time processing on a large number of datasets. Our experimental results and related analysis suggest that the proposed model can accurately estimate optical flow and disparity in real time. All models and associated code are available at https://github.com/vimstereo/DenVisCoM.