DensePercept-NCSSD: Vision Mamba towards Real-Time Dense Visual Perception with Non-Causal State Space Duality
Tushar Anand, Advik Sinha, Abhijit Das
AI summary
Problem
Deep learning models for dense visual perception often struggle to balance high accuracy with the low latency and computational constraints required for real-time applications.
Approach
The authors introduce a lightweight non-causal Mamba block that leverages state space duality to replace quadratic attention with linear-complexity operations, enabling efficient joint feature learning and multi-scale pyramid-based matching for unified dense perception.
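The linear-complexity claim can be illustrated with a small NumPy sketch. It is not the authors' block: the names `C`, `B`, `X`, and the per-token scalar weight `a` are assumptions borrowed from general state space notation. It only shows why a non-causal mixing of the form "every token reads one shared global state" gives the same output as building the full L x L token-mixing matrix, at a cost that grows linearly with sequence length.

```python
import numpy as np

def quadratic_mixing(C, B, X, a):
    """Reference O(L^2) form: build the full L x L token-mixing matrix."""
    # M[i, j] = (C_i . B_j) * a_j, so every token i sees every token j (non-causal).
    M = (C @ B.T) * a[None, :]
    return M @ X

def linear_mixing(C, B, X, a):
    """Equivalent O(L) form: accumulate one global state, then read it out."""
    # H = sum_j a_j * B_j^T X_j has shape (N, D), independent of sequence length L.
    H = B.T @ (a[:, None] * X)
    return C @ H

# Hypothetical sizes: L tokens, state dimension N, feature dimension D.
rng = np.random.default_rng(0)
L, N, D = 64, 8, 16
C = rng.standard_normal((L, N))
B = rng.standard_normal((L, N))
X = rng.standard_normal((L, D))
a = rng.uniform(0.1, 1.0, size=L)  # assumed per-token scalar weight

y_quad = quadratic_mixing(C, B, X, a)
y_lin = linear_mixing(C, B, X, a)
max_diff = np.abs(y_quad - y_lin).max()  # differs only by floating-point rounding
```

The two functions are algebraically identical; the second simply reorders the matrix products so that the L x L matrix is never materialized, which is the property that lets this family of blocks replace quadratic attention.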
Key results
- State-of-the-art EPE of 0.54 on KITTI15 for optical flow
- 42.93 FPS inference speed with 196.20 MB memory usage
- Unified SOMER score of 15.06 balancing accuracy, speed, and overhead
- Robust performance across low, mid, and large motion ranges on KITTI and Sintel
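The EPE figure in the list above is the end-point error, the standard optical flow metric: the Euclidean distance between predicted and ground-truth flow vectors, averaged over pixels. The sketch below assumes flow fields stored as `(H, W, 2)` arrays and an optional validity mask, as used for sparse ground truth such as KITTI; the function name and array layout are illustrative and not taken from the authors' code.

```python
import numpy as np

def end_point_error(flow_pred, flow_gt, valid=None):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding (u, v) per pixel.
    valid: optional (H, W) mask; only pixels where it is nonzero are averaged.
    """
    per_pixel = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    if valid is not None:
        per_pixel = per_pixel[valid.astype(bool)]
    return float(per_pixel.mean())

# Toy example: ground truth is (3, 4) everywhere, prediction is zero.
gt = np.zeros((4, 5, 2))
gt[..., 0] = 3.0
gt[..., 1] = 4.0
pred = np.zeros_like(gt)
epe = end_point_error(pred, gt)  # every pixel is off by sqrt(3^2 + 4^2)

# Masked example: prediction is wrong at a single pixel only.
pred_one_off = gt.copy()
pred_one_off[0, 0] = 0.0
mask = np.zeros((4, 5))
mask[0, 0] = 1
epe_all = end_point_error(pred_one_off, gt)
epe_masked = end_point_error(pred_one_off, gt, valid=mask)
```

The masked and unmasked values differ because the mask restricts the average to pixels with ground truth, which is why reported EPE numbers depend on the evaluation protocol of each benchmark.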
Why it matters
Provides a highly efficient, unified architecture for real-time robotic and vision systems that require accurate dense perception under strict computational constraints.
Abstract
In this work, we propose an accurate, real-time optical flow and disparity estimation model that fuses pairwise input images in the proposed non-causal selective state space for dense perception tasks. The model is built on a non-causal Mamba block that is fast and efficient and handles the constraints of real-time applications. It reduces inference time while maintaining high accuracy and low GPU usage for optical flow and disparity map generation. The results, analysis, and validation in real-life scenarios show that the proposed model can be used for unified, real-time, and accurate 3D dense perception tasks. The code, along with the models, can be found at https://github.com/vimstereo/DensePerceptNCSSD