Research Analyzer
← Back ICRA 2026

DensePercept-NCSSD: Vision Mamba towards Real-Time Dense Visual Perception with Non-Causal State Space Duality

Tushar Anand, Advik Sinha, ABHIJIT DAS

PDF

AI summary

Key figure (auto-extracted from paper)
DensePercept-NCSSD achieves state-of-the-art accuracy and real-time performance for unified optical flow and disparity estimation by replacing quadratic attention with a non-causal state space duality Mamba block.
Dense perception Optical flow Disparity estimation Vision Mamba State space duality Real-time inference

Problem

Deep learning models for dense visual perception often struggle to balance high accuracy with the low latency and computational constraints required for real-time applications.

Approach

The authors introduce a lightweight non-causal Mamba block that leverages state space duality to replace quadratic attention with linear-complexity operations, enabling efficient joint feature learning and multi-scale pyramid-based matching for unified dense perception.

Key results

  • State-of-the-art EPE of 0.54 on KITTI15 for optical flow
  • 42.93 FPS inference speed with 196.20 MB memory usage
  • Unified SOMER score of 15.06 balancing accuracy, speed, and overhead
  • Robust performance across low, mid, and large motion ranges on KITTI and Sintel

Why it matters

Provides a highly efficient, unified architecture for real-time robotic and vision systems that require accurate dense perception under strict computational constraints.

Abstract

In this work, we propose an accurate and real- time optical flow and disparity estimation model by fusing pairwise input images in the proposed non-causal selective state space for dense perception tasks. We propose a non-causal Mamba block-based model that is fast and efficient and aptly manages the constraints present in a real-time applications. Our proposed model reduces inference times while maintaining high accuracy and low GPU usage for optical flow and disparity map generation. The results and analysis, and validation in real- life scenario justify that our proposed model can be used for unified real-time and accurate 3D dense perception estimation tasks. The code, along with the models, can be found at https://github.com/vimstereo/DensePerceptNCSSD

Index terms

Deep Learning for Visual Perception Visual Learning

Related papers