ICRA 2026

MUSE: Multimodal Uncertainty Quantification of State Estimation

Minkyung Kim, Henry Che, Bhargav Chandaka, Bhumsitt Pramuanpornsatid, Chengyu Yang, Sheng Cheng, Xiaofeng Wang, Naira Hovakimyan, Shenlong Wang

AI summary

MUSE leverages a Mamba-based state-space model to jointly correct poses and predict well-calibrated, heteroscedastic uncertainty from multimodal sensor streams in real time.

Uncertainty Quantification, Visual-Inertial Odometry, Mamba, State-Space Models, Pose Correction, Robotics

Problem

Existing visual-inertial odometry systems struggle to quantify and calibrate their estimation uncertainty reliably. Their uncertainty measures are often overconfident or poorly calibrated because they ignore temporal dynamics and multimodal sensor cues.

Approach

MUSE processes asynchronous visual, inertial, and odometry streams through a Mamba state-space model to capture long-horizon temporal correlations and predict a non-zero-mean Gaussian distribution for simultaneous pose correction and uncertainty calibration.
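
As a rough illustration of what predicting a non-zero-mean Gaussian over the pose error could look like, the sketch below scores a predicted mean (the pose correction) and a per-axis variance (the uncertainty) with the Gaussian negative log-likelihood. It is not taken from the MUSE code: the diagonal covariance, the function name `gaussian_nll`, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(error, mean, log_var):
    """Per-sample negative log-likelihood of `error` under N(mean, diag(exp(log_var))).

    Hypothetical helper: `error` is the pose error of the odometry estimate,
    `mean` is the predicted correction, `log_var` is the predicted log-variance.
    """
    error = np.asarray(error, dtype=float)
    mean = np.asarray(mean, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    resid = error - mean
    per_dim = 0.5 * (log_var + resid ** 2 / np.exp(log_var) + np.log(2.0 * np.pi))
    return per_dim.sum(axis=-1)

# Toy example (made-up numbers): one sample of a 3-axis position error in meters.
error = np.array([[0.10, -0.05, 0.02]])
log_var = np.log(np.full_like(error, 0.01))  # predicted variance of 0.01 per axis

# A zero-mean model must absorb the whole error into its variance term.
nll_zero_mean = gaussian_nll(error, np.zeros_like(error), log_var)
# A non-zero mean that explains most of the error yields a lower NLL.
nll_corrected = gaussian_nll(error, 0.9 * error, log_var)

print(nll_zero_mean, nll_corrected)
```

Under these assumptions, the predicted mean plays the role of the pose correction, while the variance only needs to cover the residual error that remains after the correction is applied.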

Key results

Outperforms baselines in pose correction accuracy and uncertainty calibration
Effectively models heteroscedastic and multimodal uncertainty across diverse VIO frameworks
Operates as a real-time, deployable plugin for existing odometry systems
Introduces the open-source UnCal-Flight dataset for robust VIO evaluation
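
One common way to check uncertainty calibration is empirical coverage: under a Gaussian model, about 68% of errors should fall within one predicted standard deviation. The sketch below is an assumption-laden illustration on synthetic data, not necessarily the metric used in the paper; `empirical_coverage` and all numbers are hypothetical.

```python
import numpy as np

def empirical_coverage(errors, sigmas, k=1.0):
    """Fraction of errors whose magnitude is within k predicted standard deviations."""
    errors = np.asarray(errors, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.mean(np.abs(errors) <= k * sigmas))

# Synthetic errors drawn from a zero-mean Gaussian with known spread (made-up data).
rng = np.random.default_rng(0)
true_sigma = 0.2
errors = rng.normal(0.0, true_sigma, size=20000)

# Predicted sigma equal to the true spread: coverage should be near the Gaussian 1-sigma value.
calibrated = empirical_coverage(errors, np.full_like(errors, true_sigma))
# Predicted sigma at half the true spread: an overconfident model covers far fewer errors.
overconfident = empirical_coverage(errors, np.full_like(errors, true_sigma / 2))

print(calibrated, overconfident)
```

Coverage well below the nominal level indicates overconfidence, and coverage well above it indicates an overly conservative estimate.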

Why it matters

Provides roboticists and autonomous systems with a reliable, real-time uncertainty measure essential for safe navigation in degraded or GPS-denied environments.

Abstract

Accurate visual state estimation has been a central topic in robotics with a wide range of applications in robot navigation, autonomous driving, and autonomous flight. Recent advances in robot perception have led to significant improvements in the accuracy and robustness of state estimation, yet a fundamental challenge remains in how to quantify and calibrate its precision, i.e., how confident we are in an estimate and whether failures can be detected. This issue is particularly pronounced in visual–inertial odometry (VIO), where the heteroscedastic and multimodal nature of the problem makes uncertainty quantification especially difficult. This paper introduces MUSE (Multimodal Uncertainty Quantification of State Estimation), a novel real-time learning-based framework that leverages the strong and efficient sequential modeling capacity of Mamba to estimate localization uncertainty from multiple asynchronous sensor streams. Experiments on both public and in-house datasets demonstrate that MUSE achieves superior reliability and robustness compared to existing uncertainty quantification methods, and ablation studies justify the benefits of its key design choices. We release our source code and dataset at https://github.com/hungdche/MUSE.

Index terms

Deep Learning for Visual Perception, SLAM, Visual-Inertial SLAM