MUSE: Multimodal Uncertainty Quantification of State Estimation
Minkyung Kim, Henry Che, Bhargav Chandaka, Bhumsitt Pramuanpornsatid, Chengyu Yang, Sheng Cheng, Xiaofeng Wang, Naira Hovakimyan, Shenlong Wang
AI summary
Problem
Existing visual-inertial odometry systems struggle to quantify and calibrate their estimation uncertainty reliably. Their uncertainty measures are often overconfident or poorly calibrated because they ignore temporal dynamics and multimodal sensor cues.
Approach
MUSE processes asynchronous visual, inertial, and odometry streams through a Mamba state-space model to capture long-horizon temporal correlations and predict a non-zero-mean Gaussian distribution for simultaneous pose correction and uncertainty calibration.
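To make the idea of a non-zero-mean Gaussian concrete, here is a minimal sketch of how a predicted mean and variance could serve both purposes at once: the mean acts as a pose correction and the variance as the uncertainty. It assumes a diagonal covariance parameterized by log-variance and a pose error defined as ground truth minus estimate. The function names `gaussian_nll` and `correct_pose` are hypothetical, and the source does not state the actual loss or parameterization MUSE uses.

```python
import numpy as np


def gaussian_nll(error, mu, log_var):
    """Mean negative log-likelihood of `error` under N(mu, diag(exp(log_var))).

    Assumption: diagonal covariance, with the log-variance predicted per dimension.
    """
    error = np.asarray(error, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    # Squared residual scaled by the inverse predicted variance.
    scaled_sq = (error - mu) ** 2 * np.exp(-log_var)
    return float(np.mean(0.5 * (scaled_sq + log_var + np.log(2.0 * np.pi))))


def correct_pose(estimate, mu):
    """Apply the predicted mean error as an additive correction.

    Assumption: error = ground truth - estimate, so the correction is added.
    """
    return np.asarray(estimate, dtype=float) + np.asarray(mu, dtype=float)


# Toy example: an odometry estimate with a systematic bias.
true_error = np.array([0.5, -0.2, 0.1])
unit_log_var = np.zeros(3)

nll_zero_mean = gaussian_nll(true_error, np.zeros(3), unit_log_var)
nll_with_mean = gaussian_nll(true_error, true_error, unit_log_var)
corrected = correct_pose([1.0, 2.0, 3.0], true_error)
```

In this toy case a zero-mean model can only explain the bias by reporting a larger variance, whereas a model with a learned mean absorbs the bias and scores a lower negative log-likelihood at the same variance.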
Key results
- Outperforms baselines in pose correction accuracy and uncertainty calibration
- Effectively models heteroscedastic and multimodal uncertainty across diverse VIO frameworks
- Operates as a real-time, deployable plugin for existing odometry systems
- Introduces the open-source UnCal-Flight dataset for robust VIO evaluation
Why it matters
Provides roboticists and autonomous systems with a reliable, real-time uncertainty measure essential for safe navigation in degraded or GPS-denied environments.
Abstract
Accurate visual state estimation has been a central topic in robotics with a wide range of applications in robot navigation, autonomous driving, and autonomous flight. Recent advances in robot perception have led to significant improvements in the accuracy and robustness of state estimation, yet a fundamental challenge remains in how to quantify and calibrate its precision, i.e., how confident we are in an estimate and whether failures can be detected. This issue is particularly pronounced in visual–inertial odometry (VIO), where the heteroscedastic and multimodal nature of the problem makes uncertainty quantification especially difficult. This paper introduces MUSE (Multimodal Uncertainty Quantification of State Estimation), a novel real-time learning-based framework that leverages the strong and efficient sequential modeling capacity of Mamba to estimate localization uncertainty from multiple asynchronous sensor streams. Experiments on both public and in-house datasets demonstrate that MUSE achieves superior reliability and robustness compared to existing uncertainty quantification methods, and ablation studies justify the benefits of its key design choices. We release our source code and dataset at https://github.com/hungdche/MUSE.