Metric, Inertially Aligned Monocular State Estimation Via Kinetodynamic Priors
Jiaxin Liu, Min Li, Wanting Xu, Liang Li, Jiaqi Yang, Laurent Kneip
AI summary
Problem
Traditional state estimation relies on rigid-body assumptions that fail for flexible robotic systems, while monocular vision inherently lacks metric scale and inertial alignment, which typically require additional sensors to resolve.
Approach
The method combines a learned neural deformation-force model with continuous-time B-spline kinematics to align visually derived camera acceleration with predicted physical deformation, jointly optimizing for metric scale, gravity alignment, and platform trajectory.
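To illustrate why a continuous-time spline is convenient here, the sketch below evaluates the analytic acceleration of one uniform cubic B-spline segment. It is only an illustration under stated assumptions: the paper does not say which spline order or knot layout it uses, and the function name `bspline_accel` and its arguments are hypothetical.

```python
def bspline_accel(ctrl, u, dt):
    """Acceleration of one uniform cubic B-spline segment (assumed spline form).

    ctrl: four consecutive control points, each a sequence of floats
    u:    normalised time inside the segment, 0 <= u < 1
    dt:   knot spacing in seconds
    """
    # Second derivative of the uniform cubic basis with respect to u.
    # The four weights sum to zero, so a constant offset of the control
    # points produces no acceleration.
    w = (1.0 - u, 3.0 * u - 2.0, 1.0 - 3.0 * u, u)
    dim = len(ctrl[0])
    # Chain rule: d^2/dt^2 = (1 / dt^2) * d^2/du^2.
    return [sum(w[k] * ctrl[k][d] for k in range(4)) / dt**2 for d in range(dim)]


# Usage: control points sampled from x(t) = t^2, so the true acceleration is 2.
dt = 0.1
ctrl = [[(i * dt) ** 2, 0.0, 0.0] for i in range(4)]
accel = bspline_accel(ctrl, 0.3, dt)
```

Because the acceleration is linear in the control points, it can be compared directly with a force-derived acceleration at any time instant, without finite-differencing noisy poses.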
Key results
- Neural deformation-force model trained via motion capture
- B-spline based joint optimization for metric scale and gravity recovery
- Passive inertial sensing capability from purely visual input
- Robust performance under noise and in real-world spring-camera experiments
Why it matters
This approach enables accurate, metric state estimation for flexible robots and UAVs using only a single camera, reducing hardware complexity while advancing autonomous navigation and soft robotics.
Abstract
Accurate state estimation for flexible robotic systems poses significant challenges, particularly for platforms with dynamically deforming structures that invalidate rigid-body assumptions. This paper addresses this problem and enables the extension of existing rigid-body pose estimation methods to non-rigid systems. Our approach integrates two core components: first, we capture elastic properties using a deformation-force model, efficiently learned via a Multi-Layer Perceptron; second, we resolve the platform’s inherently smooth motion using continuous-time B-spline kinematic models. By continuously applying Newton’s Second Law, our method formulates the relationship between visually derived trajectory acceleration and predicted deformation-induced acceleration. We demonstrate that our approach enables robust and accurate pose estimation on non-rigid platforms, and that properly modeled platform physics allow for the recovery of inertial sensing properties. We validate this on a simple spring-camera system, showing how the approach robustly resolves the typically ill-posed problem of metric scale and gravity recovery in monocular visual odometry.
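The way Newton's Second Law can make scale and gravity observable may be sketched with a simplified linear model. The sketch assumes a point mass m, an up-to-scale visual acceleration a_vis, and a deformation-predicted force F expressed in the same frame, so that m * s * a_vis = F + m * g with unknown scale s and gravity vector g. This is not the paper's joint B-spline optimization. The function name `recover_scale_and_gravity`, the point-mass model, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def recover_scale_and_gravity(a_vis, force, mass):
    """Least-squares estimate of metric scale s and gravity vector g.

    Assumed model per sample i:  s * a_vis[i] - g = force[i] / mass
    a_vis: (n, 3) accelerations of the up-to-scale visual trajectory
    force: (n, 3) forces predicted by a deformation-force model, same frame
    mass:  scalar mass of the moving camera body
    """
    a_vis = np.asarray(a_vis, dtype=float)
    force = np.asarray(force, dtype=float)
    n = a_vis.shape[0]
    # Unknown vector x = [s, g_x, g_y, g_z]; three equations per sample.
    A = np.zeros((3 * n, 4))
    A[:, 0] = a_vis.reshape(-1)
    A[:, 1:] = np.tile(-np.eye(3), (n, 1))
    b = (force / mass).reshape(-1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:]


# Usage on synthetic data (hypothetical values, not from the paper).
rng = np.random.default_rng(0)
s_true, g_true, mass = 2.5, np.array([0.0, 0.0, -9.81]), 0.2
a_metric = rng.normal(size=(50, 3))
a_vis = a_metric / s_true            # the visual trajectory lacks metric scale
force = mass * (a_metric - g_true)   # spring force implied by the assumed model
s_est, g_est = recover_scale_and_gravity(a_vis, force, mass)
```

In this simplified setting the problem is linear in s and g, so it is well posed as long as the accelerations vary over time; a camera at rest or in uniform motion would leave the scale unobservable.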