ICRA 2026

Metric, Inertially Aligned Monocular State Estimation Via Kinetodynamic Priors

Jiaxin Liu, Min Li, Wanting Xu, Liang Li, Jiaqi Yang, Laurent Kneip

AI summary

A monocular camera on a flexible platform can recover metric scale and gravity alignment without extra sensors by leveraging learned deformation physics and continuous-time motion models.

Monocular state estimation, Non-rigid robotics, Kinetodynamic priors, Metric scale recovery, Neural deformation modeling, Passive inertial sensing

Problem

Traditional state estimation relies on rigid-body assumptions that fail for flexible robotic systems. Monocular vision also lacks metric scale and inertial alignment, which typically take additional sensors to resolve.

Approach

The method combines a learned neural deformation-force model with continuous-time B-spline kinematics to align visually derived camera acceleration with predicted physical deformation, jointly optimizing for metric scale, gravity alignment, and platform trajectory.
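
The alignment step can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes Newton's Second Law in the form s * a_vis = f_def + g, where a_vis is the up-to-scale acceleration of the visual trajectory, f_def is the deformation force divided by mass and already rotated into the visual frame, s is the unknown metric scale, and g is the unknown gravity vector. The function name solve_scale_and_gravity, the sign convention, and the plain linear least-squares solver are all assumptions made for illustration; the paper describes a joint optimization that also refines the trajectory.

```python
import numpy as np

def solve_scale_and_gravity(a_vis, f_def):
    """Least-squares estimate of metric scale s and gravity g (hypothetical helper).

    a_vis: (n, 3) up-to-scale accelerations from the visual trajectory.
    f_def: (n, 3) deformation-induced accelerations (force / mass), assumed
           to be expressed in the same frame as a_vis.
    Assumed model for every sample i: s * a_vis[i] - g = f_def[i].
    """
    a_vis = np.asarray(a_vis, dtype=float)
    f_def = np.asarray(f_def, dtype=float)
    n = a_vis.shape[0]
    # Stack one 3x4 block [a_vis_i | -I] per sample; unknowns are [s, gx, gy, gz].
    A = np.zeros((3 * n, 4))
    A[:, 0] = a_vis.reshape(-1)
    A[:, 1:] = np.tile(-np.eye(3), (n, 1))
    b = f_def.reshape(-1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:]

# Synthetic check with made-up values (not data from the paper).
rng = np.random.default_rng(0)
s_true = 2.5
g_true = np.array([0.3, -0.2, -9.8])
a_vis = rng.normal(size=(50, 3))
f_def = s_true * a_vis - g_true
s_est, g_est = solve_scale_and_gravity(a_vis, f_def)
```

This sketch does not enforce the known magnitude of gravity, and it needs sufficiently varied accelerations for the four unknowns to be observable.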

Key results

Neural deformation-force model trained via motion capture
B-spline based joint optimization for metric scale and gravity recovery
Passive inertial sensing capability from purely visual input
Robust performance under noise and real-world spring-camera experiments

Why it matters

This approach enables accurate, metric state estimation for flexible robots and UAVs using only a single camera, reducing hardware complexity while advancing autonomous navigation and soft robotics.

Abstract

Accurate state estimation for flexible robotic systems poses significant challenges, particularly for platforms with dynamically deforming structures that invalidate rigid-body assumptions. This paper addresses this problem and enables the extension of existing rigid-body pose estimation methods to non-rigid systems. Our approach integrates two core components: first, we capture elastic properties using a deformation-force model, efficiently learned via a Multi-Layer Perceptron; second, we resolve the platform’s inherently smooth motion using continuous-time B-spline kinematic models. By continuously applying Newton’s Second Law, our method formulates the relationship between visually-derived trajectory acceleration and predicted deformation-induced acceleration. We demonstrate that our approach not only enables robust and accurate pose estimation on non-rigid platforms, but also shows that the properly modeled platform physics allow for the recovery of inertial sensing properties. We validate this feasibility on a simple spring-camera system, showing how it robustly resolves the typically ill-posed problem of metric scale and gravity recovery in monocular visual odometry.
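
To illustrate how a continuous-time B-spline yields a trajectory acceleration in closed form, here is a minimal sketch that assumes a uniform cubic B-spline over positions only. The abstract does not state the spline order or knot spacing, so both are assumptions, and the function name bspline_acceleration is hypothetical.

```python
import numpy as np

def bspline_acceleration(ctrl, dt, u):
    """Acceleration on one segment of a uniform cubic B-spline (illustrative).

    ctrl: (4, 3) array of four consecutive position control points.
    dt:   knot spacing in seconds.
    u:    normalized time within the segment, in [0, 1).
    """
    ctrl = np.asarray(ctrl, dtype=float)
    # Second derivative of the uniform cubic B-spline basis with respect to u.
    w = np.array([1.0 - u, 3.0 * u - 2.0, 1.0 - 3.0 * u, u])
    # Chain rule: d^2/dt^2 = (1 / dt^2) * d^2/du^2.
    return (w @ ctrl) / dt**2

# Control points sampled from p(t) = 0.5 * a * t^2 (synthetic values).
a_true = np.array([0.0, 0.0, -2.0])
dt = 0.1
ctrl = np.array([0.5 * a_true * (i * dt) ** 2 for i in range(4)])
acc = bspline_acceleration(ctrl, dt, 0.3)
```

Because the acceleration is linear in the control points, it can be compared with a predicted deformation-induced acceleration at any time instant rather than only at frame timestamps.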

Index terms

SLAM, Flexible Robotics, Kinematics