ICRA 2026

UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies

Harsh Gupta, Xiaofeng Guo, Huy Ha, Chuer Pan, Muqing Cao, Dongjae Lee, Sebastian Scherer, Shuran Song, Guanya Shi

AI summary

Injecting low-level controller feedback into diffusion sampling significantly boosts the feasibility and success rates of embodiment-agnostic policies on constrained aerial robots.

Embodiment-aware guidance; Diffusion policy; Aerial manipulation; Cross-embodiment learning; Visuomotor control; Universal Manipulation Interface

Problem

Embodiment-agnostic policies trained on handheld demonstrations often generate dynamically infeasible trajectories for constrained robots like aerial manipulators due to control and dynamics mismatches.

Approach

The method couples a high-level diffusion policy with a low-level embodiment-specific controller at inference time, using the controller's tracking cost gradients to steer trajectory generation toward dynamically feasible modes.
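The coupling described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: `toy_denoiser` is a hypothetical stand-in for the learned diffusion policy (it always predicts the demonstration trajectory), a first-order tracker with gain `gain` stands in for the embodiment-specific controller, and the sampler is a simplified deterministic interpolation rather than a full diffusion scheduler. The point it shows is only the mechanism: at each denoising step, the gradient of the tracking cost nudges the predicted trajectory toward one the tracker can follow.

```python
import numpy as np


def tracker_matrices(horizon, gain):
    """Linear map of a toy first-order tracker: y_t = (1 - gain) * y_{t-1} + gain * x_t."""
    A = np.zeros((horizon, horizon))
    for t in range(horizon):
        for s in range(t + 1):
            A[t, s] = gain * (1.0 - gain) ** (t - s)
    c = (1.0 - gain) ** np.arange(1, horizon + 1)  # influence of the start state
    return A, c


def tracking_cost_and_grad(traj, start, A, c):
    """Return 0.5 * sum of squared tracking errors and its gradient w.r.t. traj."""
    err = traj - (A @ traj + np.outer(c, start))  # reference minus tracked state
    cost = 0.5 * float(np.sum(err ** 2))
    grad = (np.eye(len(traj)) - A).T @ err
    return cost, grad


def toy_denoiser(noisy_traj, demo):
    """Hypothetical stand-in for a learned policy: always predicts the demo trajectory."""
    return demo.copy()


def sample(demo, start, A, c, guidance_scale, steps=10, seed=0):
    """Simplified deterministic sampler with tracking-cost guidance on the clean estimate."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=demo.shape)  # start from Gaussian noise
    for i in range(steps):
        clean = toy_denoiser(traj, demo)
        _, grad = tracking_cost_and_grad(clean, start, A, c)
        clean = clean - guidance_scale * grad  # steer toward trackable trajectories
        traj = traj + (clean - traj) / (steps - i)  # move toward the guided estimate
    return traj


def run_demo():
    horizon = 16
    t = np.arange(horizon)
    # Demo with an abrupt jump in the first coordinate, which a slow tracker cannot follow.
    demo = np.stack([(t >= 8).astype(float), t / horizon], axis=1)
    start = demo[0]
    A, c = tracker_matrices(horizon, gain=0.3)
    unguided = sample(demo, start, A, c, guidance_scale=0.0)
    guided = sample(demo, start, A, c, guidance_scale=0.2)
    cost_unguided, _ = tracking_cost_and_grad(unguided, start, A, c)
    cost_guided, _ = tracking_cost_and_grad(guided, start, A, c)
    return cost_unguided, cost_guided


if __name__ == "__main__":
    cost_unguided, cost_guided = run_demo()
    print(f"tracking cost without guidance: {cost_unguided:.4f}")
    print(f"tracking cost with guidance:    {cost_guided:.4f}")
```

In this toy setting the guided sample trades some fidelity to the demonstration for a lower tracking cost; the guidance scale (0.2 here, chosen arbitrarily) controls that trade-off.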

Key results

Proposes Embodiment-Aware Diffusion Policy (EADP) for plug-and-play test-time guidance
Introduces a simulation benchmark to quantify the embodiment gap across varying UMI-abilities
Achieves over 9% average success rate improvement on aerial tasks without disturbances and over 20% with disturbances
Demonstrates robust real-world deployment of long-horizon and high-precision aerial manipulation skills

Why it matters

Provides a practical pathway for scaling universal manipulation skills to diverse and highly constrained robotic platforms without requiring embodiment-specific retraining.

Abstract

We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller’s tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse, and even highly constrained, embodiments. All code, data, checkpoints, and result videos can be found at umi-on-air.github.io.

Index terms

Learning from Demonstration; Aerial Systems: Applications; Mobile Manipulation