ICRA 2026

Motion before Action: Diffusing Object Motion As Manipulation Condition

Yue Su, Xinyu Zhan, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang

AI summary

Inferring future object motion before generating actions significantly improves the success rate and robustness of diffusion-based robotic manipulation policies.
Imitation Learning, Diffusion Models, Robotic Manipulation, Object Motion, Policy Augmentation, Dexterous Manipulation

Problem

Existing robot manipulation policies often over-rely on direct observation-to-action mapping, leading to poor generalization and a lack of human-like reasoning about object dynamics and motion.

Approach

MBA uses two cascaded diffusion processes to first predict future object pose sequences from observations, then conditions the robot action generator on these predicted poses.
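
As a rough illustration of the cascade described above, the following Python sketch chains two DDPM-style reverse processes: the first samples a future object pose sequence from an observation feature, and the second samples an action sequence conditioned on the observation plus the sampled poses. It is not the authors' implementation. The `ToyDenoiser` class, the noise schedule, the horizon, and all dimensions (16-d observation feature, 6-d pose, 7-d action) are hypothetical placeholders, and the denoisers are untrained random maps, so the outputs are meaningless beyond their shapes.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def ddpm_sample(eps_model, cond, shape, schedule, rng):
    """Standard DDPM ancestral sampling, conditioned on the vector `cond`."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t, cond)  # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x


class ToyDenoiser:
    """Hypothetical stand-in for a trained noise-prediction network.

    It uses fixed random weights and ignores the timestep `t`.
    """

    def __init__(self, cond_dim, out_shape, rng):
        self.out_shape = out_shape
        out_dim = int(np.prod(out_shape))
        self.w_x = 0.1 * rng.standard_normal((out_dim, out_dim))
        self.w_c = 0.1 * rng.standard_normal((cond_dim, out_dim))

    def __call__(self, x, t, cond):
        h = x.reshape(-1) @ self.w_x + cond @ self.w_c
        return np.tanh(h).reshape(self.out_shape)


def cascaded_inference(obs_feat, motion_model, action_model, schedule, rng,
                       horizon=8, pose_dim=6, action_dim=7):
    """Two cascaded diffusion passes: object motion first, then robot actions."""
    # Stage 1: sample a future object pose sequence conditioned on the observation.
    poses = ddpm_sample(motion_model, obs_feat, (horizon, pose_dim), schedule, rng)
    # Stage 2: sample actions conditioned on the observation AND the sampled poses.
    action_cond = np.concatenate([obs_feat, poses.reshape(-1)])
    actions = ddpm_sample(action_model, action_cond, (horizon, action_dim), schedule, rng)
    return poses, actions


rng = np.random.default_rng(0)
obs_dim, horizon, pose_dim, action_dim = 16, 8, 6, 7
schedule = make_schedule(num_steps=20)
motion_model = ToyDenoiser(obs_dim, (horizon, pose_dim), rng)
action_model = ToyDenoiser(obs_dim + horizon * pose_dim, (horizon, action_dim), rng)
obs_feat = rng.standard_normal(obs_dim)  # placeholder for an encoded observation
poses, actions = cascaded_inference(obs_feat, motion_model, action_model, schedule, rng)
print(poses.shape, actions.shape)
```

In this sketch the only coupling between the two stages is the conditioning vector passed to the second sampler, which is one simple way to picture how such a module could sit in front of an existing diffusion action head.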

Key results

  • Plug-and-play module that integrates with existing diffusion-based policies
  • 14.2% average success rate increase over Diffusion Policy and 6.2% over DP3 across 57 simulation tasks
  • Consistent performance gains across articulated, soft, rigid, and tool-use manipulation tasks
  • Enhanced policy robustness and kinematic consistency by decoupling motion reasoning from action generation

Why it matters

Offers a lightweight, architecture-agnostic upgrade that makes existing robotic manipulation policies more reliable and generalizable across diverse real-world scenarios.

Abstract

Inferring object motion representations from observations enhances the performance of robotic manipulation tasks. This paper introduces a new paradigm for robot imitation learning that generates action sequences by reasoning about object motion from visual observations. We propose MBA (Motion Before Action), a novel policy module that employs two cascaded diffusion processes for object motion generation and robot action generation under object motion guidance. MBA first predicts the future pose sequence of the object based on observations, and then uses this sequence as a condition to guide robot action generation. Designed as a plug-and-play component, MBA can be flexibly integrated into existing robotic manipulation policies with diffusion action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks. Project page: https://selen-suyue.github.io/MBApage/

Index terms

Imitation Learning; Deep Learning in Grasping and Manipulation; Learning from Demonstration
