Motion before Action: Diffusing Object Motion As Manipulation Condition
Yue Su, Xinyu Zhan, Hongjie Fang, Yong-Lu Li, Cewu Lu, Lixin Yang
AI summary
Problem
Existing robot manipulation policies often over-rely on a direct observation-to-action mapping, which leads to poor generalization and a lack of human-like reasoning about object dynamics and motion.
Approach
MBA uses two cascaded diffusion processes: the first predicts future object pose sequences from observations, and the second generates robot actions conditioned on these predicted poses.
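The two-stage process can be illustrated with a minimal sketch, assuming standard DDPM ancestral sampling with a linear noise schedule. The noise predictors `object_motion_eps` and `action_eps` are hypothetical stand-ins: instead of trained networks, they analytically compute the noise that points toward a hand-picked target (constant-velocity object motion, and a gripper that tracks the object with a fixed offset), so the sampler lands on that target at the final step. The pose representation (flattened 3-D positions), horizon, schedule, and the `mba_style_policy` name are illustrative assumptions, not the paper's settings or code. The point is the data flow: stage 2 receives the stage-1 sample as part of its condition.

```python
import math
import random
from itertools import accumulate
from operator import mul

# DDPM noise schedule (linear betas; T and the beta range are illustrative assumptions).
T = 50
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = list(accumulate(alphas, mul))  # cumulative product of alphas


def ddpm_sample(eps_model, dim, cond, rng):
    """Ancestral DDPM sampling of a flat vector of length `dim`, conditioned on `cond`."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        # Add noise with variance beta_t except at the last step.
        x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean] if t > 0 else mean
    return x


def oracle_eps(x, t, x0):
    """Noise that maps x back to a known clean sample x0.
    Hypothetical stand-in for a trained noise-prediction network."""
    s = math.sqrt(alpha_bars[t])
    d = math.sqrt(1.0 - alpha_bars[t])
    return [(xi - s * x0i) / d for xi, x0i in zip(x, x0)]


def object_motion_eps(x, t, obs):
    # Hypothetical stage-1 denoiser: the object keeps moving at obs["velocity"].
    horizon = len(x) // 3
    target = [obs["pose"][j] + (k + 1) * obs["velocity"][j]
              for k in range(horizon) for j in range(3)]
    return oracle_eps(x, t, target)


def action_eps(x, t, cond):
    # Hypothetical stage-2 denoiser: the gripper tracks the predicted object
    # poses with a fixed offset, so the actions depend on the stage-1 output.
    obs, object_motion = cond
    target = [p + obs["grasp_offset"][i % 3] for i, p in enumerate(object_motion)]
    return oracle_eps(x, t, target)


def mba_style_policy(obs, horizon, rng):
    dim = horizon * 3  # `horizon` steps of 3-D positions, flattened
    # Stage 1: diffuse the future object pose sequence from the observation.
    object_motion = ddpm_sample(object_motion_eps, dim, obs, rng)
    # Stage 2: diffuse robot actions conditioned on the observation
    # and the predicted object motion.
    actions = ddpm_sample(action_eps, dim, (obs, object_motion), rng)
    return object_motion, actions


rng = random.Random(0)
obs = {"pose": [0.0, 0.0, 0.1], "velocity": [0.01, 0.0, 0.0],
       "grasp_offset": [0.0, 0.0, 0.05]}
motion, actions = mba_style_policy(obs, horizon=4, rng=rng)
print(len(motion), len(actions))  # 12 12
```

In a real policy the two oracle functions would be replaced by learned noise predictors, and the stage-2 network would take the predicted object poses as an extra conditioning input alongside its usual observation features.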
Key results
- Plug-and-play module that integrates with existing diffusion-based policies
- 14.2% average success rate increase over Diffusion Policy and 6.2% over DP3 across 57 simulation tasks
- Consistent performance gains across articulated, soft, rigid, and tool-use manipulation tasks
- Enhanced policy robustness and kinematic consistency by decoupling motion reasoning from action generation
Why it matters
Offers a lightweight, plug-and-play upgrade for policies with diffusion action heads that makes existing robotic manipulation policies more reliable and generalizable across diverse simulated and real-world manipulation tasks.
Abstract
Inferring object motion representations from observations enhances the performance of robotic manipulation tasks. This paper introduces a new paradigm for robot imitation learning that generates action sequences by reasoning about object motion from visual observations. We propose MBA (Motion Before Action), a novel policy module that employs two cascaded diffusion processes for object motion generation and robot action generation under object motion guidance. MBA first predicts the future pose sequence of the object based on observations, and then uses this sequence as a condition to guide robot action generation. Designed as a plug-and-play component, MBA can be flexibly integrated into existing robotic manipulation policies with diffusion action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks. Project page: https://selen-suyue.github.io/MBApage/