ICRA 2026

MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation

Huanqian Wang, Chi Bene Chen, Yang Yue, Danhua Tao, Tong Guo, Shaoxuan Xie, Denghang Huang, Shiji Song, Guocai Yao, Gao Huang

AI summary

Injecting continuous motion into objects and cameras during data collection dramatically improves spatial generalization and data efficiency for robotic manipulation policies.

Spatial generalization, robotic manipulation, data collection, motion augmentation, diffusion policy, imitation learning

Problem

Static data collection captures only fixed spatial configurations per trajectory, causing severe spatial sparsity and poor generalization when robots face new object poses, target locations, or camera viewpoints.

Approach

MOVE intentionally moves pickup objects, target objects, and the camera during expert demonstrations, embedding dense spatial variability directly into each trajectory to enrich training data.
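
The coverage argument can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single object on a 2D unit-square workspace, a hypothetical constant-speed drift with boundary reflection, and a hypothetical grid-cell count as the coverage measure. It ignores target and camera motion, which MOVE also uses.

```python
import math
import random


def collect_trajectory(num_steps, moving, rng, speed=0.01):
    """Return the object's (x, y) position at each timestep of one demonstration.

    Hypothetical toy model: the object starts at a random point in the unit
    square. If `moving` is True it drifts along a random heading at constant
    speed and reflects off the workspace boundary; otherwise it stays put.
    """
    x, y = rng.random(), rng.random()
    heading = rng.uniform(0.0, 2.0 * math.pi)
    dx, dy = speed * math.cos(heading), speed * math.sin(heading)
    positions = []
    for _ in range(num_steps):
        positions.append((x, y))
        if moving:
            x, y = x + dx, y + dy
            # Reflect off the workspace boundary and clamp back inside.
            if x < 0.0 or x > 1.0:
                dx = -dx
                x = min(max(x, 0.0), 1.0)
            if y < 0.0 or y > 1.0:
                dy = -dy
                y = min(max(y, 0.0), 1.0)
    return positions


def covered_cells(trajectories, grid=10):
    """Count distinct grid cells visited by the object across all trajectories."""
    cells = set()
    for trajectory in trajectories:
        for x, y in trajectory:
            col = min(int(x * grid), grid - 1)
            row = min(int(y * grid), grid - 1)
            cells.add((col, row))
    return len(cells)


# Same seed for both paradigms, so the starting positions are identical.
static_demos = [collect_trajectory(100, False, rng) for rng in [random.Random(0)] for _ in range(5)]
moving_demos = [collect_trajectory(100, True, rng) for rng in [random.Random(0)] for _ in range(5)]

print("static cells covered:", covered_cells(static_demos))
print("moving cells covered:", covered_cells(moving_demos))
```

In this toy model a static demonstration contributes exactly one object position no matter how long it runs, so five static demonstrations cover at most five cells. A moving demonstration sweeps through many positions within the same number of timesteps.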

Key results

39.1% average success rate in Meta-World simulation tasks, a 76.1% relative improvement over static collection (22.2%)
Up to 2–5× gains in data efficiency on certain simulation tasks
Matches the static method's real-world performance with less than half the training data
Robust generalization to unseen grasp points and constrained circular sampling paths
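
The 76.1% figure is a relative gain, not a difference in percentage points. A short check, using the 39.1% and 22.2% success rates reported in the abstract (the helper name `relative_improvement` is just an illustrative choice):

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


move_success = 39.1    # average success rate with MOVE (%), from the abstract
static_success = 22.2  # average success rate with static collection (%), from the abstract

absolute_gain = move_success - static_success
relative_gain = relative_improvement(move_success, static_success)

print(f"absolute gain: {absolute_gain:.1f} percentage points")  # 16.9
print(f"relative gain: {relative_gain:.1f}%")                   # 76.1%
```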

Why it matters

Provides a simple, scalable solution to the spatial sparsity bottleneck, enabling real-world robotic systems to learn manipulation skills more efficiently without complex architectural changes.

Abstract

Imitation learning has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by data scarcity. Despite prior work on collecting large-scale datasets, a significant gap to robust spatial generalization remains. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a single, static spatial configuration of the environment. This includes fixed object and target positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose MOtion-Based Variability Enhancement (MOVE), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into any movable objects within the environment for each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, in simulation tasks requiring strong spatial generalization, MOVE achieves an average success rate of 39.1%, a 76.1% relative improvement over the static data collection paradigm (22.2%), and yields up to 2–5× gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE5.

Index terms

AI-Based Methods, Data Sets for Robot Learning, Imitation Learning