Scaling Single Human Demonstrations for Imitation Learning Using Generative Foundational Models
Nick Heppert, Minh Quang Nguyen, Abhinav Valada
AI summary
Problem
Collecting robot demonstrations is tedious and requires skilled operators, while learning directly from human demonstrations is difficult due to embodiment mismatches and limited data scalability.
Approach
The method extracts object trajectories and reference images from one human video, generates diverse 3D meshes with foundational models, aligns them to the scene, and automates unlimited robot demonstration collection in simulation to train a flow-matching policy.
Key results
- 26.6% average success rate increase over the DITTO baseline
- Zero-shot real-world deployment of a purely simulation-trained policy
- Automated pipeline for generating diverse 3D assets and robot demonstrations from a single human video
- Public release of code, trained models, and datasets
Why it matters
Makes robot skill acquisition more scalable by replacing tedious robot demonstration collection with a single human demonstration and bridging the human-robot embodiment gap through generative simulation.
Abstract
Imitation learning is a popular paradigm to teach robots new tasks, but collecting robot demonstrations through teleoperation or kinesthetic teaching is tedious and time-consuming. In contrast, directly demonstrating a task using our human embodiment is much easier and data is available in abundance, yet transfer to the robot can be non-trivial. In this work, we propose Real2Gen to train a manipulation policy from a single human demonstration. Real2Gen extracts required information from the demonstration and transfers it to a simulation environment, where a programmable expert agent can demonstrate the task arbitrarily many times, generating an unlimited amount of data to train a flow matching policy. We evaluate Real2Gen on human demonstrations from three different real-world tasks and compare it to a recent baseline. Real2Gen shows an average increase in the success rate of 26.6% and better generalization of the trained policy due to the abundance and diversity of training data. We further deploy our purely simulation-trained policy zero-shot in the real world. We make the data, code, and trained models publicly available at https://real2gen.cs.uni-freiburg.de.
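To make the "flow matching policy" mentioned in the abstract more concrete, the following is a minimal sketch of a generic flow matching objective and sampler. It is not the Real2Gen implementation. It assumes a straight-line interpolation path between Gaussian noise and an expert action chunk, and it omits observation conditioning. The function names (flow_matching_pair, flow_matching_loss, sample_actions) and the array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def flow_matching_pair(x1, rng):
    """Build one training sample for flow matching with a linear path.

    x1 is an expert action chunk, e.g. shape (horizon, action_dim).
    Assumption: straight-line path from noise x0 to data x1.
    """
    x0 = rng.standard_normal(x1.shape)  # noise sample
    t = rng.uniform()                   # time drawn from [0, 1)
    xt = (1.0 - t) * x0 + t * x1        # point on the path at time t
    target_v = x1 - x0                  # velocity is constant along the path
    return xt, t, target_v


def flow_matching_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred_v - target_v) ** 2))


def sample_actions(velocity_fn, shape, rng, steps=10):
    """Generate actions by Euler-integrating dx/dt = v(x, t) from noise."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x
```

In a full policy, velocity_fn would be a neural network that also receives the robot's observations and is trained by minimizing flow_matching_loss over the simulated expert demonstrations. Here it is left as a plain callable so the sketch stays self-contained.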