ICRA 2026

Scaling Single Human Demonstrations for Imitation Learning Using Generative Foundational Models

Nick Heppert, Minh Quang Nguyen, Abhinav Valada

AI summary

Real2Gen scales a single human demonstration into abundant simulation data using 3D generative models, raising the average policy success rate by 26.6% over the DITTO baseline and enabling zero-shot real-world transfer.

Imitation learning, Human demonstration, 3D generation, Simulation-to-real, Flow matching, Robotic manipulation

Problem

Collecting robot demonstrations is tedious and requires skilled operators, while learning directly from human demonstrations is difficult due to embodiment mismatches and limited data scalability.

Approach

The method extracts object trajectories and reference images from one human video, generates diverse 3D meshes with foundational models, aligns them to the scene, and automatically collects an arbitrary number of robot demonstrations in simulation to train a flow-matching policy.
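To make the scaling idea concrete, the following is a minimal sketch of how one extracted object trajectory could be expanded into many varied simulated demonstrations. It is not the authors' implementation: the 2D waypoints, the `Variation` fields, the sampling ranges, and the function names are all hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Variation:
    """Hypothetical per-episode randomization of a generated object."""
    scale: float  # size factor of the generated mesh
    yaw: float    # rotation about the vertical axis, in radians
    dx: float     # shift of the object's start position along x, in meters
    dy: float     # shift of the object's start position along y, in meters


def sample_variation(rng):
    # Assumed ranges; a real pipeline would tune these per task.
    return Variation(
        scale=rng.uniform(0.8, 1.2),
        yaw=rng.uniform(-math.pi / 4, math.pi / 4),
        dx=rng.uniform(-0.1, 0.1),
        dy=rng.uniform(-0.1, 0.1),
    )


def transform_trajectory(trajectory, var):
    """Rotate and scale the motion about its start point, then shift it."""
    start = trajectory[0]
    c, s = math.cos(var.yaw), math.sin(var.yaw)
    out = []
    for px, py in trajectory:
        rx, ry = px - start[0], py - start[1]  # motion relative to the start
        x = start[0] + var.dx + var.scale * (c * rx - s * ry)
        y = start[1] + var.dy + var.scale * (s * rx + c * ry)
        out.append((x, y))
    return out


def generate_demos(trajectory, n, seed=0):
    """Turn one demonstrated trajectory into n varied target trajectories."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        var = sample_variation(rng)
        demos.append((var, transform_trajectory(trajectory, var)))
    return demos


# Object positions extracted from a single (made-up) human demonstration.
demo_trajectory = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
demos = generate_demos(demo_trajectory, n=100, seed=0)
```

In this sketch each element of `demos` pairs a sampled variation with the object trajectory a simulated expert would have to reproduce. Because sampling is cheap, `n` can be made as large as the training budget allows.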

Key results

26.6% average success rate increase over the DITTO baseline
Zero-shot real-world deployment of a purely simulation-trained policy
Automated pipeline for generating diverse 3D assets and robot demonstrations from a single human video
Public release of code, trained models, and datasets

Why it matters

Democratizes scalable robot skill acquisition by eliminating tedious data collection and bridging the human-robot embodiment gap through generative simulation.

Abstract

Imitation learning is a popular paradigm to teach robots new tasks, but collecting robot demonstrations through teleoperation or kinesthetic teaching is tedious and time-consuming. In contrast, directly demonstrating a task using our human embodiment is much easier and data is available in abundance, yet transfer to the robot can be non-trivial. In this work, we propose Real2Gen to train a manipulation policy from a single human demonstration. Real2Gen extracts required information from the demonstration and transfers it to a simulation environment, where a programmable expert agent can demonstrate the task arbitrarily many times, generating an unlimited amount of data to train a flow matching policy. We evaluate Real2Gen on human demonstrations from three different real-world tasks and compare it to a recent baseline. Real2Gen shows an average increase in the success rate of 26.6% and better generalization of the trained policy due to the abundance and diversity of training data. We further deploy our purely simulation-trained policy zero-shot in the real world. We make the data, code, and trained models publicly available at https://real2gen.cs.uni-freiburg.de.
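The abstract does not spell out the policy's training objective. As a rough illustration of what a flow matching policy involves, the sketch below uses the common straight-line formulation: a sample is interpolated between noise and an expert action, the regression target is the constant velocity between them, and an action is produced at inference time by integrating the learned velocity field from noise. The 3D action and `oracle_velocity`, which stands in for a trained network, are assumptions for illustration and not the authors' implementation.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 to expert action x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the straight-line path: the constant x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target."""
    u = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, u)) / len(u)


def sample_action(velocity_fn, x0, steps=10):
    """Euler-integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (action)."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical expert action (for example, an end-effector displacement).
expert_action = [0.3, -0.1, 0.5]


def oracle_velocity(x, t):
    # Stand-in for a trained network: the exact velocity field that
    # transports any point to the single expert action by t = 1.
    return [(a - xi) / (1.0 - t) for a, xi in zip(expert_action, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in expert_action]
action = sample_action(oracle_velocity, noise)
```

With the oracle field, `action` lands on `expert_action` up to floating-point error. A learned policy would replace `oracle_velocity` with a network conditioned on observations and trained by minimizing `fm_loss` over the simulated demonstrations.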

Index terms

Learning from Demonstration, Deep Learning in Grasping and Manipulation, Simulation and Animation