DemoDiffusion: One-Shot Human Imitation Using Pre-Trained Diffusion Policy
Sungjae Park, Homanga Bharadhwaj, Shubham Tulsiani
AI summary
Problem
Generalist robot policies struggle with zero-shot deployment in novel environments, while existing one-shot imitation methods rely on brittle kinematic retargeting or require costly online reinforcement learning and paired human-robot data.
Approach
The method extracts 3D hand poses from a human video, converts them to an open-loop robot trajectory via kinematic retargeting, and then uses a pre-trained diffusion policy to iteratively denoise and refine this trajectory into feasible, closed-loop robot actions.
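The refinement step can be illustrated with a minimal sketch. It assumes one common way of refining a trajectory by denoising: partially noise the retargeted trajectory up to an intermediate diffusion step, then run deterministic reverse steps from there. The summary above does not spell out the exact procedure, so the noise schedule, the `start_step` parameter, and the `toy_denoiser` (a hypothetical stand-in for the pre-trained policy that simply clips actions to a feasible range and ignores observations) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def make_alpha_bars(num_steps: int) -> np.ndarray:
    """Cumulative signal-retention schedule (assumed linear-beta schedule)."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.cumprod(1.0 - betas)


def toy_denoiser(noisy_actions: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained diffusion policy.

    Predicts the clean action sequence from a noisy one. Here the "plausible
    robot action" set is just the box [-1, 1]; a real policy would also
    condition on current observations.
    """
    return np.clip(noisy_actions / np.sqrt(alpha_bar), -1.0, 1.0)


def refine_trajectory(retargeted, denoiser, alpha_bars, start_step, rng):
    """Noise the retargeted trajectory to `start_step`, then denoise to step 0."""
    ab = alpha_bars[start_step]
    noise = rng.standard_normal(retargeted.shape)
    # Forward diffusion: keep part of the human-derived prior, add noise.
    x = np.sqrt(ab) * retargeted + np.sqrt(1.0 - ab) * noise
    for t in range(start_step, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        x0_pred = denoiser(x, ab_t)
        # Deterministic (DDIM-style) reverse step toward the predicted clean sample.
        eps = (x - np.sqrt(ab_t) * x0_pred) / np.sqrt(1.0 - ab_t)
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy retargeted trajectory: 8 timesteps, 2 action dims, partly out of range.
    retargeted = np.linspace(-1.5, 1.5, 16).reshape(8, 2)
    refined = refine_trajectory(
        retargeted, toy_denoiser, make_alpha_bars(50), start_step=20, rng=rng
    )
    print(refined.shape, float(np.abs(refined).max()))
```

In this sketch, a smaller `start_step` keeps the result closer to the retargeted human motion, while a larger one lets the denoiser reshape the trajectory more strongly.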
Key results
- 83.8% average success rate across 8 real-world manipulation tasks
- Surpasses base diffusion policy (13.8%) and kinematic retargeting (52.5%) in real-world tests
- Successfully executes tasks where the pre-trained generalist policy fails entirely
- Robust performance on simulated dexterous grasping across varying object sizes
Why it matters
It provides a practical, low-effort deployment pathway for generalist robot policies in unstructured environments, making one-shot human imitation accessible to non-expert users without requiring task-specific data collection or online training.
Abstract
We propose DemoDiffusion, a simple method for enabling robots to perform manipulation tasks by imitating a single human demonstration, without requiring task-specific training or paired human-robot data. Our approach is based on two insights. First, the hand motion in a human demonstration provides a useful prior for the robot’s end-effector trajectory, which we can convert into a rough open-loop robot motion trajectory via kinematic retargeting. Second, while this retargeted motion captures the overall structure of the task, it may not align well with plausible robot actions in-context. To address this, we leverage a pre-trained generalist diffusion policy to modify the trajectory, ensuring it both follows the human motion and remains within the distribution of plausible robot actions. Unlike approaches based on online reinforcement learning or paired human-robot data, our method enables robust adaptation to new tasks and scenes with minimal effort. In real-world experiments across 8 diverse manipulation tasks, DemoDiffusion achieves 83.8% average success rate, compared to 13.8% for the pre-trained policy and 52.5% for kinematic retargeting, succeeding even on tasks where the pre-trained generalist policy fails entirely. Project page: https://demodiffusion.github.io/