ICRA 2026

HAND Me the Data: Fast Robot Adaptation Via Hand Path Retrieval

Matthew M Hong, Anthony Liang, Kevin Kim, Harshitha Belagavi Rajaprakash, Jesse Thomason, Erdem Bıyık, Jesse Zhang

AI summary

Robots can rapidly learn new manipulation tasks from a single human hand demonstration plus unstructured play data, with over 2× the average task success rate of existing retrieval baselines.
Robot adaptation, Imitation learning, Data retrieval, Human demonstration, Policy fine-tuning, Manipulation

Problem

Scaling robot imitation learning is bottlenecked by the need for extensive, task-specific teleoperation data, leaving unstructured robot play data difficult to leverage for rapid adaptation.

Approach

HAND extracts 2D relative motion paths from a human hand demonstration to retrieve matching behaviors from task-agnostic robot play data, then rapidly fine-tunes a pre-trained policy on the retrieved trajectories.
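As a rough illustration of what "2D relative motion paths" could look like in practice, the sketch below converts absolute image-plane points into per-step displacements and ranks robot paths by their distance to the hand path. The distance used here is plain dynamic time warping, chosen as a stand-in; the summary does not state which matching function HAND uses. The function names (`to_relative`, `dtw_distance`, `retrieve`) and the toy coordinates are hypothetical.

```python
from math import hypot


def to_relative(path):
    """Turn absolute 2D points [(x, y), ...] into per-step displacements.

    Displacements are unchanged when the whole path is translated, so the
    hand and the robot gripper need not start at the same pixel.
    """
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(path, path[1:])]


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences of 2D vectors.

    Assumption: DTW is a stand-in metric, not necessarily the one HAND uses.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def retrieve(hand_path, robot_paths, k=1):
    """Return indices of the k robot paths whose relative motion best matches the hand."""
    query = to_relative(hand_path)
    scored = sorted(
        (dtw_distance(query, to_relative(p)), idx) for idx, p in enumerate(robot_paths)
    )
    return [idx for _, idx in scored[:k]]


# Toy data: the hand moves right; robot path 0 moves up, robot path 1 moves right.
hand = [(0, 0), (1, 0), (2, 0)]
robots = [[(5, 5), (5, 6), (5, 7)], [(10, 3), (11, 3), (12, 3)]]
best = retrieve(hand, robots)
```

In this toy case `best` is `[1]`: the second robot path starts at a different location but has the same rightward displacements as the hand, so its distance is zero.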

Key results

  • Retrieves task-relevant robot behaviors using only 2D hand motion paths without calibrated cameras
  • Achieves over 2× higher average task success rates than retrieval baselines across 10 real-world tasks
  • Enables real-time policy adaptation in under 4 minutes from a single hand demonstration
  • Demonstrates robustness to unseen scenes, camera angle shifts, and background clutter

Why it matters

It provides a scalable, low-barrier method for non-experts to rapidly teach robots new manipulation tasks using minimal supervision and existing unstructured data.

Abstract

We present HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion of the human hand from the hand demonstration and retrieves robot sub-trajectories in two stages: first filtering by visual similarity, then retrieving trajectories with similar behaviors to the hand. Fine-tuning a policy on the retrieved data enables real-time learning of tasks in under four minutes, without requiring calibrated cameras or detailed hand pose estimation. Experiments also show that HAND outperforms retrieval baselines by over 2× in average task success rates on real robots. Videos can be found at our project website: https://liralab.usc.edu/handretrieval/.
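The two-stage retrieval described in the abstract can be sketched as a filter followed by a re-ranking. This is a minimal sketch under stated assumptions, not the authors' implementation: visual similarity is approximated by cosine similarity between precomputed feature vectors, behavioral similarity by the mean Euclidean gap between equal-length displacement sequences, and the record fields (`id`, `embedding`, `deltas`) and cut-offs (`keep_visual`, `keep_final`) are hypothetical.

```python
from math import hypot, sqrt


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def path_distance(p, q):
    """Mean Euclidean gap between two displacement sequences.

    Assumption: sequences have equal length; extra steps in the longer one are ignored.
    """
    pairs = list(zip(p, q))
    if not pairs:
        return float("inf")
    return sum(hypot(a[0] - b[0], a[1] - b[1]) for a, b in pairs) / len(pairs)


def two_stage_retrieve(query, candidates, keep_visual=2, keep_final=1):
    """Stage 1 keeps the visually closest candidates; stage 2 ranks them by motion."""
    by_visual = sorted(
        candidates,
        key=lambda c: cosine(query["embedding"], c["embedding"]),
        reverse=True,
    )[:keep_visual]
    by_motion = sorted(by_visual, key=lambda c: path_distance(query["deltas"], c["deltas"]))
    return [c["id"] for c in by_motion[:keep_final]]


# Toy records: "c" has the right motion but looks nothing like the query scene.
query = {"embedding": [1.0, 0.0], "deltas": [(1, 0), (1, 0)]}
candidates = [
    {"id": "a", "embedding": [1.0, 0.1], "deltas": [(0, 1), (0, 1)]},
    {"id": "b", "embedding": [0.9, 0.2], "deltas": [(1, 0), (1, 0)]},
    {"id": "c", "embedding": [0.0, 1.0], "deltas": [(1, 0), (1, 0)]},
]
selected = two_stage_retrieve(query, candidates)
```

Here the visual filter discards "c" despite its matching motion, and the motion ranking then prefers "b" over "a". Filtering first keeps the more expensive motion comparison limited to sub-trajectories from visually relevant scenes.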

Index terms

Learning from Demonstration, Imitation Learning, Transfer Learning