ICRA 2026

Deep Sensorimotor Control by Imitating Predictive Models of Human Motion

Himanshu Gaurav Singh, Pieter Abbeel, Jitendra Malik, Antonio Loquercio

AI summary

Tracking predicted human motion keypoints enables robots to learn complex manipulation tasks from sparse rewards without manual reward engineering or kinematic retargeting.

Human-to-robot transfer; Reinforcement learning; Motion prediction; Sensorimotor control; Reward shaping; Dexterous manipulation

Problem

Leveraging large-scale human interaction datasets for robot learning is hindered by the need for per-sample kinematic retargeting, accurate environment replicas, or unstable adversarial losses, all of which make existing methods difficult to scale.

Approach

Train a causal transformer to predict future human hand keypoints from scene observations, then use reinforcement learning to train robot policies that track its predictions, applied zero-shot to robot data, while optimizing a sparse task reward.
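
The summary does not specify the exact form of the keypoint tracking reward. A minimal sketch, assuming a Gaussian kernel on the mean squared distance between robot keypoints and predicted keypoints, added to a binary sparse task reward; the kernel, `sigma`, and `w_track` are hypothetical choices for illustration, not the authors' formulation.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) keypoint position, assumed in meters


def tracking_reward(robot_kps: Sequence[Point],
                    predicted_kps: Sequence[Point],
                    sigma: float = 0.05) -> float:
    """Hypothetical tracking term: exp(-mean squared distance / sigma^2).

    Returns 1.0 when robot keypoints match the predicted keypoints exactly
    and decays toward 0.0 as they drift apart.
    """
    if len(robot_kps) != len(predicted_kps) or not robot_kps:
        raise ValueError("keypoint sets must be non-empty and equal in length")
    sq_dists = [sum((a - b) ** 2 for a, b in zip(p, q))
                for p, q in zip(robot_kps, predicted_kps)]
    mean_sq = sum(sq_dists) / len(sq_dists)
    return math.exp(-mean_sq / sigma ** 2)


def total_reward(task_success: bool,
                 robot_kps: Sequence[Point],
                 predicted_kps: Sequence[Point],
                 w_track: float = 0.1) -> float:
    """Sparse task reward (1 on success, else 0) plus a weighted tracking term."""
    r_task = 1.0 if task_success else 0.0
    return r_task + w_track * tracking_reward(robot_kps, predicted_kps)
```

Because the tracking term is dense (it is nonzero at every step), it can guide exploration before the sparse task reward is ever observed, which is the role the summary attributes to it in place of hand-designed dense rewards.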

Key results

Eliminates gradient-based kinematic retargeting and adversarial losses
Enables zero-shot transfer of a single motion predictor across diverse robots and tasks
Substitutes dense reward engineering with a simple keypoint tracking reward
Outperforms demonstration-guided RL baselines by a large margin

Why it matters

Provides a scalable pathway to leverage massive human interaction datasets for training dexterous robot manipulation policies without manual reward design.

Abstract

As the embodiment gap between a robot and a human narrows, new opportunities arise to leverage datasets of humans interacting with their surroundings for robot learning. We propose a novel technique for training sensorimotor policies with reinforcement learning by imitating predictive models of human motions. Our key insight is that the motion of keypoints on human-inspired robot end-effectors closely mirrors the motion of corresponding human body keypoints. This enables us to apply a model trained on human data to predict future motion zero-shot on robot data. We train sensorimotor policies to track the predictions of such a model, conditioned on a history of past robot states, while optimizing a relatively sparse task reward. This approach entirely bypasses gradient-based kinematic retargeting and adversarial losses, which prevent existing methods from fully leveraging the scale and diversity of modern human-scene interaction datasets. Empirically, we find that our approach can work across robots and tasks, outperforming existing baselines by a large margin. In addition, we find that tracking a human motion model can substitute for carefully designed dense rewards and curricula in manipulation tasks.
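
The key insight (robot end-effector keypoints mirror corresponding human keypoints, so a human-trained predictor can be queried on robot states) can be illustrated with a small sketch. Everything here is hypothetical: the robot link names, the `ROBOT_TO_HUMAN` correspondence, the history length, and the constant-velocity extrapolator, which merely stands in for the learned causal transformer described above.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Vec3 = Tuple[float, ...]  # (x, y, z) keypoint position
Frame = Dict[str, Vec3]   # keypoint name -> position at one timestep

# Hypothetical correspondence between robot fingertip links and the human
# hand keypoint names a predictor trained on human data would expect.
ROBOT_TO_HUMAN: Dict[str, str] = {
    "thumb_tip_link": "thumb_tip",
    "index_tip_link": "index_tip",
    "middle_tip_link": "middle_tip",
}


def to_human_keypoints(robot_kps: Frame) -> Frame:
    """Relabel robot keypoints with human keypoint names; drop unmapped links."""
    return {ROBOT_TO_HUMAN[k]: v for k, v in robot_kps.items() if k in ROBOT_TO_HUMAN}


def constant_velocity_predictor(history: List[Frame]) -> Frame:
    """Stand-in for the learned model: extrapolate each keypoint one step ahead."""
    last = history[-1]
    if len(history) < 2:
        return dict(last)
    prev = history[-2]
    return {k: tuple(2 * a - b for a, b in zip(last[k], prev[k])) for k in last}


class TrackingTargetProvider:
    """Keeps a sliding window of past robot states and queries the predictor."""

    def __init__(self, predictor: Callable[[List[Frame]], Frame], history_len: int = 4):
        self.predictor = predictor
        self.history: Deque[Frame] = deque(maxlen=history_len)

    def step(self, robot_kps: Frame) -> Frame:
        # Zero-shot use: robot states are fed to the human-motion predictor
        # unchanged apart from relabeling; no per-sample retargeting is done.
        self.history.append(to_human_keypoints(robot_kps))
        return self.predictor(list(self.history))
```

At each control step, the returned targets would be compared with the robot's next keypoints to compute a tracking reward; only the causal structure (predictions depend on past states only) is meant to mirror the abstract.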

Index terms

Reinforcement Learning; Dexterous Manipulation; Deep Learning in Grasping and Manipulation