ICRA 2026

GIFT: Generalizing Intent for Flexible Test-Time Rewards

Fin Amin, Nathaniel Dennler, Andreea Bobu

AI summary

GIFT leverages language models to infer high-level human intent from demonstrations, enabling reward functions to generalize to novel objects and layouts at test time without retraining.
Reward generalization, Test-time adaptation, Intent inference, Language models, Robot learning, Distribution shift

Problem

Robots that learn reward functions from demonstrations often fail to generalize to new environments because the learned rewards latch onto spurious visual or semantic correlations rather than the underlying human intent. Existing methods rely on visual or semantic similarity, but these surface-level cues often diverge from what humans actually care about.

Approach

GIFT uses language models to infer high-level intent by contrasting preferred and non-preferred demonstrations. At deployment, it maps each novel test state to a behaviorally equivalent training state via intent-conditioned similarity, so the original reward function can be reused without retraining.
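
The mapping step can be illustrated with a toy nearest-neighbor lookup. This is a minimal sketch under stated assumptions, not the paper's implementation: the attribute names, the objects, the reward values, and the Euclidean distance are all hypothetical, and the intent-relevant attributes are hard-coded here, whereas GIFT infers intent with a language model.

```python
import math
from typing import Dict, Iterable

Features = Dict[str, float]

# Hypothetical training states, described by hand-picked attributes.
TRAIN_STATES: Dict[str, Features] = {
    "mug": {"holds_liquid": 1.0, "is_red": 1.0, "is_tall": 0.0},
    "book": {"holds_liquid": 0.0, "is_red": 0.0, "is_tall": 1.0},
}

# Hypothetical learned reward for carrying each training object over a laptop.
TRAIN_REWARD: Dict[str, float] = {"mug": -1.0, "book": 0.0}

# Assumed intent: "avoid spilling on electronics", so only this attribute matters.
INTENT_KEYS = ("holds_liquid",)
# Surface-level baseline: every attribute counts equally.
ALL_KEYS = ("holds_liquid", "is_red", "is_tall")

# A novel test object that never appeared in training.
GLASS: Features = {"holds_liquid": 1.0, "is_red": 0.0, "is_tall": 1.0}


def distance(a: Features, b: Features, keys: Iterable[str]) -> float:
    """Euclidean distance restricted to the given attributes."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def nearest_training_state(test_state: Features, keys: Iterable[str]) -> str:
    """Name of the training state closest to test_state on the given attributes."""
    keys = tuple(keys)
    return min(
        TRAIN_STATES,
        key=lambda name: distance(test_state, TRAIN_STATES[name], keys),
    )


def transferred_reward(test_state: Features, keys: Iterable[str]) -> float:
    """Reuse the training reward of the matched state; nothing is retrained."""
    return TRAIN_REWARD[nearest_training_state(test_state, keys)]
```

In this toy setup, `nearest_training_state(GLASS, INTENT_KEYS)` matches the glass to the mug, so the glass inherits the mug's penalty of -1.0. With `ALL_KEYS` the glass matches the book instead, because it shares the book's color and height, and it receives a reward of 0.0. That contrast is the kind of gap between surface similarity and intent-conditioned similarity the summary describes.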

Key results

  • Infers high-level task intent from contrasting demonstration pairs using language models
  • Maps novel test states to intent-equivalent training states without retraining
  • Outperforms visual and semantic similarity baselines in test-time pairwise win rate and state-alignment F1 score across four simulated tasks with over 50 unseen objects
  • Transfers reliably to real-world tasks on a 7-DoF Franka Panda robot
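
This summary does not define the two reported metrics, so the following is only a sketch of one plausible reading. It assumes that pairwise win rate is the fraction of trajectory pairs in which the reward scores the preferred trajectory strictly higher, and that state-alignment F1 compares predicted (test state, training state) matches against a ground-truth set. The function names and both definitions are assumptions, not taken from the paper.

```python
from typing import Callable, Sequence, Set, Tuple, TypeVar

T = TypeVar("T")


def pairwise_win_rate(
    pairs: Sequence[Tuple[T, T]],
    reward: Callable[[T], float],
) -> float:
    """Fraction of (preferred, non_preferred) pairs ranked correctly.

    Assumed definition: a pair counts as a win only when the preferred
    trajectory gets a strictly higher reward; ties count as losses.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    wins = sum(1 for preferred, other in pairs if reward(preferred) > reward(other))
    return wins / len(pairs)


def alignment_f1(
    predicted: Set[Tuple[str, str]],
    gold: Set[Tuple[str, str]],
) -> float:
    """F1 between predicted and ground-truth (test_state, train_state) matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, if a reward ranks two of four preference pairs correctly, `pairwise_win_rate` returns 0.5. If two of three predicted matches appear in a three-element ground-truth set, precision and recall are both 2/3, and so is the F1 score.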

Why it matters

GIFT lets robots reuse learned rewards in novel environments and with novel objects, reducing the need for users to retrain the robot after each change and making real-world deployment more robust.

Abstract

Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations in training data rather than the underlying human intent that demonstrations represent. Existing methods leverage visual or semantic similarity to improve robustness, yet these surface-level cues often diverge from what humans actually care about. We present Generalizing Intent for Flexible Test-Time rewards (GIFT), a framework that grounds reward generalization in human intent rather than surface cues. GIFT leverages language models to infer high-level intent from user demonstrations by contrasting preferred with non-preferred behaviors. At deployment, GIFT maps novel test states to behaviorally equivalent training states via intent-conditioned similarity, enabling learned rewards to generalize across distribution shifts without retraining. We evaluate GIFT on tabletop manipulation tasks with new objects and layouts. Across four simulated tasks with over 50 unseen objects, GIFT consistently outperforms visual and semantic similarity baselines in test-time pairwise win rate and state-alignment F1 score. Real-world experiments on a 7-DoF Franka Panda robot demonstrate that GIFT reliably transfers to physical settings. Further discussion can be found at https://mit-clear-lab.github.io/GIFT/

Index terms

Human-Centered Automation, Learning from Experience, Human-Centered Robotics