ICRA 2026

Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

Minyoung Hwang, Alexandra Forsey-Smerek, Nathaniel Dennler, Andreea Bobu

AI summary

Key figure (auto-extracted from paper)

Masked IRL leverages LLM-generated state masks and demonstration-context reasoning to learn robust reward functions from significantly less data, even when the language is ambiguous.

Inverse Reinforcement Learning, Language-Conditioned Reward, LLM Reasoning, State Relevance Masks, Robot Personalization, Sample Efficiency

Problem

Reward learning from limited demonstrations often overfits to spurious correlations, while existing language-conditioned methods treat instructions as static signals that fail to resolve ambiguity or identify relevant environmental features.

Approach

The method uses LLMs to infer state-relevance masks from language and to clarify ambiguous instructions by reasoning over them jointly with the demonstrations. A masking loss then enforces invariance of the learned reward to state components that the mask marks as irrelevant.
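To make the invariance idea concrete, here is a minimal sketch of one way a masking loss could look. It is not the paper's implementation: the linear reward, the Gaussian perturbation, the squared-difference penalty, and all function names (`reward`, `perturb_irrelevant`, `masking_loss`) are assumptions chosen for illustration, and the mask is hard-coded where the method would obtain it from an LLM.

```python
import random

def reward(weights, state):
    """Hypothetical linear reward: dot product of weights and state features."""
    return sum(w * x for w, x in zip(weights, state))

def perturb_irrelevant(state, mask, rng, scale=1.0):
    """Add noise only where mask == 0; relevant components stay fixed."""
    return [x if m == 1 else x + rng.gauss(0.0, scale)
            for x, m in zip(state, mask)]

def masking_loss(weights, states, mask, rng, n_samples=8):
    """Mean squared change in reward when only masked-out components change."""
    total, count = 0.0, 0
    for s in states:
        base = reward(weights, s)
        for _ in range(n_samples):
            s_pert = perturb_irrelevant(s, mask, rng)
            total += (reward(weights, s_pert) - base) ** 2
            count += 1
    return total / count

rng = random.Random(0)
states = [[0.2, 1.0, -0.5], [0.7, -0.3, 0.1]]
mask = [1, 0, 0]  # assumption: only the first feature was judged relevant

invariant_w = [1.5, 0.0, 0.0]   # ignores the masked-out features
spurious_w = [1.5, 0.8, -0.4]   # leans on masked-out features

loss_invariant = masking_loss(invariant_w, states, mask, rng)
loss_spurious = masking_loss(spurious_w, states, mask, rng)
```

In this toy setup, a reward that ignores the masked-out features incurs zero penalty, while one that depends on them is penalized. In a training loop, a term like this would be added to the usual IRL objective so that the gradient pushes the reward toward the invariant case.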

Key results

Up to 15% performance gain over prior language-conditioned IRL
Up to 4.7× reduction in required demonstration data
Effective disambiguation of underspecified language via demonstration context
Robust generalization across simulation and real-robot experiments

Why it matters

Enables robots to reliably adapt to diverse user preferences with minimal feedback, advancing practical personalized robot learning.

Abstract

Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. This happens because demonstrations show robots how to do a task but not what matters for that task, causing the model to focus on irrelevant state details. Natural language can more directly specify what the robot should focus on, and, in principle, disambiguate between many reward functions consistent with the demonstrations. However, existing language-conditioned reward learning methods typically treat instructions as simple conditioning signals, without fully exploiting their potential to resolve ambiguity. Moreover, real instructions are often ambiguous themselves, so naive conditioning is unreliable. Our key insight is that these two input types carry complementary information: demonstrations show how to act, while language specifies what is important. We propose Masked Inverse Reinforcement Learning (Masked IRL), a framework that uses large language models (LLMs) to combine the strengths of both input types. Masked IRL infers state-relevance masks from language instructions and enforces invariance to irrelevant state components. When instructions are ambiguous, it uses LLM reasoning to clarify them in the context of the demonstrations. In simulation and on a real robot, Masked IRL outperforms prior language-conditioned IRL methods by up to 15% while using up to 4.7 times less data, demonstrating improved sample-efficiency, generalization, and robustness to ambiguous language. Project page and code: https://github.com/MIT-CLEAR-Lab/Masked-IRL

Index terms

Learning from Demonstration, Imitation Learning, Human-Centered Robotics