LLM-Guided Task and Affordance-Level Exploration in Reinforcement Learning
Jelle Douwe Luijkx, Runyu Ma, Zlatan Ajanovic, Jens Kober
AI summary
Problem
Robotic reinforcement learning struggles with low sample efficiency and sparse rewards, requiring exhaustive exploration of large state-action spaces. Existing LLM-guided methods often produce semantically plausible but physically infeasible plans and rely on costly human demonstrations or assume perfect LLM outputs.
Approach
The framework uses an LLM to generate hierarchical task and affordance-level plans, then steers RL exploration toward these goals using a residual policy and critic-guided selection to correct suboptimality and explore multimodal affordances without human supervision.
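The residual-policy idea mentioned above can be illustrated with a minimal sketch. It assumes a simple proportional base controller that moves the end effector toward an LLM-proposed goal position, plus a learned residual that nudges that motion. The function names (`base_action`, `residual_action`) and the parameters (`gain`, `max_step`, `residual_scale`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def base_action(ee_pos, goal_pos, gain=1.0, max_step=0.05):
    """Hypothetical base controller: step toward an LLM-proposed goal position."""
    delta = gain * (np.asarray(goal_pos, dtype=float) - np.asarray(ee_pos, dtype=float))
    norm = np.linalg.norm(delta)
    # Limit the step length so a distant goal does not produce a huge command.
    if norm > max_step:
        delta = delta * (max_step / norm)
    return delta

def residual_action(ee_pos, goal_pos, residual, residual_scale=0.5, max_step=0.05):
    """Combine the base step with a bounded learned correction (assumed form)."""
    base = base_action(ee_pos, goal_pos, max_step=max_step)
    # The residual is assumed to come from an RL policy with outputs in [-1, 1].
    correction = residual_scale * max_step * np.clip(residual, -1.0, 1.0)
    return base + correction

if __name__ == "__main__":
    ee = [0.0, 0.0, 0.0]
    goal = [1.0, 0.0, 0.0]
    print(residual_action(ee, goal, residual=np.array([0.0, 1.0, 0.0])))
```

With a zero residual the agent simply follows the LLM-proposed goal. A nonzero residual lets the learned policy correct a goal that is semantically sensible but slightly off physically.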
Key results
- Hierarchical LLM-driven planning for task and affordance levels
- Goal-conditioned residual RL with intrinsic rewards guided by LLM affordances
- Critic- and uncertainty-guided affordance exploration balancing exploration and exploitation
- High sample efficiency and improved success rates over baselines in simulation, with promising zero-shot sim-to-real transfer
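One way to picture critic- and uncertainty-guided selection among candidate affordances is an upper-confidence-bound rule over an ensemble of critics. This is a sketch under assumptions: the summary does not say how uncertainty is estimated, so the ensemble, the `select_affordance` function, and the `beta` weight below are all hypothetical.

```python
import numpy as np

def select_affordance(q_ensemble, beta=1.0):
    """Pick a candidate affordance by mean critic value plus an uncertainty bonus.

    q_ensemble: array of shape (n_critics, n_candidates), assumed to hold each
    critic's value estimate for each LLM-proposed affordance candidate.
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)  # exploitation term: average critic estimate
    std = q.std(axis=0)    # exploration term: disagreement between critics
    scores = mean + beta * std
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    # Two critics, two candidate affordances (illustrative numbers only).
    q_values = [[1.0, 0.5],
                [1.0, 1.5]]
    index, scores = select_affordance(q_values, beta=1.0)
    print(index, scores)
```

Setting `beta` to zero reduces the rule to pure exploitation of the mean critic value, while larger values favor candidates on which the critics disagree.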
Why it matters
It enables more sample-efficient and robust robotic manipulation learning by safely leveraging LLM reasoning without requiring human demonstrations or assuming perfect LLM outputs.
Abstract
Reinforcement learning (RL) is a promising approach for robotic manipulation, but it can suffer from low sample efficiency and requires extensive exploration of large state-action spaces. Recent methods leverage the commonsense knowledge and reasoning abilities of large language models (LLMs) to guide exploration toward more meaningful states. However, LLMs can produce plans that are semantically plausible yet physically infeasible, yielding unreliable behavior. We introduce LLM-TALE, a framework that uses LLMs’ planning to directly steer RL exploration. LLM-TALE integrates planning at both the task level and the affordance level, improving learning efficiency by directing agents toward semantically meaningful actions. Unlike prior approaches that assume optimal LLM-generated plans or rewards, LLM-TALE corrects suboptimality online and explores multimodal affordance-level plans without human supervision. We evaluate LLM-TALE on pick-and-place tasks in standard RL benchmarks, observing improvements in both sample efficiency and success rates over strong baselines. Real-robot experiments indicate promising zero-shot sim-to-real transfer. Code and supplementary material are available at https://llm-tale.github.io.