ICRA 2026

Reward Evolution with Graph-Of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning

Changwei Yao, Xinzi Liu, Chen Li, Marios Savvides

AI summary

RE-GoT autonomously evolves reward functions using graph-based reasoning and visual feedback, significantly boosting success rates on complex robotic manipulation tasks without human intervention.
Reinforcement Learning, Reward Design, Large Language Models, Graph-of-Thoughts, Visual Language Models, Robotics

Problem

Designing effective reward functions for reinforcement learning typically requires extensive human expertise and iterative tuning, while existing LLM-based approaches struggle with hallucinations, reliance on human feedback, and handling complex multi-step tasks.

Approach

The authors propose RE-GoT, a bi-level framework that uses LLMs to decompose tasks into structured text-attributed graphs for reward generation, and VLMs to evaluate rollout videos and provide automated visual feedback for iterative reward refinement.
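To make "text-attributed graph" concrete, here is a minimal sketch of one possible representation, in which every node and edge carries a free-text description. It is not the authors' data structure: the class `TextGraph`, its methods, and the drawer-and-cube example task are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TextGraph:
    """Hypothetical text-attributed graph: nodes and edges both carry text."""

    nodes: dict[str, str] = field(default_factory=dict)  # node id -> description
    edges: dict[tuple[str, str], str] = field(default_factory=dict)  # (src, dst) -> relation

    def add_node(self, node_id: str, text: str) -> None:
        self.nodes[node_id] = text

    def add_edge(self, src: str, dst: str, text: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[(src, dst)] = text

    def ordered_steps(self) -> list[str]:
        """Return node ids in dependency order (Kahn's topological sort)."""
        indegree = {node_id: 0 for node_id in self.nodes}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = sorted(n for n, degree in indegree.items() if degree == 0)
        order: list[str] = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for src, dst in self.edges:
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        return order


# Illustrative multi-step task (an assumption, not taken from the paper).
graph = TextGraph()
graph.add_node("reach_handle", "Move the gripper to the drawer handle.")
graph.add_node("pull_drawer", "Pull the drawer open.")
graph.add_node("place_cube", "Place the cube inside the open drawer.")
graph.add_edge("reach_handle", "pull_drawer", "must be grasped before pulling")
graph.add_edge("pull_drawer", "place_cube", "drawer must be open before placing")
steps = graph.ordered_steps()
```

A structure like this could be serialized into an LLM prompt so that each sub-goal and dependency is stated explicitly, which is one plausible reading of how a graph decomposition supports reward generation for multi-step tasks.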

Key results

  • Improves average task success rate by 32.25% across 10 RoboGen tasks, with notable gains on complex multi-step tasks.
  • Achieves 93.73% average success rate across four ManiSkill2 manipulation tasks.
  • On ManiSkill2, surpasses prior LLM-based approaches and even exceeds expert-designed rewards.
  • Eliminates the need for human feedback through automated VLM-based rollout evaluation.

Why it matters

It provides a scalable, automated solution for reward engineering in robotics, reducing reliance on human expertise and enabling effective training for complex, long-horizon manipulation tasks.

Abstract

Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
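The iterative refinement described in the abstract can be pictured as a generate, roll out, critique loop. The sketch below is only a schematic under stated assumptions: `evolve_reward` and its three callables (`propose`, `train_and_record`, `critique`) are hypothetical placeholders standing in for an LLM call, an RL training run that records a rollout video, and a VLM evaluation. It does not reproduce the paper's prompts, models, or stopping rule.

```python
from typing import Callable, Optional, Tuple


def evolve_reward(
    task: str,
    propose: Callable[[str, Optional[str]], str],       # assumed LLM wrapper
    train_and_record: Callable[[str], str],             # assumed RL run -> video path
    critique: Callable[[str, str], Tuple[bool, str]],   # assumed VLM wrapper
    max_rounds: int = 5,
) -> Tuple[str, int, bool]:
    """Refine a reward function until the critic reports success.

    Returns (reward_code, rounds_used, success). All callables are
    hypothetical stand-ins supplied by the caller.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    reward_code = ""
    for round_idx in range(1, max_rounds + 1):
        # Generate (or revise) reward code, conditioned on the last critique.
        reward_code = propose(task, feedback)
        # Train a policy with that reward and record a rollout.
        video = train_and_record(reward_code)
        # Ask the critic whether the rollout solves the task, and why not.
        success, feedback = critique(task, video)
        if success:
            return reward_code, round_idx, True
    return reward_code, max_rounds, False


# Toy stand-ins so the loop can be exercised without any model or simulator.
seen_feedback = []


def toy_propose(task, feedback):
    seen_feedback.append(feedback)
    return f"reward_v{len(seen_feedback)}"


def toy_train(reward_code):
    return f"video_of_{reward_code}"


def toy_critique(task, video):
    ok = video.endswith("v3")
    return ok, ("" if ok else "gripper misses the handle")


result = evolve_reward("open the drawer", toy_propose, toy_train, toy_critique)
```

Passing the model and simulator calls in as plain callables keeps the loop itself testable, and because the critic's feedback is the only signal fed back to the generator, no human is needed inside the loop.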

Index terms

AI-Based Methods, Reinforcement Learning, AI-Enabled Robotics
