ICRA 2026

Progress Constraints for Reinforcement Learning in Behavior Trees

Finn Rietz, Mart Kartašev, Petter Ögren, Johannes A. Stork

AI summary

Progress constraints via learned feasibility estimators prevent RL controllers from undoing BT subgoals, boosting sample efficiency and safety.
Behavior Trees, Reinforcement Learning, Progress Constraints, Feasibility Estimation, Action Masking, Robotics

Problem

Naïvely combining RL with Behavior Trees (BTs) can leave each learned controller greedily optimizing its own reward, leading to oscillations, unsafe actions, and the reversal of previously achieved subgoals.

Approach

The method derives invariant convergence sets from BT theory and trains feasibility estimators to dynamically mask RL action spaces, forcing controllers to respect task progression.
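
The masking step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the hand-coded toy estimator are all assumptions made for illustration (in the paper the feasibility estimator is learned). The sketch only shows the general idea of removing actions that a feasibility estimator flags as undoing an achieved subgoal, then renormalizing the policy over the remaining actions.

```python
import math
from typing import Callable, List, Sequence


def feasible_mask(
    state: int,
    actions: Sequence[int],
    feasibility: Callable[[int, int], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[bool]:
    """Mark each action as allowed if the estimator scores it at or above the threshold."""
    return [feasibility(state, a) >= threshold for a in actions]


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed actions only; masked actions get probability 0."""
    if not any(mask):
        raise ValueError("progress constraint leaves no allowed action")
    # Subtract the largest allowed logit for numerical stability.
    top = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def toy_feasibility(x: int, a: int) -> float:
    """Hand-coded stand-in for a learned estimator (assumption).

    Toy 1D corridor: the achieved subgoal is "position >= 2".
    An action is feasible if the next position still satisfies it.
    """
    return 1.0 if x + a >= 2 else 0.0


ACTIONS = (-1, 0, 1)  # step left, stay, step right

# At x = 2 the subgoal has just been reached; stepping left would undo it.
mask = feasible_mask(2, ACTIONS, toy_feasibility)
print(mask)  # [False, True, True]

# A policy that greedily prefers stepping left is overridden by the mask.
probs = masked_softmax([2.0, 0.0, 0.0], mask)
print(probs)  # [0.0, 0.5, 0.5]
```

Masking at the level of the action distribution, rather than penalizing through the reward, means a masked action is never sampled at all. How the real estimators are trained and how the constraint sets are derived from the BT structure is specified in the paper, not here.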

Key results

  • Extends BT convergence theory to general tree structures
  • Learns feasibility estimators to dynamically mask RL action sets
  • Improves sample efficiency and constraint satisfaction in RL training
  • Open-sources a high-fidelity warehouse simulation environment

Why it matters

Enables safer, more sample-efficient RL training for complex robotic tasks by automatically enforcing structural task dependencies without manual reward shaping.

Abstract

Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used to switch between sub-controllers based on environmental conditions. Reinforcement Learning (RL), on the other hand, can learn near-optimal controllers but sometimes struggles with sparse rewards, safe exploration, and long-horizon credit assignment. Combining BTs with RL has the potential for mutual benefit: a BT design encodes structured domain knowledge that can simplify RL training, while RL enables automatic learning of the controllers within BTs. However, naïve integration of BTs and RL can lead to some controllers counteracting other controllers, possibly undoing previously achieved subgoals, thereby degrading the overall performance. To address this, we propose progress constraints, a novel mechanism where feasibility estimators constrain the allowed action set based on theoretical BT convergence results. Empirical evaluations in a 2D proof-of-concept and a high-fidelity warehouse environment demonstrate improved performance, sample efficiency, and constraint satisfaction, compared to prior methods of BT-RL integration.

Index terms

Integrated Planning and Learning, Reinforcement Learning
