Progress Constraints for Reinforcement Learning in Behavior Trees
Finn Rietz, Mart Kartašev, Petter Ögren, Johannes A. Stork
AI summary
Problem
Naïvely combining RL with Behavior Trees causes controllers to greedily optimize rewards, leading to oscillations, unsafe actions, and the reversal of previously achieved subgoals.
Approach
The method derives invariant convergence sets from BT theory and trains feasibility estimators to dynamically mask RL action spaces, forcing controllers to respect task progression.
Key results
- Extends BT convergence theory to general tree structures
- Learns feasibility estimators to dynamically mask RL action sets
- Improves sample efficiency and constraint satisfaction in RL training
- Open-sources a high-fidelity warehouse simulation environment
Why it matters
Enables safer, more sample-efficient RL training for complex robotic tasks by automatically enforcing structural task dependencies without manual reward shaping.
Abstract
Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used to switch between sub-controllers based on environmental conditions. Reinforcement Learning (RL), on the other hand, can learn near-optimal controllers but sometimes struggles with sparse rewards, safe exploration, and long-horizon credit assignment. Combining BTs with RL has the potential for mutual benefit: a BT design encodes structured domain knowledge that can simplify RL training, while RL enables automatic learning of the controllers within BTs. However, naïve integration of BTs and RL can lead to some controllers counteracting other controllers, possibly undoing previously achieved subgoals, thereby degrading the overall performance. To address this, we propose progress constraints, a novel mechanism where feasibility estimators constrain the allowed action set based on theoretical BT convergence results. Empirical evaluations in a 2D proof-of-concept and a high-fidelity warehouse environment demonstrate improved performance, sample efficiency, and constraint satisfaction, compared to prior methods of BT-RL integration.
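To make the idea of constraining the allowed action set concrete, the following is a minimal sketch of feasibility-based action masking for a discrete action space. It is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed threshold, the fallback rule, and the hard-coded feasibility scores are all hypothetical, whereas in the paper the feasibility estimators are learned.

```python
import random


def allowed_actions(feasibility, threshold=0.5):
    """Return indices of actions whose feasibility score meets the threshold.

    `feasibility[a]` stands in for a learned estimate that taking action `a`
    keeps previously achieved subgoals intact (hypothetical stand-in values).
    """
    allowed = [a for a, f in enumerate(feasibility) if f >= threshold]
    if not allowed:
        # Assumed fallback: if no action passes, keep the most feasible one
        # so the controller always has something to execute.
        allowed = [max(range(len(feasibility)), key=lambda a: feasibility[a])]
    return allowed


def select_action(q_values, feasibility, threshold=0.5, epsilon=0.0, rng=random):
    """Epsilon-greedy selection restricted to the masked action set."""
    allowed = allowed_actions(feasibility, threshold)
    if rng.random() < epsilon:
        # Exploration also stays inside the allowed set.
        return rng.choice(allowed)
    # Greedy choice among allowed actions only.
    return max(allowed, key=lambda a: q_values[a])


# Action 1 has the highest value but is estimated to undo earlier progress.
q_values = [1.0, 3.0, 2.0]
feasibility = [0.9, 0.1, 0.8]

unmasked = max(range(len(q_values)), key=lambda a: q_values[a])
masked = select_action(q_values, feasibility)
print(unmasked, masked)
```

In this toy case the unconstrained greedy choice is action 1, while the masked controller picks action 2, the best-valued action among those judged feasible. Masking exploration as well as exploitation reflects the goal of constraint satisfaction during training, not only at deployment.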