TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
Momchil Tomov, Sang Uk Lee, Hansford Hendargo, Jinwook Huh, Teawon Han, Forbes Howington, Rafael Rodrigues da Silva, Gianmarco Bernasconi, Marc Heim, Samuel Findler, Xiaonan Ji, Alexander Boule, Michael Napoli, Kuo Chen, Jesse Miller, Boaz Cornelis Floor, Yunqing Hu
AI summary
Problem
Classical motion planners often produce unnatural or uncomfortable behavior, while machine learning-based planners struggle to guarantee safety and generalize to rare, critical scenarios.
Approach
The method repurposes Monte Carlo tree search to efficiently generate a diverse set of safe candidate trajectories, then uses a deep scoring function trained via inverse reinforcement learning to select the most human-like option for execution.
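The two-stage structure described above (generate safe candidates, then pick the highest-scoring one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate generator is a simple stand-in for MCTS that enumerates constant-acceleration profiles and filters them with a gap check, and `placeholder_score` is a hand-written substitute for the IRL-trained deep scoring function. All names, parameters, and numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Trajectory:
    """Ego longitudinal positions (m) sampled at fixed time steps."""
    accel: float
    positions: Tuple[float, ...]


def rollout(accel: float, v0: float = 10.0, dt: float = 0.5, steps: int = 8) -> Trajectory:
    """Integrate a constant-acceleration profile; speed is clamped at zero."""
    positions = []
    s, v = 0.0, v0
    for _ in range(steps):
        v = max(0.0, v + accel * dt)
        s += v * dt
        positions.append(s)
    return Trajectory(accel, tuple(positions))


def is_safe(traj: Trajectory, lead_positions: Sequence[float], min_gap: float = 5.0) -> bool:
    """Hypothetical safety check: keep at least min_gap metres behind the lead vehicle."""
    return all(lead - ego >= min_gap for ego, lead in zip(traj.positions, lead_positions))


def generate_safe_candidates(
    lead_positions: Sequence[float],
    accels: Sequence[float] = (-3.0, -1.5, 0.0, 1.0, 2.0),
) -> List[Trajectory]:
    """Stand-in for the MCTS stage: enumerate candidates and keep only the safe ones."""
    return [t for t in (rollout(a) for a in accels) if is_safe(t, lead_positions)]


def placeholder_score(traj: Trajectory) -> float:
    """Stand-in for the learned scorer: reward progress, penalize harsh acceleration."""
    return traj.positions[-1] - 2.0 * traj.accel ** 2


def select_trajectory(
    candidates: Sequence[Trajectory],
    score: Callable[[Trajectory], float] = placeholder_score,
) -> Trajectory:
    """Selection stage: return the candidate with the highest score."""
    if not candidates:
        raise ValueError("no safe candidate trajectory available")
    return max(candidates, key=score)


# Example: a lead vehicle 30 m ahead, travelling at a constant 8 m/s.
lead = [30.0 + 8.0 * 0.5 * (k + 1) for k in range(8)]
best = select_trajectory(generate_safe_candidates(lead))
print(best.accel)
```

Keeping the safety filter separate from the scorer means the scorer only ever ranks trajectories that already passed the safety check, which is the division of labor the summary describes.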
Key results
- First real-world demonstration of MCTS-based planning in dense urban traffic
- Outperforms classical and ML-based planners in safety, comfort, and progress across simulation and 500+ miles of on-road testing
- Zero safety-driver takeovers due to adaptive cruise control (ACC) or cut-in failures across 268 autonomous miles
- Demonstrates that hybrid classical/learning architectures effectively bridge the sim-to-real gap for discretionary driving metrics
Why it matters
Provides a scalable, safe, and human-like planning framework for autonomous vehicles, highlighting the need for real-world evaluation across diverse metrics to advance self-driving technology.
Abstract
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The key idea is to use MCTS to find a promising set of safe candidate trajectories and a deep scoring function trained with IRL to select the most human-like among them. We evaluate TreeIRL against classical and state-of-the-art planners on large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Scenarios include navigating heavy urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To the best of our knowledge, our work is the first public-road demonstration of MCTS-based planning and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.