When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks
Steffen Hagedorn, Luka Donkov, Aron Distelzweig, Alexandru Paul Condurache
AI summary
Problem
Standard closed-loop benchmarks rely on passive, rule-based traffic agents that ignore complex interactions, creating a sim-to-real gap that biases planner rankings and hides real-world deficiencies.
Approach
The authors integrate the learned, reactive SMART traffic agent model into the nuPlan framework to replace passive rule-based agents, enabling realistic closed-loop evaluation of 14 planners across multiple benchmarks.
Key results
- IDM-based simulation systematically overestimates planner performance and underestimates interaction capabilities
- Learned planners degrade abruptly in augmented edge-case scenarios, while rule-based planners maintain reasonable basic behavior
- Closed-loop trained planners demonstrate the most stable driving performance under realistic conditions
- The authors release the SMART agents as a drop-in alternative to IDM and propose SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan
Why it matters
It exposes critical flaws in current autonomous driving benchmarks and provides researchers with a more realistic evaluation standard to ensure planner safety and generalization before real-world deployment.
Abstract
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
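To illustrate why IDM agents are described as passive, the following minimal sketch implements the standard Intelligent Driver Model car-following law. It is not taken from the paper or from nuPlan's code. The function name and all parameter values (desired speed, time headway, minimum gap, acceleration limits) are illustrative assumptions. Note that the only inputs are the agent's own speed and the state of the single vehicle directly ahead, so nothing in the model can respond to a vehicle in an adjacent lane.

```python
import math


def idm_acceleration(v, gap, dv,
                     v0=15.0,    # desired speed in m/s (assumed value)
                     T=1.5,      # desired time headway in s (assumed value)
                     s0=2.0,     # minimum standstill gap in m (assumed value)
                     a_max=1.0,  # maximum acceleration in m/s^2 (assumed value)
                     b=1.5,      # comfortable deceleration in m/s^2 (assumed value)
                     delta=4.0): # free-road acceleration exponent
    """Standard IDM longitudinal acceleration.

    v   : speed of the following agent (m/s)
    gap : bumper-to-bumper distance to the lead vehicle (m), must be > 0
    dv  : closing speed, i.e. own speed minus lead vehicle speed (m/s)
    """
    # Desired dynamic gap; the interaction term is clipped at zero so that
    # a rapidly receding leader never produces a gap below s0.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term pulls toward v0, interaction term brakes for the leader.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    # Closing quickly on a nearby leader yields strong braking.
    print(idm_acceleration(v=10.0, gap=5.0, dv=5.0))
    # Far from any leader and at standstill, the agent accelerates at about a_max.
    print(idm_acceleration(v=0.0, gap=1e9, dv=0.0))
```

Because the signature carries no lateral or neighbor-lane information, a cut-in or merge by the planner only affects such an agent once the planner becomes its lead vehicle, which is the limitation the learned SMART agents are meant to address.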