ICRA 2026

When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks

Steffen Hagedorn, Luka Donkov, Aron Distelzweig, Alexandru Paul Condurache


AI summary


Evaluating planners with realistic, learned traffic agents reveals that standard benchmarks systematically overestimate performance and underestimate interaction capabilities, exposing abrupt failure modes in learned planners under stress.

Closed-loop simulation, Traffic agent modeling, nuPlan benchmark, Autonomous driving evaluation, Learned traffic models, Planner robustness

Problem

Standard closed-loop benchmarks rely on passive, rule-based traffic agents that ignore complex interactions, creating a sim-to-real gap that biases planner rankings and hides real-world deficiencies.

Approach

The authors integrate the learned, reactive SMART traffic agent model into the nuPlan framework to replace passive rule-based agents, enabling a more realistic closed-loop evaluation of 14 recent planners and established baselines across multiple benchmarks.

Key results

IDM-based simulation systematically overestimates planner performance and underestimates interaction capabilities
Learned planners degrade abruptly in augmented edge-case scenarios, whereas rule-based planners maintain reasonable basic behavior
Closed-loop trained planners demonstrate the most stable driving performance under realistic conditions
SMART agents are released as a drop-in alternative to IDM, and SMART-reactive simulation is proposed as a new standard closed-loop benchmark in nuPlan

Why it matters

It exposes critical flaws in current autonomous driving benchmarks and gives researchers a more realistic evaluation standard for assessing planner safety and generalization before real-world deployment.

Abstract

Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
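To make the point about IDM agents concrete, below is a minimal Python sketch of the standard Intelligent Driver Model acceleration law. It is not nuPlan's implementation: the function name `idm_acceleration`, the `IDMParams` container, and the parameter values are illustrative assumptions, not nuPlan's configured defaults. What the sketch shows is structural. The agent's acceleration depends only on its own speed and on the gap to, and speed of, the single lead vehicle in its lane, so there is no input through which a vehicle in an adjacent lane could influence it.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IDMParams:
    # Illustrative values only (assumption), NOT nuPlan's configured defaults.
    v0: float = 15.0     # desired speed [m/s]
    T: float = 1.5       # desired time headway [s]
    s0: float = 2.0      # minimum standstill gap [m]
    a_max: float = 1.0   # maximum acceleration [m/s^2]
    b: float = 1.5       # comfortable deceleration [m/s^2]
    delta: float = 4.0   # acceleration exponent


def idm_acceleration(v: float,
                     gap: Optional[float],
                     lead_v: Optional[float],
                     p: IDMParams = IDMParams()) -> float:
    """Standard IDM acceleration for one agent (hypothetical helper).

    The only information about other traffic is the gap to and the speed of
    the lead vehicle in the agent's own lane. A vehicle in a neighboring lane
    has no effect until it becomes the lead vehicle.
    """
    # Free-road term: pushes the agent toward its desired speed v0.
    free_road = 1.0 - (v / p.v0) ** p.delta
    if gap is None or lead_v is None:
        # No lead vehicle: pure free-road acceleration.
        return p.a_max * free_road
    dv = v - lead_v  # approach rate, positive when closing in on the leader
    # Desired dynamic gap s* = s0 + max(0, v*T + v*dv / (2*sqrt(a_max*b))).
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    # Interaction term penalizes a gap smaller than s*; guard against gap = 0.
    return p.a_max * (free_road - (s_star / max(gap, 1e-3)) ** 2)
```

For example, `idm_acceleration(10.0, gap=5.0, lead_v=5.0)` yields a strong deceleration because the agent is closing on a slower leader, while a vehicle about to cut in from the neighboring lane cannot even be expressed in the call. A learned multi-agent model such as SMART is instead conditioned on the surrounding agents and map, which is what lets it respond to such interactions.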

Index terms

Performance Evaluation and Benchmarking; Intelligent Transportation Systems; Software Tools for Benchmarking and Reproducibility