ICRA 2026

Autoregressive End-To-End Planning with Time-Invariant Spatial Alignment and Multi-Objective Policy Refinement

Jianbo Zhao, Taiyu Ban, Xiangjie Li, Xingtai Gui, Hangning Zhou, Lei Liu, Zhao Hongwei, Bin Li

AI summary

By correcting spatio-temporal misalignment in latent space and applying multi-objective preference optimization, the autoregressive planner reaches 89.8 PDMS on NAVSIM, state of the art among autoregressive models.

end-to-end planning, autoregressive models, spatio-temporal alignment, direct preference optimization, autonomous driving, kinematic control

Problem

Autoregressive end-to-end planners condition future actions on past sensor snapshots. The resulting spatio-temporal misalignment gives the agent an inconsistent worldview and limits planning performance.
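To make the misalignment concrete, the following minimal sketch re-expresses a point observed in the initial ego frame in a later ego frame using an explicit 2D rigid transform. It is only a geometric illustration: the paper performs the realignment in latent space with a learned module, and the function name `to_future_ego_frame` and its arguments are assumptions introduced here.

```python
import math


def to_future_ego_frame(point_xy, ego_pose):
    """Express a point from the initial ego frame in a later ego frame.

    point_xy: (x, y) of the point in the ego frame at the initial time step.
    ego_pose: (dx, dy, dyaw) of the future ego pose, also given in the
              initial ego frame (hypothetical convention for this sketch).
    """
    px, py = point_xy
    dx, dy, dyaw = ego_pose
    # Shift the origin to the future ego position.
    tx, ty = px - dx, py - dy
    # Rotate by -dyaw so the future heading becomes the new x-axis.
    c, s = math.cos(dyaw), math.sin(dyaw)
    return (c * tx + s * ty, -s * tx + c * ty)


# An obstacle 10 m ahead is only 6 m ahead once the ego has driven 4 m forward.
print(to_future_ego_frame((10.0, 0.0), (4.0, 0.0, 0.0)))
```

A planner that keeps using the original coordinates at the later step would still place the obstacle 10 m ahead, which is the kind of stale view the summary describes.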

Approach

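The method introduces a Time-Invariant Spatial Alignment (TISA) module that learns to project the initial environmental features into a consistent ego-centric frame for each future time step, in latent space and without explicit future scene prediction. It pairs this with a kinematic action prediction head (acceleration and yaw rate) and a multi-objective Direct Preference Optimization (DPO) post-training stage for fine-grained policy refinement.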
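As a rough illustration of why predicting kinematic actions constrains the output to drivable motion, the sketch below decodes discretized (acceleration, yaw rate) tokens and integrates them with a simple unicycle model. The bin values, the 0.5 s step, the no-reversing clamp, and the names `decode` and `rollout` are all assumptions for this example, not details taken from the paper.

```python
import math

# Hypothetical discretization bins; the paper's actual bins are not given here.
ACCEL_BINS = [-2.0, -1.0, 0.0, 1.0, 2.0]     # m/s^2
YAW_RATE_BINS = [-0.2, -0.1, 0.0, 0.1, 0.2]  # rad/s


def decode(token_pairs):
    """Map (accel_index, yaw_rate_index) tokens to continuous actions."""
    return [(ACCEL_BINS[a], YAW_RATE_BINS[w]) for a, w in token_pairs]


def rollout(actions, v0=0.0, dt=0.5):
    """Integrate (acceleration, yaw_rate) actions into (x, y, yaw) poses."""
    x = y = yaw = 0.0
    v = v0
    poses = []
    for accel, yaw_rate in actions:
        v = max(0.0, v + accel * dt)  # assumption: no reversing in this sketch
        yaw += yaw_rate * dt
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        poses.append((x, y, yaw))
    return poses


# Constant speed, straight ahead: two steps at 10 m/s cover 10 m.
poses = rollout(decode([(2, 2), (2, 2)]), v0=10.0)
```

Because every pose is produced by integrating bounded accelerations and yaw rates, the trajectory cannot jump sideways or change speed instantaneously, which is the sense in which such a head encourages physically feasible output.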

Key results

Achieves 89.8 PDMS on the NAVSIM dataset, state of the art among autoregressive models
Resolves spatio-temporal misalignment via latent-space view transformation
Ensures physical feasibility through discretized kinematic action prediction
Multi-objective DPO outperforms standard single-objective preference optimization

Why it matters

Provides a scalable, physically consistent framework for autoregressive autonomous driving that bridges the gap between imitation learning and safety-aware policy refinement.

Abstract

The inherent sequential modeling capabilities of autoregressive models make them a formidable baseline for end-to-end planning in autonomous driving. Nevertheless, their performance is constrained by a spatio-temporal misalignment, as the planner must condition future actions on past sensory data. This creates an inconsistent worldview, limiting the upper bound of performance for an otherwise powerful approach. To address this, we propose a Time-Invariant Spatial Alignment (TISA) module that learns to project initial environmental features into a consistent ego-centric frame for each future time step, effectively correcting the agent’s worldview without explicit future scene prediction. In addition, we employ a kinematic action prediction head (i.e., acceleration and yaw rate) to ensure physically feasible trajectories. Finally, we introduce a multi-objective post-training stage using Direct Preference Optimization (DPO) to move beyond pure imitation. Our approach provides targeted feedback on specific driving behaviors, offering a more fine-grained learning signal than the single, overall objective used in standard DPO. Our model achieves a state-of-the-art 89.8 PDMS on the NAVSIM dataset among autoregressive models.
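The contrast between a single overall preference objective and per-behavior feedback can be sketched as follows. `dpo_loss` is the standard DPO loss for one preference pair; `multi_objective_dpo_loss` is only one plausible way to combine objectives (a weighted sum over per-objective pairs) and is an assumption, as are the objective names and weights. The abstract does not specify the paper's exact formulation.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) trajectory pair.

    logp_*: log-probabilities under the policy being trained.
    ref_logp_*: log-probabilities under the frozen reference policy.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


def multi_objective_dpo_loss(pairs_by_objective, weights, beta=0.1):
    """Weighted sum of DPO losses, one preference pair per objective.

    pairs_by_objective: {name: (logp_w, logp_l, ref_logp_w, ref_logp_l)}
    weights: {name: float}; names and weights are hypothetical.
    """
    return sum(
        weights[name] * dpo_loss(*pair, beta=beta)
        for name, pair in pairs_by_objective.items()
    )


pairs = {
    "collision": (-0.5, -2.0, -1.0, -1.0),  # policy already favors the safe pair
    "comfort": (-1.0, -1.0, -1.0, -1.0),    # policy is indifferent here
}
loss = multi_objective_dpo_loss(pairs, {"collision": 2.0, "comfort": 1.0})
```

In this form each objective contributes its own preference pair, so a trajectory can be preferred for avoiding a collision while another pair independently ranks comfort, rather than folding both into one overall score.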

Index terms

Intelligent Transportation Systems; Deep Learning Methods