ICRA 2026

Physics-Informed Diffusion Mamba Transformer for Real-World Driving

Hang Zhou, Qiang Zhang, Peiran Liu, Yihao Qin, Zhaoxu Yan, Yiding Ji

AI summary

Pi-DiMT outperforms state-of-the-art planners by combining Mamba-based sequence modeling with Port-Hamiltonian physics constraints to generate accurate, dynamically feasible driving trajectories.

autonomous driving, diffusion models, Mamba transformer, Port-Hamiltonian networks, trajectory planning, physics-informed learning

Problem

Current diffusion-based trajectory planners fail to effectively aggregate long-range sequential contexts and lack explicit integration of vehicle kinematics and dynamic constraints, often producing physically infeasible motions.
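To make "physically infeasible" concrete, a sampled trajectory can be screened with finite differences against acceleration and curvature bounds. This is a minimal sketch, assuming uniformly sampled (x, y) waypoints in metres at a fixed time step `dt`; the function name and the limits `a_max` (m/s^2) and `kappa_max` (1/m) are hypothetical illustrative values, not taken from the paper.

```python
import math


def is_dynamically_feasible(xs, ys, dt, a_max=4.0, kappa_max=0.2):
    """Hypothetical screen: True if the (x, y) trajectory sampled every dt
    seconds stays within acceleration (m/s^2) and curvature (1/m) limits."""
    # Finite-difference velocities between consecutive waypoints.
    vx = [(xs[i + 1] - xs[i]) / dt for i in range(len(xs) - 1)]
    vy = [(ys[i + 1] - ys[i]) / dt for i in range(len(ys) - 1)]
    for i in range(len(vx) - 1):
        # Finite-difference acceleration between consecutive velocities.
        ax = (vx[i + 1] - vx[i]) / dt
        ay = (vy[i + 1] - vy[i]) / dt
        if math.hypot(ax, ay) > a_max:
            return False
        speed = math.hypot(vx[i], vy[i])
        if speed > 1e-6:
            # Approximate signed curvature: (vx * ay - vy * ax) / |v|^3.
            kappa = (vx[i] * ay - vy[i] * ax) / speed ** 3
            if abs(kappa) > kappa_max:
                return False
    return True


# Usage: a 10 m/s straight line vs. a 1 m lateral jump within one 0.1 s step.
straight = is_dynamically_feasible([float(i) for i in range(6)], [0.0] * 6, dt=0.1)
jump = is_dynamically_feasible([float(i) for i in range(6)], [0, 0, 0, 1, 1, 1], dt=0.1)
print(straight, jump)  # True False
```

A post-hoc check like this can only reject bad samples after generation; it does not steer the generator toward feasible motion in the first place.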

Approach

The authors propose Pi-DiMT, which embeds Mamba state-space modules and self-attention into a diffusion transformer for efficient context aggregation, and integrates a Port-Hamiltonian neural network to enforce energy-based physical constraints during generation.
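The Mamba modules rest on a selective state-space recurrence whose cost grows linearly with sequence length, in contrast to the quadratic cost of full self-attention. The sketch below is a minimal single-channel illustration, assuming a diagonal state matrix and scalar input-dependent projections; all weights and names are hypothetical, and it omits the self-attention path, gating, and the parallel scan a real Mamba layer uses. It is not the Pi-DiMT implementation.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x, A, w_delta, w_B, w_C):
    """Toy single-channel selective SSM (Mamba-style recurrence).

    x: input sequence of floats; A: N negative diagonal state decays;
    w_delta, w_B, w_C: hypothetical weights that make the step size and
    the B/C projections depend on the current input (the "selection").
    """
    h = [0.0] * len(A)  # hidden state, one entry per diagonal mode
    ys = []
    for x_t in x:
        delta = softplus(w_delta * x_t)   # input-dependent step size > 0
        B_t = [w * x_t for w in w_B]      # input-dependent input projection
        C_t = [w * x_t for w in w_C]      # input-dependent output projection
        for n, a in enumerate(A):
            a_bar = math.exp(delta * a)   # zero-order-hold discretisation of A
            h[n] = a_bar * h[n] + delta * B_t[n] * x_t  # simplified delta*B input term
        ys.append(sum(c * hn for c, hn in zip(C_t, h)))  # y_t = C_t . h_t
    return ys


params = dict(A=[-1.0, -0.5], w_delta=0.5, w_B=[0.3, -0.2], w_C=[1.0, 0.5])
y1 = selective_scan([1.0, 2.0, 3.0], **params)
y2 = selective_scan([1.0, 2.0, -5.0], **params)
# Causal: changing the last input leaves earlier outputs untouched.
print(y1[:2] == y2[:2])  # True
```

Because each step touches only the fixed-size state `h`, the scan runs in one pass over the sequence; the `delta * B_t` input term is a common simplification of the exact zero-order-hold discretisation of B.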

Key results

Introduces a Diffusion Mamba Transformer architecture for efficient long-range dependency modeling
Integrates a Port-Hamiltonian neural network to enforce energy-based physical constraints
Achieves state-of-the-art predictive accuracy and physical plausibility on the nuPlan benchmark
Reduces inference overhead while maintaining dynamically feasible, real-time trajectory generation
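For reference, the standard input-state-output port-Hamiltonian form that port-Hamiltonian neural networks typically parameterise is shown below, with the power balance that makes "energy-based constraint" precise. The summary does not specify Pi-DiMT's state, Hamiltonian, or how the module conditions the diffusion model, so treat the symbols and the one-dimensional vehicle example (mass $m$, momentum $p = m v$, linear resistance coefficient $c$, traction/braking force $F$) as illustrative assumptions.

```latex
\begin{aligned}
\dot{x} &= \bigl(J(x) - R(x)\bigr)\,\nabla H(x) + G(x)\,u,
  \qquad J = -J^{\top},\quad R = R^{\top} \succeq 0, \\
\frac{dH}{dt} &= \nabla H^{\top}\dot{x}
  = \underbrace{\nabla H^{\top} J\,\nabla H}_{=\,0}
  \;-\; \nabla H^{\top} R\,\nabla H \;+\; \nabla H^{\top} G\,u
  \;\le\; y^{\top} u, \qquad y := G^{\top}\nabla H, \\
\dot{p} &= -c\,v + F, \qquad \frac{dH}{dt} = -c\,v^{2} + v\,F
  \qquad \bigl(x = p = m v,\; H = \tfrac{p^{2}}{2m},\; J = 0,\; R = c,\; G = 1,\; u = F\bigr).
\end{aligned}
```

Because $J$ is skew-symmetric and $R$ is positive semidefinite by construction, stored energy can grow only through the supplied power $y^{\top}u$; a model parameterised this way cannot produce motion that creates energy from nothing.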

Why it matters

Provides a robust, physics-grounded planning framework that enhances safety and reliability for real-world autonomous driving systems.

Abstract

Autonomous driving systems demand trajectory planners that not only model the inherent uncertainty of future motions but also respect complex temporal dependencies and underlying physical laws. While diffusion-based generative models excel at capturing multi-modal distributions, they often fail to incorporate long-term sequential contexts and domain-specific physical priors. In this work, we bridge these gaps with two key innovations. First, we introduce a Diffusion Mamba Transformer architecture that embeds Mamba and attention into the diffusion process, enabling more effective aggregation of sequential input contexts from sensor streams and past motion histories. Second, we design a Port-Hamiltonian Neural Network module that seamlessly integrates energy-based physical constraints into the diffusion model, thereby enhancing trajectory predictions with both consistency and interpretability. Extensive evaluations on standard autonomous driving benchmarks demonstrate that our unified framework significantly outperforms state-of-the-art baselines in predictive accuracy, physical plausibility, and robustness, thereby advancing safe and reliable motion planning.
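The diffusion side can be illustrated with the standard DDPM ancestral sampler, which starts from Gaussian noise and repeatedly subtracts predicted noise. This is a minimal sketch, assuming a flattened trajectory vector and the common linear beta schedule (1e-4 to 0.02 over 1000 steps); `eps_model` is a hypothetical stand-in for the conditioned denoiser (in Pi-DiMT, the Diffusion Mamba Transformer), and the summary does not state which sampler or schedule the paper uses.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Noise variances beta_t increasing linearly over T steps.
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def ddpm_sample(eps_model, dim, T=1000, seed=0):
    """Ancestral DDPM sampling of a dim-dimensional vector (e.g. flattened waypoints)."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product alpha_bar_t
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, alpha_bars[t])        # predicted noise
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])             # sigma_t^2 = beta_t choice
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                                # no noise on the final step
    return x


# Usage: an oracle that knows the clean target stands in for a trained network.
target = [1.0, 2.0, 0.5, -1.0]


def oracle_eps(x, t, alpha_bar):
    # Exact noise implied by x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.
    return [(xi - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
            for xi, x0 in zip(x, target)]


sample = ddpm_sample(oracle_eps, dim=len(target))
print([round(v, 6) for v in sample])  # [1.0, 2.0, 0.5, -1.0]
```

With the oracle, the final step maps any input back onto `target`, so this is a mechanical check of the update rule rather than a demonstration of learning; a trained model would replace `oracle_eps` and receive scene context as additional input.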

Index terms

Imitation Learning, Autonomous Vehicle Navigation, Learning from Demonstration