ICRA 2026

VisuaLLMPlanner - a Maneuver Planner for Automated Vehicles Using Large Language Models

Daniel Neurath, Bernd Schäufele, Ilja Radusch

AI summary

Having a multimodal LLM select from pre-validated maneuver options improves automated vehicle navigation in rare, long-tail traffic scenarios, outperforming prior LLM-based planners and fixed heuristics on the interPlan benchmark.

Automated driving; Large language models; Long-tail scenarios; Motion planning; Multimodal reasoning; Maneuver selection

Problem

Conventional motion planners fail to handle rare, unpredictable long-tail driving scenarios due to a lack of high-level contextual reasoning. Existing LLM-based approaches often lack spatial precision or cannot isolate the model's actual decision-making contribution.

Approach

The system triggers a multimodal LLM only when a standard planner encounters an unresolved obstacle, feeding it a bird’s-eye view image and structured scene description. The model then selects from a discrete set of pre-computed, validated trajectory options rather than generating plans from scratch.
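
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ManeuverOption`, `build_prompt`, `parse_choice`, and `select_maneuver`, the prompt wording, and the fallback behavior are all assumptions made for illustration, and `query_mllm` stands in for whatever multimodal model client is actually used. The point it shows is that the model only returns an index into a list of already validated options, so an unparseable or out-of-range reply can never produce an unvalidated trajectory.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ManeuverOption:
    """One pre-computed, already validated candidate (hypothetical structure)."""
    option_id: int
    description: str


def build_prompt(scene_description: str, options: Sequence[ManeuverOption]) -> str:
    """Combine the textual scene description with a numbered option list."""
    lines = [scene_description, "", "Choose exactly one option by its number:"]
    lines += [f"{o.option_id}: {o.description}" for o in options]
    return "\n".join(lines)


def parse_choice(reply: str, options: Sequence[ManeuverOption]) -> Optional[ManeuverOption]:
    """Return the option whose id equals the first integer in the reply, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    chosen_id = int(match.group())
    return next((o for o in options if o.option_id == chosen_id), None)


def select_maneuver(
    bev_image: bytes,
    scene_description: str,
    options: Sequence[ManeuverOption],
    query_mllm: Callable[[bytes, str], str],  # assumed model client: (image, prompt) -> reply
    fallback: ManeuverOption,                 # assumed safe default, not taken from the paper
) -> ManeuverOption:
    """Ask the model to pick one validated option; use the fallback on an invalid reply."""
    if not options:
        raise ValueError("at least one validated maneuver option is required")
    reply = query_mllm(bev_image, build_prompt(scene_description, options))
    choice = parse_choice(reply, options)
    return choice if choice is not None else fallback


if __name__ == "__main__":
    candidates = [
        ManeuverOption(1, "wait behind the obstacle"),
        ManeuverOption(2, "pass the obstacle on the left"),
    ]
    # Stub standing in for a real multimodal model call.
    stub_model = lambda image, prompt: "Option 2, the adjacent lane is free."
    picked = select_maneuver(b"", "A parked truck blocks the ego lane.", candidates,
                             stub_model, fallback=candidates[0])
    print(picked.description)
```

Restricting the output to an option number also keeps the decision easy to log and inspect, which fits the interpretability goal stated in the summary.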

Key results

Outperforms prior LLM-based planners and fixed heuristics on the interPlan long-tail benchmark
Achieves high success rates in safety-critical categories like Jaywalker and Construction scenarios
Demonstrates that querying foundation models to choose from validated options yields more robust and explainable decisions
Successfully isolates and quantifies the LLM's contribution by restricting base planner autonomy during decision phases

Why it matters

Offers a practical, interpretable blueprint for safely integrating foundation models into automated driving stacks while clarifying their real-world reasoning limits.

Abstract

Achieving safe and reliable automated driving in real-world conditions requires the ability to handle rare and unpredictable situations, commonly known as long-tail scenarios. These cases are often underrepresented in training data and remain a major challenge for conventional motion planning systems. In this work, we present VisuaLLMPlanner, a maneuver planning framework that integrates a multimodal large language model (MLLM) into the high-level decision-making loop of an automated driving pipeline. The system is triggered when the ego vehicle encounters a situation with an obstacle that cannot be resolved by a standard lane-following planner. At this point, a structured input comprising a bird’s-eye view image and a textual scene description is generated and passed to the MLLM. Rather than generating plans directly, the model selects from a discrete set of pre-generated and validated maneuver options, allowing for interpretable and structured decision-making. We evaluate our approach on the interPlan benchmark, which focuses explicitly on long-tail scenarios, and demonstrate that VisuaLLMPlanner achieves strong performance in comparison to prior LLM-based planners. The results highlight both the potential and current limitations of foundation models for high-level reasoning in automated vehicle planning.

Index terms

Autonomous Vehicle Navigation; Task and Motion Planning; Collision Avoidance