ICRA 2026

VisuaLLMPlanner - a Maneuver Planner for Automated Vehicles Using Large Language Models

Daniel Neurath, Bernd Schäufele, Ilja Radusch


AI summary

Integrating a multimodal LLM that selects from pre-validated maneuver options improves automated vehicle planning in rare, long-tail traffic scenarios relative to prior LLM-based planners and fixed heuristics.
Automated driving, Large language models, Long-tail scenarios, Motion planning, Multimodal reasoning, Maneuver selection

Problem

Conventional motion planners fail to handle rare, unpredictable long-tail driving scenarios due to a lack of high-level contextual reasoning. Existing LLM-based approaches often lack spatial precision or cannot isolate the model's actual decision-making contribution.

Approach

The system triggers a multimodal LLM only when a standard planner encounters an unresolved obstacle, feeding it a bird’s-eye view image and structured scene description. The model then selects from a discrete set of pre-computed, validated trajectory options rather than generating plans from scratch.
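
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Maneuver` type, the prompt wording, the `query_mllm` callable, and the fallback behavior are all assumptions introduced here, and the sketch assumes the options passed in have already been validated upstream.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Maneuver:
    """One pre-computed, already validated option (hypothetical type)."""
    option_id: int
    description: str


def build_prompt(scene_description: str, options: Sequence[Maneuver]) -> str:
    """Combine the textual scene description with a numbered option list."""
    lines = [scene_description, "", "Choose exactly one option by its number:"]
    lines += [f"{m.option_id}: {m.description}" for m in options]
    return "\n".join(lines)


def parse_choice(reply: str, options: Sequence[Maneuver]) -> Optional[Maneuver]:
    """Map the first integer in the reply to an option; None if it is not a valid id."""
    by_id = {m.option_id: m for m in options}
    for token in reply.replace(":", " ").replace(".", " ").split():
        if token.isdecimal():
            return by_id.get(int(token))
    return None


def select_maneuver(
    bev_image: bytes,
    scene_description: str,
    options: Sequence[Maneuver],
    query_mllm: Callable[[bytes, str], str],  # assumed interface: (image, prompt) -> reply
    fallback: Maneuver,
) -> Maneuver:
    """Ask the model to pick among validated options; never execute free-form output."""
    if not options:
        raise ValueError("at least one validated option is required")
    reply = query_mllm(bev_image, build_prompt(scene_description, options))
    choice = parse_choice(reply, options)
    # An unparseable or out-of-range answer falls back to a safe default.
    return choice if choice is not None else fallback
```

Because the model's reply is only ever mapped back onto the given option list, a malformed or unexpected answer cannot produce a trajectory outside the validated set. In practice `query_mllm` would wrap whatever multimodal model client is in use.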

Key results

  • Outperforms prior LLM-based planners and fixed heuristics on the interPlan long-tail benchmark
  • Achieves high success rates in safety-critical categories like Jaywalker and Construction scenarios
  • Demonstrates that querying foundation models to choose from validated options yields more robust and explainable decisions
  • Successfully isolates and quantifies the LLM's contribution by restricting base planner autonomy during decision phases

Why it matters

Offers a practical, interpretable blueprint for safely integrating foundation models into automated driving stacks while clarifying their real-world reasoning limits.

Abstract

Achieving safe and reliable automated driving in real-world conditions requires the ability to handle rare and unpredictable situations, commonly known as long-tail scenarios. These cases are often underrepresented in training data and remain a major challenge for conventional motion planning systems. In this work, we present VisuaLLMPlanner, a maneuver planning framework that integrates a multimodal large language model (MLLM) into the high-level decision-making loop of an automated driving pipeline. The system is triggered when the ego vehicle encounters a situation with an obstacle that cannot be resolved by a standard lane-following planner. At this point, a structured input comprising a bird’s-eye view image and a textual scene description is generated and passed to the MLLM. Rather than generating plans directly, the model selects from a discrete set of pre-generated and validated maneuver options, allowing for interpretable and structured decision-making. We evaluate our approach on the interPlan benchmark, which focuses explicitly on long-tail scenarios, and demonstrate that VisuaLLMPlanner achieves strong performance in comparison to prior LLM-based planners. The results highlight both the potential and current limitations of foundation models for high-level reasoning in automated vehicle planning.

Index terms

Autonomous Vehicle Navigation, Task and Motion Planning, Collision Avoidance
