ICRA 2026

MAGNIFIED: RL Fine-Tuning of Multimodal Large Language Models for Motion Planning

Letian Chen, Yiren Lu, Justin Fu, Yichen Xie, Runsheng Xu, Jyh-Jing Hwang, Benjamin Sapp, Dragomir Anguelov

AI summary

Reinforcement learning fine-tuning with token-level rewards transforms multimodal LLMs from simple imitators into cost-aware autonomous driving planners.

Reinforcement Learning Fine-Tuning, Multimodal LLMs, Autonomous Driving, Motion Planning, Token-level Rewards, Waymo Open Motion Dataset

Problem

Standard supervised fine-tuning of multimodal LLMs optimizes for next-token prediction, which fails to align with critical autonomous driving planning objectives like safety and trajectory feasibility.

Approach

MAGNIFIED maps predicted text tokens to vehicle trajectories and applies reinforcement learning with token-level rewards to directly optimize planning costs like overlap and off-road violations.
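As a rough illustration of this mapping, the sketch below parses a text plan into waypoints, scores each waypoint for overlap and off-road violations, and places the resulting reward on the token that completes the waypoint. Everything specific here is an assumption made for the example and not taken from the paper: the "x,y x,y ..." text format, the straight-corridor road model, the circular safety radius, the -1 penalties, and the function names `parse_trajectory`, `waypoint_rewards`, and `token_rewards`.

```python
import math


def parse_trajectory(text):
    """Parse a plan written as 'x1,y1 x2,y2 ...' into a list of (x, y) floats."""
    waypoints = []
    for chunk in text.split():
        x_str, y_str = chunk.split(",")
        waypoints.append((float(x_str), float(y_str)))
    return waypoints


def waypoint_rewards(waypoints, others, road_half_width=3.5, safety_radius=2.0):
    """Return one reward per waypoint: -1 for each violated planning constraint.

    others[t] lists the (x, y) positions of other agents at step t.
    Both thresholds are hypothetical values chosen for illustration.
    """
    rewards = []
    for t, (x, y) in enumerate(waypoints):
        reward = 0.0
        # Overlap: ego is within safety_radius of another agent at the same step.
        if any(math.hypot(x - ox, y - oy) < safety_radius for ox, oy in others[t]):
            reward -= 1.0
        # Off-road: the road is modeled as a straight corridor |y| <= road_half_width.
        if abs(y) > road_half_width:
            reward -= 1.0
        rewards.append(reward)
    return rewards


def token_rewards(rewards, tokens_per_waypoint):
    """Put each waypoint's reward on its last token; earlier tokens get 0."""
    out = []
    for reward, n_tokens in zip(rewards, tokens_per_waypoint):
        out.extend([0.0] * (n_tokens - 1) + [reward])
    return out


plan_text = "1.0,0.0 2.0,0.5 3.0,4.0"
other_agents = [[(10.0, 0.0)], [(2.5, 0.5)], [(10.0, 0.0)]]

plan = parse_trajectory(plan_text)
per_waypoint = waypoint_rewards(plan, other_agents)
# Assume, for illustration, that every waypoint is encoded by two tokens.
per_token = token_rewards(per_waypoint, [2, 2, 2])
print(per_waypoint)  # [0.0, -1.0, -1.0]
print(per_token)     # [0.0, 0.0, 0.0, -1.0, 0.0, -1.0]
```

In this toy plan the second waypoint comes too close to another agent and the third leaves the corridor, so each of them receives a penalty on its closing token while the first waypoint is left unpenalized.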

Key results

Over 10.5% reduction in trajectory overlap rate relative to the SFT baseline
38.9% reduction in off-road rate relative to the SFT baseline
Improved long-horizon imitative accuracy without explicit optimization
Achieves planning gains using only 1.25% of supervised fine-tuning compute

Why it matters

Provides a sample-efficient pathway to deploy multimodal LLMs as safe, cost-aware planners for autonomous vehicles, bridging semantic reasoning and real-world trajectory optimization.

Abstract

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in semantic understanding and common sense reasoning, making them promising candidates for solving planning problems in autonomous driving. However, the next-token text prediction objectives traditionally used in pre-training and supervised fine-tuning (SFT) of MLLMs may fall short of fulfilling the planning objectives for autonomous vehicles. The next-token prediction objective merely encourages per-token imitation in text, often irrespective of multi-step consequences and the alignment with crucial planning considerations such as giving space to other road actors. To overcome these limitations, we propose a reinforcement learning fine-tuning (RLFT) approach, MAGNIFIED, that aligns the MLLM-based driving agent with planning objectives by learning from token-level rewards. By mapping a sequence of predicted tokens to corresponding vehicle trajectories and learning from planning rewards, MAGNIFIED optimizes for the true planning objectives rather than focusing solely on token prediction accuracy, enabling the model to refine its understanding of the planning task beyond simple imitation. We validate our approach on the Waymo Open Motion Dataset with a novel setup incorporating rasterized bird's-eye views and tokenized trajectories as inputs and planning-oriented outputs. An initial SFT phase establishes a strong baseline in outputting plan trajectories as sequences of X-Y coordinates in text, while subsequent RL fine-tuning substantially enhances planning performance relative to the SFT baseline (demonstrating over a 10.5% reduction in overlap rate and a 38.9% reduction in off-road rate), underscoring the potential of RLFT on MLLMs to achieve vehicle planning that is better aligned with compliant, comfortable, and efficient driving.
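To make the contrast between per-token imitation and learning from token-level rewards concrete, here is a minimal sketch of the two losses on a single sampled plan. The abstract does not state which RL algorithm MAGNIFIED uses, so the REINFORCE-style estimator with undiscounted reward-to-go below is a stand-in chosen for illustration. The function names, the log-probabilities, and the reward values are all hypothetical.

```python
def rewards_to_go(token_rewards, gamma=1.0):
    """Sum of (discounted) rewards from each token position to the end."""
    returns = []
    running = 0.0
    for reward in reversed(token_rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def sft_loss(expert_token_logprobs):
    """Next-token imitation: mean negative log-likelihood of the expert's tokens."""
    return -sum(expert_token_logprobs) / len(expert_token_logprobs)


def policy_gradient_loss(sampled_token_logprobs, token_rewards, gamma=1.0):
    """REINFORCE-style surrogate: weight each sampled token's log-prob by its return."""
    returns = rewards_to_go(token_rewards, gamma)
    weighted = sum(lp * g for lp, g in zip(sampled_token_logprobs, returns))
    return -weighted / len(sampled_token_logprobs)


# Hypothetical log-probabilities the model assigned to four tokens it sampled.
logprobs = [-0.1, -0.2, -0.3, -0.4]
# Hypothetical token-level rewards: the third token completes a waypoint
# that overlaps another road actor.
rewards = [0.0, 0.0, -1.0, 0.0]

returns = rewards_to_go(rewards)
pg = policy_gradient_loss(logprobs, rewards)
print(returns)  # [-1.0, -1.0, -1.0, 0.0]
```

Under these assumptions, minimizing the surrogate lowers the log-probability of the three tokens that led up to the overlap and leaves the token after it untouched. The imitation loss only measures how closely the expert's text is reproduced, with no view of what the decoded trajectory does on the road.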

Index terms

Motion and Path Planning, Reinforcement Learning, Autonomous Vehicle Navigation