ICRA 2026

Proactive Risk-Aware Trajectory Planning for Autonomous Driving in Unstructured Environments Via Reinforcement Learning with Adaptive Reward Design

Jiawei Du, Weiming Qu, Shenghai Yuan, Jia Wang, Qifei Bai, Chengguang Li, Xihong Wu, Dingsheng Luo

AI summary

Integrating proactive risk prediction with dynamically adaptive reward tuning significantly boosts safety, efficiency, and generalization in unstructured autonomous driving.

Proactive planning, Reinforcement learning, Adaptive reward, Unstructured traffic, Risk prediction, Autonomous driving

Problem

Current trajectory planners are largely reactive and fail to anticipate future risks, while reinforcement learning approaches depend on rigid, manually crafted reward functions that struggle to generalize across complex traffic contexts.

Approach

The method predicts future traffic trajectories and hidden pedestrian risk zones to proactively reserve safety margins, while a large-model agent dynamically adjusts reward weights during training to adapt to evolving traffic conditions.
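The reward-adjustment idea can be illustrated with a minimal sketch. The summary does not list the paper's reward terms or describe how the large-model agent is queried, so everything here is an assumption made for illustration: the three terms (`safety`, `efficiency`, `comfort`), the context keys (`occlusion`, `pedestrian_density`), and the hand-written rule in `adjust_weights`, which only stands in for the agent.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical reward terms; the paper's actual terms are not given here.
    safety: float = 1.0
    efficiency: float = 1.0
    comfort: float = 1.0

    def normalized(self) -> "RewardWeights":
        """Rescale the weights so they sum to 1."""
        total = self.safety + self.efficiency + self.comfort
        return RewardWeights(
            self.safety / total, self.efficiency / total, self.comfort / total
        )


def adjust_weights(weights: RewardWeights, context: dict) -> RewardWeights:
    """Stand-in for the large-model agent (assumption: simple fixed rules).

    In the described method a large-model agent would choose the new
    weights from the traffic context; this function only mimics that role.
    """
    w = RewardWeights(weights.safety, weights.efficiency, weights.comfort)
    if context.get("occlusion", False):
        w.safety *= 2.0  # occluded scene: emphasize safety
    if context.get("pedestrian_density", 0.0) > 0.5:
        w.safety *= 1.5  # crowded scene: emphasize safety further
        w.efficiency *= 0.5  # and de-emphasize speed
    return w.normalized()


def total_reward(terms: dict, weights: RewardWeights) -> float:
    """Weighted sum of the per-step reward terms."""
    return (
        weights.safety * terms["safety"]
        + weights.efficiency * terms["efficiency"]
        + weights.comfort * terms["comfort"]
    )
```

During training, `adjust_weights` would be called whenever the traffic context changes, and `total_reward` would then score each step with the updated weights. Normalizing the weights keeps the reward scale comparable across contexts, which is one plausible design choice rather than something the summary states.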

Key results

Proactive avoidance of high-risk zones via future trajectory and ghost-probe prediction
Dynamic reward weight adjustment guided by a large-model agent
High-fidelity simulation environment built on Peking University campus
Superior safety, efficiency, and generalization over state-of-the-art baselines

Why it matters

Provides a scalable, context-aware planning framework that enhances autonomous vehicle safety and adaptability in complex, unstructured real-world environments.

Abstract

Trajectory planning for autonomous driving in dynamic unstructured traffic remains a fundamental challenge. Existing methods are often reactive, i.e., they only respond to observed situations without explicitly anticipating future risks. Moreover, most reinforcement learning based approaches rely on manually crafted reward functions, which limits their adaptability and generalization across complex driving scenarios. In this paper, we propose a novel RL-based trajectory planning framework that integrates proactive obstacle avoidance and adaptive reward learning. Specifically, our planner predicts the future trajectories of surrounding traffic participants as well as potential ghost-probe risk zones, and proactively avoids these high-risk regions during planning. In addition, we introduce a large-model agent that dynamically adjusts the reward signals according to evolving traffic contexts, enabling more adaptive and robust policy learning compared with fixed reward designs. To evaluate our method, we build a high-fidelity simulation environment based on the Peking University campus, which provides realistic unstructured traffic scenarios. Extensive experiments demonstrate that our method significantly improves safety, efficiency, and generalization over state-of-the-art baselines, particularly in scenarios with occlusions and unpredictable behaviors.
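The difference between reactive and proactive checking can be sketched as follows. This is not the paper's planner: the constant-velocity predictor, the circular ghost-probe zones, and all function names are assumptions chosen to keep the example small. A candidate ego trajectory is rejected if it comes within a safety margin of any agent's predicted position at the same time step, or if it enters a risk zone placed where a hidden pedestrian could emerge.

```python
import math

Point = tuple[float, float]


def predict_constant_velocity(
    pos: Point, vel: Point, horizon: int, dt: float
) -> list[Point]:
    """Assumed predictor: extrapolate an agent at constant velocity."""
    return [
        (pos[0] + vel[0] * k * dt, pos[1] + vel[1] * k * dt)
        for k in range(1, horizon + 1)
    ]


def min_clearance(ego_traj: list[Point], agent_traj: list[Point]) -> float:
    """Smallest ego-agent distance over matching time steps."""
    return min(math.dist(e, a) for e, a in zip(ego_traj, agent_traj))


def enters_zone(ego_traj: list[Point], center: Point, radius: float) -> bool:
    """True if any ego waypoint falls inside a circular risk zone."""
    return any(math.dist(p, center) < radius for p in ego_traj)


def is_proactively_safe(
    ego_traj: list[Point],
    agents: list[tuple[Point, Point]],  # (position, velocity) per agent
    risk_zones: list[tuple[Point, float]],  # (center, radius) per zone
    margin: float,
    dt: float,
) -> bool:
    """Check a candidate trajectory against predicted agents and risk zones."""
    horizon = len(ego_traj)
    for pos, vel in agents:
        predicted = predict_constant_velocity(pos, vel, horizon, dt)
        if min_clearance(ego_traj, predicted) < margin:
            return False
    return not any(enters_zone(ego_traj, c, r) for c, r in risk_zones)


# Ego drives along the x-axis; a pedestrian walks toward its path.
ego = [(1.0 * k, 0.0) for k in range(1, 6)]
crossing = [((5.0, 5.0), (0.0, -1.0))]
print(is_proactively_safe(ego, crossing, [], margin=1.0, dt=1.0))
```

A purely reactive check that looked only at the pedestrian's current position, five meters from the path, would accept this trajectory. The predicted positions show the two meeting at the fifth step, so the proactive check rejects it.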

Index terms

Autonomous Vehicle Navigation; Intelligent and Flexible Manufacturing; Planning, Scheduling and Coordination