ICRA 2026

Multimodal Fusion-Guided Diffusion Policy for Motion Planning in Rugged and Obstacle-Dense Environments

Haoyu Xi, Wei Li, Yu Hu

AI summary

M-DP fuses visual and LiDAR features to guide a fast diffusion policy, enabling safe, dynamically feasible motion planning in complex, obstacle-dense terrains.
Motion planning, Diffusion policy, Multimodal fusion, LiDAR, Visual navigation, Dynamic constraints

Problem

Motion planning in unstructured, rugged environments is hindered by fragmented free space, high computational costs of traditional mapping, and the lack of dynamic constraint awareness in existing learning-based methods.

Approach

The framework integrates camera and LiDAR data at the feature level to guide a diffusion policy that generates candidate trajectories, then selects the optimal path using a scoring module that evaluates semantic, geometric, and robot dynamic constraints.
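
The selection step described above can be illustrated with a minimal sketch. This is not the paper's scoring module: the cost terms, the weights `w_goal` and `w_bump`, the limits `v_max` and `a_max`, and the `roughness_fn` terrain lookup are all hypothetical stand-ins chosen to show the general pattern of rejecting dynamically infeasible candidates and ranking the rest.

```python
import numpy as np

def score_trajectories(trajs, goal, roughness_fn, v_max, a_max, dt,
                       w_goal=1.0, w_bump=1.0):
    """Score K candidate trajectories given as a (K, T, 2) array of xy waypoints.

    All cost terms and weights here are illustrative assumptions,
    not the scoring function used in the paper.
    """
    # Goal-reaching cost: distance from each final waypoint to the goal.
    goal_cost = np.linalg.norm(trajs[:, -1, :] - goal, axis=-1)

    # Bumpiness cost: mean terrain roughness along each path.
    # roughness_fn is a hypothetical lookup mapping (K, T, 2) -> (K, T).
    bump_cost = roughness_fn(trajs).mean(axis=-1)

    # Dynamic feasibility: finite-difference speed and acceleration limits.
    vel = np.diff(trajs, axis=1) / dt          # (K, T-1, 2)
    acc = np.diff(vel, axis=1) / dt            # (K, T-2, 2)
    speed = np.linalg.norm(vel, axis=-1)
    accel = np.linalg.norm(acc, axis=-1)
    feasible = (speed.max(axis=1) <= v_max) & (accel.max(axis=1) <= a_max)

    # Infeasible candidates get infinite cost so they are never selected.
    cost = w_goal * goal_cost + w_bump * bump_cost
    return np.where(feasible, cost, np.inf), feasible

def select_trajectory(trajs, goal, roughness_fn, v_max, a_max, dt):
    """Return the index of the lowest-cost feasible candidate, or None."""
    cost, feasible = score_trajectories(trajs, goal, roughness_fn,
                                        v_max, a_max, dt)
    if not feasible.any():
        return None
    return int(np.argmin(cost))
```

Treating the dynamic limits as a hard filter rather than a weighted penalty is one possible design choice; it guarantees that a trajectory the robot cannot track is never picked, at the price of returning no trajectory when every candidate violates the limits.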

Key results

  • Introduced a multimodal early-fusion mechanism for joint semantic and geometric environment understanding.
  • Developed a dynamics-aware trajectory determination module to ensure execution safety and reduce tracking failures.
  • Deployed the framework on a mobile robot and released a self-collected dataset for rugged environments.
  • Demonstrated lower collision rates and higher success rates than baseline methods in real-world experiments.

Why it matters

Enables reliable, real-time autonomous navigation for mobile robots in safety-critical, unstructured terrains like search-and-rescue or planetary exploration.

Abstract

Motion planning in unstructured environments remains a challenging task, particularly in scenarios with both dense obstacles and rugged terrain, under the requirements for safety and real-time performance. To address these challenges, this paper proposes a multimodal fusion-guided diffusion policy framework, abbreviated as M-DP, which is jointly guided by visual data, LiDAR data, and goal targets. An early-fusion mechanism is designed to combine images and LiDAR points, enabling simultaneous semantic and geometric understanding of the environment to guide the diffusion policy in generating obstacle-aware candidate trajectories. Within the diffusion policy module, the Denoising Diffusion Implicit Model (DDIM) is employed to improve real-time performance. Furthermore, a trajectory determination module is proposed that incorporates not only environmental semantics and terrain geometry but also robot dynamic constraints. This ensures that the selected trajectory is dynamically feasible, significantly reducing the risk of tracking failures and enhancing safety in challenging conditions. The selected path effectively balances safety, goal-reaching ability, and bumpiness. Real-world experimental evaluations demonstrate the safety and effectiveness of the framework compared to baseline methods, with ablation studies validating the contributions of key components. Code and our self-collected dataset are available at https://github.com/xhy1599/M-DP.
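
DDIM improves sampling speed by using a deterministic reverse update that can skip timesteps, so far fewer network evaluations are needed than in the full diffusion chain. The sketch below shows that standard deterministic update (the zero-noise case) in NumPy. It is not the M-DP implementation: `eps_model`, the noise schedule `alpha_bars`, and the step count are placeholder assumptions, and any conditioning on fused sensor features is omitted.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update from noise level t to an earlier level."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier level using the same predicted noise.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(eps_model, shape, alpha_bars, num_steps, rng):
    """Sample using only num_steps of the len(alpha_bars) training timesteps.

    eps_model(x, t) is a hypothetical noise predictor; in a planning
    setting it would also take the conditioning features as input.
    """
    total = len(alpha_bars)
    # Evenly spaced subsequence of timesteps, from most to least noisy.
    steps = np.linspace(total - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(shape)
    for i, t in enumerate(steps):
        # After the last step the sample is treated as noise-free.
        ab_prev = alpha_bars[steps[i + 1]] if i + 1 < len(steps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bars[t], ab_prev)
    return x
```

With, for example, 100 training timesteps and `num_steps=10`, the loop calls the noise predictor 10 times instead of 100, which is where the real-time benefit comes from.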

Index terms

Motion and Path Planning; Constrained Motion Planning; Field Robots