ICRA 2026

CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance

Rui Heng Yang, Xuan Zhao, Léo Maxime Brunswic, Montgomery Tucker Alban, Matéo Clémente, Tongtong Cao, Jun Jin, Amir Rasouli

AI summary

CAPE enables diffusion-based robot policies to avoid collisions in unseen, cluttered environments without large-scale training data by iteratively refining trajectories using context-aware priors.

Diffusion Policy, Collision Avoidance, Motion Planning, Imitation Learning, Inference-Time Guidance, Robotic Manipulation

Problem

Diffusion models for robotics struggle to generalize to novel, cluttered environments because training on diverse obstacle configurations is costly and impractical, while standard inference-time guidance often fails or distorts trajectories without sufficient prior mode coverage.

Approach

CAPE expands trajectory modes at inference time. It perturbs the unexecuted portion of a planned trajectory to an intermediate noise level to form a context-aware prior, then iteratively refines that prior with weak, collision-aware guidance, steering sampling toward safe trajectories without distorting the learned distribution.
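The perturbation step can be illustrated with a minimal sketch. It assumes a DDPM-style variance-preserving forward process with a linear beta schedule, which this summary does not specify; the names `alpha_bar`, `perturb_to_level`, the schedule values, and the 2-D waypoint layout are all hypothetical and chosen only for illustration.

```python
import math
import random

def alpha_bar(betas, k):
    """Cumulative product of (1 - beta_i) over the first k steps (DDPM notation)."""
    prod = 1.0
    for beta in betas[:k]:
        prod *= 1.0 - beta
    return prod

def perturb_to_level(traj, k, betas, rng):
    """Noise a clean trajectory to step k: x_k = sqrt(ab) * x_0 + sqrt(1 - ab) * eps."""
    ab = alpha_bar(betas, k)
    signal, noise = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [[signal * c + noise * rng.gauss(0.0, 1.0) for c in wp] for wp in traj]

# Hypothetical setup: 100-step linear schedule, 16 two-dimensional waypoints.
betas = [1e-4 + (0.02 - 1e-4) * i / 99 for i in range(100)]
plan = [[0.1 * t, 0.05 * t] for t in range(16)]

# The first waypoints are executed; only the remainder is re-noised.
executed_prefix, remainder = plan[:4], plan[4:]
prior = perturb_to_level(remainder, k=30, betas=betas, rng=random.Random(0))
```

Choosing an intermediate `k` rather than the final step keeps part of the previous plan's structure in `prior`, so refinement starts near the earlier solution instead of from pure noise. How the level is actually chosen is not stated here.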

Key results

Up to 80% higher success rate in reaching and pick-and-place tasks
4× improvement in replanning frequency compared to state-of-the-art methods
Robust collision avoidance in unseen simulated and real-world cluttered environments
Eliminates need for large-scale obstacle-inclusive training datasets

Why it matters

Enables safe, generalizable robotic manipulation in dynamic, cluttered spaces using only obstacle-free training data, reducing data collection costs while maintaining real-time replanning capabilities.

Abstract

In robotics, diffusion models can capture multimodal trajectories from demonstrations, making them a transformative approach in imitation learning. However, achieving optimal performance with this regimen requires a large-scale dataset, which is costly to obtain, especially for challenging tasks such as collision avoidance. In such tasks, generalization at test time demands coverage of many obstacle types and their spatial configurations, which is impractical to acquire purely via data. Recent works ease this burden with training-free guidance that injects environmental context at inference; however, this only works when paired with a sufficiently diverse training dataset that yields a conditional trajectory distribution with rich multimodal coverage. To remedy this problem, we propose Context-Aware diffusion policy via Proximal mode Expansion (CAPE), a framework that expands trajectory distribution modes with a context-aware prior and guidance at inference via a novel prior-seeded iterative guided refinement procedure for motion replanning. The framework generates an initial trajectory plan and executes a short prefix trajectory; the remaining trajectory segment is then perturbed to an intermediate noise level, forming a context-aware trajectory prior that preserves goal consistency and previously expanded modes. Repeating the process with context-aware guided denoising iteratively expands mode support, allowing smoother, less collision-prone trajectories to be found. We evaluate CAPE on reaching and pick-and-place tasks in cluttered, unseen simulated and real-world settings and show that our approach achieves up to an 80% higher success rate and a 4× improvement in replanning frequency compared to state-of-the-art methods, demonstrating better generalization to unseen environments.
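The control flow of the replanning procedure described in the abstract (plan, execute a short prefix, re-noise the remainder, refine with guidance, repeat) can be sketched as a toy loop. This is not the authors' implementation. `toy_guided_denoise` is a hypothetical stand-in that pulls waypoints toward a straight line and nudges them out of a circular obstacle; it replaces both the learned diffusion policy and the collision-aware guidance. The re-noising is simplified to additive Gaussian noise, and every constant is an arbitrary assumption.

```python
import math
import random

def toy_guided_denoise(traj, start, goal, obstacle, steps, rng, guide_w=0.5):
    """Stand-in for guided denoising; NOT a learned diffusion policy."""
    (ox, oy), radius = obstacle
    n = len(traj)
    for s in range(steps):
        sigma = 0.05 * (1.0 - (s + 1) / steps)  # injected noise anneals to zero
        refined = []
        for i, (x, y) in enumerate(traj):
            a = (i + 1) / n
            # Placeholder "denoiser": pull toward the straight start->goal line.
            x += 0.2 * (start[0] + a * (goal[0] - start[0]) - x)
            y += 0.2 * (start[1] + a * (goal[1] - start[1]) - y)
            # Placeholder weak guidance: nudge the waypoint out of the obstacle disc.
            dx, dy = x - ox, y - oy
            d = math.hypot(dx, dy) or 1e-9
            if d < radius:
                x += guide_w * (radius - d) * dx / d
                y += guide_w * (radius - d) * dy / d
            refined.append((x + sigma * rng.gauss(0, 1), y + sigma * rng.gauss(0, 1)))
        traj = refined
    return traj

def replan_loop(start, goal, obstacle, horizon=12, prefix=3, seed=0):
    rng = random.Random(seed)
    # Initial plan: many refinement steps starting from pure noise.
    plan = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(horizon)]
    plan = toy_guided_denoise(plan, start, goal, obstacle, steps=20, rng=rng)
    executed = [start]
    while plan:
        executed.extend(plan[:prefix])  # execute a short prefix
        rest = plan[prefix:]
        if not rest:
            break
        # Re-noise the unexecuted remainder to form the prior (simplified: additive noise).
        prior = [(x + 0.05 * rng.gauss(0, 1), y + 0.05 * rng.gauss(0, 1)) for x, y in rest]
        # Refine from the prior with fewer steps than a from-scratch plan.
        plan = toy_guided_denoise(prior, executed[-1], goal, obstacle, steps=6, rng=rng)
    return executed

path = replan_loop(start=(0.0, 0.0), goal=(1.0, 0.0), obstacle=((0.5, 0.05), 0.2))
```

Because each replan starts from a lightly noised copy of the previous remainder, the sketch uses fewer refinement steps per replan than the initial plan does. The step counts here are illustrative and are not meant to reproduce the reported replanning-frequency gain.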

Index terms

Imitation Learning, Integrated Planning and Learning, Motion and Path Planning