ICRA 2026

Controllable Motion Generation Via Diffusion Modal Coupling

Luobin Wang, Hongzhan Yu, Chenning Yu, Sicun Gao, Henrik Iskov Christensen

AI summary

Replacing the standard Gaussian prior with a mode-coupled multi-modal prior enables precise, guidance-free control of diffusion models while preserving sample fidelity.

Diffusion models, controllable generation, multi-modal prior, motion prediction, robotics, modal coupling

Problem

Post-hoc guidance methods used to control diffusion models for robotics introduce train-test mismatch and off-manifold drift, degrading sample fidelity. Achieving fine-grained controllability without sacrificing physical realism remains an open challenge.

Approach

The method replaces the standard unimodal prior with a Gaussian-mixture prior and derives modified forward/reverse diffusion processes that tightly couple each prior component to a principal data mode, enabling direct mode selection at sampling without external guidance.
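The mode-selection idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the component means, the shared standard deviation `sigma`, the helper names, and in particular the mean-shifted forward step are assumptions chosen only so that each data mode is noised toward its own prior component. The paper's actual forward and reverse processes may differ.

```python
import numpy as np


def sample_from_prior_mode(means, sigma, mode, n_samples, rng):
    """Draw initial samples from one selected mixture component N(means[mode], sigma^2 I).

    Hypothetical helper: selecting `mode` here stands in for choosing the
    prior component that matches a task constraint.
    """
    mu = means[mode]
    return mu + sigma * rng.standard_normal((n_samples, mu.shape[0]))


def forward_noising(x0, mu, alpha_bar_t, sigma, rng):
    """Assumed mean-shifted forward step (illustrative, not taken from the paper).

    alpha_bar_t = 1 returns x0 unchanged; alpha_bar_t = 0 returns a draw from
    N(mu, sigma^2 I), so data assigned to a mode is noised toward the matching
    prior component rather than toward a single zero-mean Gaussian.
    """
    a = np.sqrt(alpha_bar_t)
    noise = rng.standard_normal(x0.shape)
    return a * x0 + (1.0 - a) * mu + np.sqrt(1.0 - alpha_bar_t) * sigma * noise


rng = np.random.default_rng(0)

# Two assumed prior components in 2-D, e.g. "turn left" and "turn right".
means = np.array([[-4.0, 0.0], [4.0, 0.0]])

# Control is exercised by picking the component, with no guidance term.
x_T = sample_from_prior_mode(means, sigma=1.0, mode=1, n_samples=1000, rng=rng)
```

In this sketch a trained denoiser would then run the reverse process starting from `x_T`; the choice of `mode` is the only control input.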

Key results

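Avoids the train-test mismatch and off-manifold drift induced by post-hoc guidance
Achieves higher fidelity, diversity, and controllability than guidance-based baselines on the Waymo (motion prediction) and Maze2D (multi-task control) benchmarks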
Outperforms guidance baselines in accuracy and physical plausibility
Enables single-model multi-task control without per-task training

Why it matters

Provides a scalable, guidance-free framework for precise and realistic motion generation, benefiting robotics researchers and autonomous driving developers.

Abstract

Diffusion models are increasingly used in robotics to represent multi-modal distributions over system states and behaviors, but precise control of generated outcomes without degrading physical realism remains challenging. This paper introduces a controllable diffusion framework that (i) replaces the standard unimodal Gaussian prior with an explicit multi-modal prior, and (ii) enforces modal coupling between prior components and principal data modes through novel forward and reverse diffusion processes. Sampling is initialized directly from a selected prior mode aligned with task constraints, avoiding train–test mismatch and manifold drift commonly induced by post-hoc guidance. Empirical evaluations on motion prediction (Waymo Dataset) and multi-task control (Maze2D) show consistent improvements over guidance-based baselines in fidelity, diversity, and controllability. These results indicate that multi-modal priors with strong modal coupling provide a scalable basis for controllable motion generation in robotics. The official implementation is provided in https://github.com/RobinWangSD/Diffusion-Modal-Coupling/.

Index terms

Motion and Path Planning, AI-Based Methods