HRI-DGDM: Dual-Graph Guided Diffusion Model for Uncertain Human Motion Modeling in HRI
Hongquan Gui, Ming Li
AI summary
Problem
Deterministic models fail to capture the inherent uncertainty and multi-modal nature of human motion in human-robot interaction (HRI), while existing diffusion models prioritize diversity over accuracy and lack mechanisms to model the complex spatial-temporal dependencies between human and robot motion.
Approach
The authors propose HRI-DGDM, which integrates a structural graph encoding kinematic priors and a collaboration graph learned from motion dynamics into a spatial-temporal denoising network, and uses a masking mechanism to anchor the observed history during diffusion.
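To make the two graphs concrete, here is a minimal plain-Python sketch under stated assumptions. The structural graph is built from a kinematic chain as a row-normalized adjacency with self-loops. The collaboration graph is approximated by a softmax over pairwise dot products of per-node motion features, an attention-style construction chosen for illustration rather than taken from the paper. The function names, the feature choice, and the toy skeleton are all hypothetical.

```python
import math


def structural_graph(num_joints, bones):
    """Row-normalized adjacency (with self-loops) from a kinematic chain.

    `bones` is a list of (parent, child) joint-index pairs; any skeleton passed
    in here is a hypothetical toy layout, not the paper's joint definition.
    """
    adj = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        adj[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for p, c in bones:
        adj[p][c] = adj[c][p] = 1.0  # undirected bone connection
    return [[v / sum(row) for v in row] for row in adj]


def collaboration_graph(features, temperature=1.0):
    """Dense adjacency from pairwise feature similarity (row-wise softmax).

    `features[i]` is a motion descriptor for node i (e.g. recent velocity).
    This attention-style construction is an assumption standing in for
    whatever learned mechanism HRI-DGDM actually uses.
    """
    adj = []
    for fi in features:
        scores = [sum(a * b for a, b in zip(fi, fj)) / temperature for fj in features]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        adj.append([e / z for e in exps])
    return adj
```

For example, with nodes 0 to 2 as a shoulder-elbow-wrist chain and node 3 as a robot end-effector, `structural_graph(4, [(0, 1), (1, 2)])` leaves the robot node connected only to itself, whereas `collaboration_graph` can place weight between the wrist and the end-effector whenever their motion features align. This illustrates why a fixed kinematic prior might be paired with a data-driven coupling graph.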
Key results
- Proposes HRI-DGDM, a dual-graph guided diffusion framework for HRI motion prediction
- Designs a spatial-temporal denoising network with multi-scale adaptive graph fusion
- Introduces a masking-based conditioning mechanism that anchors the observed history and prevents drift
- Demonstrates superior prediction accuracy over deterministic and diffusion baselines in HRI scenarios
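The multi-scale adaptive graph fusion can be illustrated, under assumptions, by reading "multi-scale" as k-hop neighborhoods (powers of each adjacency matrix) and "adaptive" as softmax-normalized weights over every (graph, scale) pair. The summary specifies neither detail, and HRI-DGDM may fuse features from separate graph branches rather than adjacency matrices; `fuse_graphs`, `graph_conv`, and the fixed `logits` (which a real model would learn) are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_graphs(a_struct, a_collab, logits, num_scales=2):
    """Adaptive mix of k-hop views (k = 1..num_scales) of two adjacency matrices.

    `logits` holds one score per (graph, scale) pair, structural scales first;
    a trained model would learn them, here they are plain inputs.
    """
    if len(logits) != 2 * num_scales:
        raise ValueError("need one logit per (graph, scale) pair")
    weights = softmax(logits)
    n = len(a_struct)
    fused = [[0.0] * n for _ in range(n)]
    idx = 0
    for graph in (a_struct, a_collab):
        power = graph  # 1-hop adjacency
        for k in range(num_scales):
            if k > 0:
                power = matmul(power, graph)  # (k + 1)-hop adjacency
            for i in range(n):
                for j in range(n):
                    fused[i][j] += weights[idx] * power[i][j]
            idx += 1
    return fused


def graph_conv(adj, x):
    """One propagation step without learnable weights: node features become adj @ x."""
    return matmul(adj, x)
```

Given row-normalized input graphs, every power is also row-normalized and the softmax weights sum to one, so the fused adjacency stays row-normalized and `graph_conv` remains a weighted average of neighbor features at every scale.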
Why it matters
Provides a robust, uncertainty-aware prediction framework that enhances safety and proactive adaptation in human-centered robotic applications.
Abstract
Human motion in human-robot interaction (HRI) is inherently uncertain, even when the same task is performed repeatedly. This variability poses a significant challenge for prediction, because models must capture a distribution of plausible futures rather than a single deterministic trajectory. Traditional models based on graph convolutional networks (GCNs), while effective at capturing spatial-temporal dependencies, are fundamentally limited by their deterministic nature and struggle to represent this inherent motion uncertainty. Diffusion models have emerged as a powerful framework for modeling such uncertainty. However, their direct application to HRI is hindered by two key limitations: they often prioritize motion diversity over prediction accuracy, potentially generating physically implausible results, and they fail to adequately model the complex, multi-scale spatial-temporal coupling between human and robot motions. To overcome these challenges, we propose HRI-DGDM, an HRI motion prediction framework based on a dual-graph guided diffusion model. Our method introduces a dual-graph structure, comprising a structural graph that encodes kinematic priors and a collaboration graph learned from motion dynamics, to guide the denoising process with strong structural priors. A dedicated spatial-temporal denoising network (STDN) fuses multi-scale features from both graphs through adaptive fusion and hierarchical spatial-temporal modeling. Furthermore, a masking-based conditioning mechanism anchors the observed history during denoising, ensuring temporal consistency and preventing drift. Experiments on HRI scenarios demonstrate that HRI-DGDM outperforms baselines in prediction accuracy.
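The masking-based conditioning can be sketched with the standard inpainting-style replacement trick for diffusion models: after each reverse step, the frames covering the observed history are overwritten with the ground-truth history diffused to the matching noise level (and with the clean history at the last step), so the sampler generates only the future frames freely. This is an assumption about how the anchoring works, not the authors' code. The linear noise schedule, the `denoiser(x_t, t)` noise-prediction interface (a stand-in for STDN), and all names are hypothetical; plain Python lists keep the example dependency-free.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumed; the paper's schedule is not stated)."""
    denom = max(num_steps - 1, 1)
    return [beta_start + (beta_end - beta_start) * i / denom for i in range(num_steps)]


def sample_with_history_mask(history, horizon, dim, denoiser, num_steps=50, seed=0):
    """DDPM ancestral sampling that keeps the observed frames anchored.

    history: observed frames, each a list of `dim` floats.
    denoiser(x_t, t): hypothetical model returning predicted noise shaped like x_t.
    Returns len(history) + horizon frames whose first frames equal the history.
    """
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product of alphas

    def q_sample(frames, t):
        # Forward diffusion q(x_t | x_0): sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
        s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
        return [[s * v + n * rng.gauss(0.0, 1.0) for v in f] for f in frames]

    h = len(history)
    # Fully noised copy of the history followed by pure-noise future frames
    x = q_sample(history, num_steps - 1) + [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)
    ]
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t)  # predicted noise, same shape as x
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [
            [(v - coef * e) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for v, e in zip(fx, fe)]
            for fx, fe in zip(x, eps)
        ]
        # Masking: re-anchor observed frames at noise level t - 1 (clean history at t = 0)
        x[:h] = q_sample(history, t - 1) if t > 0 else [list(f) for f in history]
    return x
```

Calling the sampler several times with different `seed` values yields different futures that all share the same anchored history, which is how a diffusion predictor can represent a distribution of plausible motions rather than a single trajectory.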