Hybrid Diffusion Policies with Projective Geometric Algebra for Efficient Robot Manipulation Learning
Xiatao Sun, Yuxuan Wang, Shuo Yang, Yinxing Chen, Daniel Rakita
AI summary
Problem
Standard diffusion policies inefficiently relearn fundamental spatial priors from scratch for each new task, while denoisers built purely on geometric algebra (P-GATr) converge prohibitively slowly.
Approach
The authors introduce hPGA-DP, a hybrid architecture that uses the Projective Geometric Algebra Transformer (P-GATr) to encode spatial states and decode actions, while relying on standard U-Net or Transformer networks for the core denoising process.
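The summary does not give layer-level details, so the following is only a minimal sketch of the encode, denoise, decode data flow described above, under stated assumptions. Points are embedded as 3D PGA trivectors using a GATr-style blade order and sign convention (our assumption, not confirmed by the source). The P-GATr encoder/decoder and the U-Net/Transformer denoiser are replaced by hypothetical fixed linear maps (`W_enc`, `W_den`) purely to show how tensors move between the stages. All function names are illustrative, not the authors' API.

```python
import numpy as np

# 3D PGA (algebra G(3,0,1)) has 2**4 = 16 basis blades. Assumed blade order:
# [1, e0, e1, e2, e3, e01, e02, e03, e12, e13, e23, e012, e013, e023, e123, e0123]
MV_DIM = 16


def embed_point(p):
    """Embed 3D points of shape (..., 3) as PGA trivectors of shape (..., 16).

    Assumed sign convention: e123 = 1, e023 = -x, e013 = y, e012 = -z.
    """
    p = np.asarray(p, dtype=float)
    mv = np.zeros(p.shape[:-1] + (MV_DIM,))
    mv[..., 14] = 1.0          # e123: homogeneous weight
    mv[..., 13] = -p[..., 0]   # e023
    mv[..., 12] = p[..., 1]    # e013
    mv[..., 11] = -p[..., 2]   # e012
    return mv


def extract_point(mv):
    """Inverse of embed_point: read (x, y, z) back, dividing by the e123 weight."""
    xyz = np.stack([-mv[..., 13], mv[..., 12], -mv[..., 11]], axis=-1)
    return xyz / mv[..., 14:15]


rng = np.random.default_rng(0)
HORIZON, N_OBJ, HIDDEN = 4, 2, 32  # hypothetical sizes

# Hypothetical stand-ins: fixed random linear maps take the place of the
# P-GATr encoder and of the U-Net/Transformer denoiser.
W_enc = 0.1 * rng.normal(size=(N_OBJ * MV_DIM, HIDDEN))
W_den = 0.1 * rng.normal(size=(HORIZON * MV_DIM + HIDDEN + 1, HORIZON * MV_DIM))


def encode_state(obj_positions):
    """Encoder role: object positions -> PGA tokens -> conditioning vector."""
    tokens = embed_point(obj_positions)          # (N_OBJ, 16)
    return np.tanh(tokens.reshape(-1) @ W_enc)   # (HIDDEN,)


def predict_noise(noisy_action_mv, cond, t):
    """Denoiser role: a conventional network acting on flattened features."""
    x = np.concatenate([noisy_action_mv.reshape(-1), cond, [float(t)]])
    return (x @ W_den).reshape(HORIZON, MV_DIM)


def decode_action(action_mv):
    """Decoder role: multivector action tokens -> 3D end-effector waypoints."""
    return extract_point(action_mv)


# Usage: one pass through the three stages.
cond = encode_state([[0.1, 0.2, 0.3], [0.5, -0.1, 0.0]])
waypoints = rng.normal(size=(HORIZON, 3))
action_mv = embed_point(waypoints)
eps_hat = predict_noise(action_mv, cond, t=10)
print(cond.shape, eps_hat.shape)                          # (32,) (4, 16)
print(np.allclose(decode_action(action_mv), waypoints))  # True
```

The point of the split is that only the encoder and decoder need to speak in multivectors; the denoiser in the middle sees ordinary flat features, which is where a well-tuned U-Net or Transformer can be dropped in unchanged. Real P-GATr layers are equivariant and operate on multivector channels, which these linear stand-ins are not.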
Key results
- Faster convergence than standard U-Net or Transformer diffusion policies
- Superior task performance across five Robosuite manipulation benchmarks
- Successful convergence where pure P-GATr denoisers fail entirely
- Robust performance across varying decoder loss masking thresholds
Why it matters
This hybrid architecture offers a practical path to efficient, geometrically aware robot learning, benefiting robotics researchers and practitioners developing scalable visuomotor policies.
Abstract
Diffusion policies are a powerful paradigm for robot learning, but their training is often inefficient. A key reason is that networks must relearn fundamental spatial concepts, such as translations and rotations, from scratch for every new task. To alleviate this redundancy, we propose embedding geometric inductive biases directly into the network architecture using Projective Geometric Algebra (PGA). PGA provides a unified algebraic framework for representing geometric primitives and transformations, allowing neural networks to reason about spatial structure more effectively. In this paper, we introduce hPGA-DP, a novel hybrid diffusion policy that capitalizes on these benefits. Our architecture leverages the Projective Geometric Algebra Transformer (P-GATr) as a state encoder and action decoder, while employing established U-Net or Transformer-based modules for the core denoising process. Through extensive experiments and ablation studies in both simulated and real-world environments, we demonstrate that hPGA-DP significantly improves task performance and training efficiency. Notably, our hybrid approach achieves substantially faster convergence compared to both standard diffusion policies and architectures that rely solely on P-GATr. The project website is available at: https://apollo-lab-yale.github.io/26-ICRA-hPGA-website/.
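The abstract describes the "core denoising process" only at a high level. Diffusion policies are commonly trained with the DDPM noise-prediction objective, and the sketch below shows that objective plus one reverse denoising step in NumPy. It assumes hPGA-DP follows this standard setup; the linear β schedule, the 100 steps, and the 16 × 7 action chunk are illustrative choices, not values taken from the paper. The denoiser network itself is omitted: in hPGA-DP that role is filled by a U-Net or Transformer, and here the true noise stands in for its prediction.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t for t = 0..T-1 (linear schedule, for illustration)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


def epsilon_loss(eps_pred, eps):
    """DDPM training objective: mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))


def p_sample_step(x_t, t, eps_hat, betas, alpha_bar, rng):
    """One reverse step x_t -> x_{t-1}, using variance sigma_t^2 = beta_t."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.normal(size=x_t.shape)


T = 100
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction, decreasing in t

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16, 7))   # hypothetical: 16-step chunk of 7-D actions
eps = rng.normal(size=x0.shape)
t = 50
x_t = q_sample(x0, t, alpha_bar, eps)

# A trained denoiser would predict eps from (x_t, t, state condition);
# an untrained all-zeros prediction gives a loss near the noise variance.
loss_untrained = epsilon_loss(np.zeros_like(eps), eps)
x_prev = p_sample_step(x_t, t, eps, betas, alpha_bar, rng)
print(x_t.shape, x_prev.shape)  # (16, 7) (16, 7)
```

At inference time, `p_sample_step` would be applied repeatedly from t = T-1 down to 0, starting from pure noise; a hybrid design changes what conditions the noise predictor and how the result is decoded, not this loop.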