ICRA 2026

Hybrid Diffusion Policies with Projective Geometric Algebra for Efficient Robot Manipulation Learning

Xiatao Sun, Yuxuan Wang, Shuo Yang, Yinxing Chen, Daniel Rakita

AI summary

Hybridizing geometric algebra encoders with standard denoisers drastically accelerates training convergence and boosts task performance in robot manipulation learning.

Diffusion policies; Projective Geometric Algebra; Robot manipulation; Geometric deep learning; Training efficiency

Problem

Standard diffusion policies inefficiently relearn fundamental spatial priors from scratch for each new task, while purely geometric algebra-based denoisers suffer from prohibitively slow convergence.

Approach

The authors introduce hPGA-DP, a hybrid architecture that uses a geometric algebra transformer to encode spatial states and decode actions, while relying on standard U-Net or Transformer networks for the core denoising process.
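The data flow described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the authors' implementation: `encode_state`, `denoise_step`, `decode_action`, and `hybrid_policy` are invented names, and each one is a toy numeric stand-in for the corresponding learned module (P-GATr encoder, U-Net or Transformer denoiser, P-GATr decoder). The sketch also assumes that the denoiser is conditioned on the encoded state and that its final output is what the decoder receives; this summary does not spell out those details.

```python
import random
from typing import List

Vector = List[float]


def encode_state(state: Vector) -> Vector:
    """Toy stand-in for the geometric state encoder (hypothetical)."""
    return [2.0 * s for s in state]


def denoise_step(latent: Vector, cond: Vector, t: int) -> Vector:
    """Toy stand-in for one step of a standard denoiser (hypothetical).

    Moves the noisy latent a fraction 1 / (t + 1) of the way toward the
    conditioning vector, so the final step (t == 0) lands on it.
    """
    return [z + (c - z) / (t + 1) for z, c in zip(latent, cond)]


def decode_action(latent: Vector) -> Vector:
    """Toy stand-in for the geometric action decoder (hypothetical)."""
    return [0.5 * z for z in latent]


def hybrid_policy(state: Vector, num_steps: int = 10, seed: int = 0) -> Vector:
    """Encode the state, denoise from Gaussian noise, then decode an action."""
    rng = random.Random(seed)
    cond = encode_state(state)                    # geometric encoder
    latent = [rng.gauss(0.0, 1.0) for _ in cond]  # start from pure noise
    for t in reversed(range(num_steps)):          # standard denoiser loop
        latent = denoise_step(latent, cond, t)
    return decode_action(latent)                  # geometric decoder


action = hybrid_policy([0.1, -0.2, 0.3])
```

The point of the split is visible in the structure: only `encode_state` and `decode_action` would need to be geometry-aware, while the iterative loop in the middle can reuse any off-the-shelf denoiser.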

Key results

Faster convergence than standard U-Net or Transformer diffusion policies
Superior task performance across five Robosuite manipulation benchmarks
Successful convergence where pure P-GATr denoisers fail entirely
Robust performance across varying decoder loss masking thresholds

Why it matters

This hybrid architecture offers a practical path to efficient, geometrically aware robot learning, benefiting robotics researchers and practitioners developing scalable visuomotor policies.

Abstract

Diffusion policies are a powerful paradigm for robot learning, but their training is often inefficient. A key reason is that networks must relearn fundamental spatial concepts, such as translations and rotations, from scratch for every new task. To alleviate this redundancy, we propose embedding geometric inductive biases directly into the network architecture using Projective Geometric Algebra (PGA). PGA provides a unified algebraic framework for representing geometric primitives and transformations, allowing neural networks to reason about spatial structure more effectively. In this paper, we introduce hPGA-DP, a novel hybrid diffusion policy that capitalizes on these benefits. Our architecture leverages the Projective Geometric Algebra Transformer (P-GATr) as a state encoder and action decoder, while employing established U-Net or Transformer-based modules for the core denoising process. Through extensive experiments and ablation studies in both simulated and real-world environments, we demonstrate that hPGA-DP significantly improves task performance and training efficiency. Notably, our hybrid approach achieves substantially faster convergence compared to both standard diffusion policies and architectures that rely solely on P-GATr. The project website is available at: https://apollo-lab-yale.github.io/26-ICRA-hPGA-website/.

Index terms

Imitation Learning; Deep Learning in Grasping and Manipulation; Learning from Demonstration