ICRA 2026

Disentangled Point Diffusion for Precise Object Placement

Lyuxing He, Eric Cai, Shobhit Aggarwal, Jianjun Wang, David Held

AI summary

TAX-DPD achieves millimeter-level placement precision and robust generalization to novel object geometries by disentangling point cloud diffusion into separate geometry and frame prediction stages.

point diffusion, robotic manipulation, goal prediction, object placement, generative modeling, industrial robotics

Problem

End-to-end and existing goal-prediction policies struggle to generalize across novel object geometries while maintaining the high precision required for low-tolerance tasks like industrial insertion.

Approach

The method uses a two-stage hierarchical framework that first predicts a dense Gaussian Mixture Model for global placement initialization, then refines the goal via a disentangled point diffusion process that separately denoises object geometry and placement frame.
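As a rough illustration of the first stage, the sketch below draws candidate placement positions from a mixture with one Gaussian component per scene point. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_dense_gmm`, the isotropic shared `sigma`, and the per-point `logits` are all hypothetical, and the paper's actual parameterization of the dense GMM may differ.

```python
import numpy as np

def sample_dense_gmm(means, logits, sigma, n_samples, rng):
    """Sample 3D placement candidates from a per-point Gaussian mixture.

    means:  (N, 3) component means, one per scene point (assumed layout).
    logits: (N,) unnormalized log-weights, one per component (assumed).
    sigma:  shared isotropic standard deviation (assumed for simplicity).
    """
    # A numerically stable softmax turns the logits into mixture weights.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Choose a component for each sample, then add Gaussian noise to its mean.
    idx = rng.choice(len(means), size=n_samples, p=weights)
    noise = sigma * rng.standard_normal((n_samples, 3))
    return means[idx] + noise, idx

# Toy scene: two equally likely placement sites and one very unlikely site.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
logits = np.array([5.0, 5.0, -20.0])
samples, idx = sample_dense_gmm(means, logits, sigma=0.01, n_samples=200, rng=rng)
```

Because each sample keeps the index of its component, a multi-modal scene produces candidates near several distinct sites rather than one averaged location; each candidate could then seed the second-stage refinement.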

Key results

Novel dense GMM for multi-modal global placement initialization
Disentangled point diffusion module separating geometry and frame prediction
Millimeter-level precision and high success rates on industrial insertion tasks
Strong generalization to novel object geometries and non-rigid object placement

Why it matters

Enables reliable, high-precision robotic automation for manufacturing and complex manipulation tasks requiring adaptation to diverse object shapes and multi-modal scenes.

Abstract

Recent advances in robotic manipulation have highlighted the effectiveness of learning from demonstration. However, while end-to-end policies excel in expressivity and flexibility, they struggle both in generalizing to novel object geometries and in attaining a high degree of precision. An alternative, object-centric approach frames the task as predicting the placement pose of the target object, providing a modular decomposition of the problem. Building on this goal-prediction paradigm, we propose TAX-DPD, a hierarchical, disentangled point diffusion framework that achieves state-of-the-art performance in placement precision, multi-modal coverage, and generalization to variations in object geometries and scene configurations. We model global scene-level placements through a novel feed-forward Dense Gaussian Mixture Model (GMM) that yields a spatially dense prior over global placements; we then model the local object-level configuration through a novel disentangled point cloud diffusion module that separately diffuses the object geometry and the placement frame, enabling precise local geometric reasoning. Interestingly, we demonstrate that our point cloud diffusion achieves substantially higher accuracy than a prior approach based on SE(3)-diffusion, even in the context of rigid object placement. We validate our approach across a suite of challenging tasks in simulation and in the real world on high-precision industrial insertion tasks. Furthermore, we present results on a cloth-hanging task in simulation, indicating that our framework can further relax assumptions on object rigidity. Visualizations and supplementary materials can be found on our project website: https://3dgp-icra2026.github.io/.

* Equal contribution
1 The authors are with Carnegie Mellon University, Pittsburgh, USA. {lyuxingh, eycai, shobhita, dheld}@andrew.cmu.edu
2 The author is with ABB Inc., USA. jianjun.wang@us.abb.com
David Held holds concurrent appointments at CMU and as an Amazon Scholar. This paper describes work performed at CMU and is not associated with Amazon.
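The abstract contrasts point cloud diffusion with SE(3)-diffusion for rigid placement. When the object is rigid, a predicted goal point cloud with known point correspondences can be converted to a placement pose by least-squares alignment. The sketch below uses the standard Kabsch (SVD) algorithm for that step; whether TAX-DPD extracts poses this way is an assumption, and `fit_rigid_transform` is a hypothetical helper name.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance between the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so det(R) = +1 (rotation, not reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: rotate an object cloud 30 degrees about z and translate it.
rng = np.random.default_rng(0)
object_points = rng.standard_normal((50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
goal_points = object_points @ R_true.T + t_true  # stand-in for a predicted goal

R_est, t_est = fit_rigid_transform(object_points, goal_points)
```

With noise-free correspondences the recovered pose matches the true one up to floating-point error. For a non-rigid object such as cloth, no single rigid transform fits, which is one reason a point-based goal representation is less restrictive than a pose.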

Index terms

Deep Learning in Grasping and Manipulation; Learning from Demonstration; Deep Learning Methods