IROS 2024

DAP: Diffusion-Based Affordance Prediction for Multi-Modality Storage

Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

Abstract

Solving storage problems, where objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. The difficulty stems primarily from the need for fine-grained 6D manipulation and the inherent multi-modality of the solution space, where multiple viable goal configurations exist for the same storage container. We present a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem. DAP takes a two-step approach: it first identifies a placeable region on the container and then precisely computes the relative pose between the object and that region. Existing methods either struggle with multi-modality or require computation-intensive training. Our experiments demonstrate DAP’s superior performance and training efficiency over the current state-of-the-art RPDiff, achieving remarkable results on the RPDiff benchmark. Additionally, our experiments showcase DAP’s data efficiency in real-world applications, an advancement over existing simulation-driven approaches. Our contribution fills a gap in robotic manipulation research by offering a solution that is both computationally efficient and capable of handling real-world variability. Code and supplementary material can be found at: https://github.com/changhaonan/DPS.git.

Fig. 1: Visualization of the backward diffusion process in affordance prediction. Rows represent different samples, and columns represent diffusion steps, from 99 to 0. Yellow indicates placeable regions, while purple indicates non-placeable areas. Initially, the scene shows random segmentation, which gradually converges to four placeable regions as the process progresses.
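The backward diffusion process described in the Fig. 1 caption can be illustrated with a minimal sketch. This is not the DAP implementation: it is a generic DDPM-style reverse sampler over per-point labels, and the linear noise schedule, the names `toy_denoiser` and `sample_affordance`, and the four hand-made candidate masks are all assumptions made for illustration. The learned network is replaced by a hypothetical stand-in that snaps to the closest candidate mask, and as a toy simplification each sample settles on one of the four candidate regions, so different noise seeds can land on different regions.

```python
import numpy as np

T = 100  # diffusion steps, indexed 99 down to 0 as in the Fig. 1 caption
betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def toy_denoiser(x_t, t, modes):
    """Hypothetical stand-in for a learned network: predicts the noise in x_t.

    It takes the candidate mask most correlated with x_t as the clean signal,
    then solves x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for eps.
    """
    x0_hat = modes[np.argmax(modes @ x_t)]
    return (x_t - np.sqrt(alpha_bars[t]) * x0_hat) / np.sqrt(1.0 - alpha_bars[t])


def sample_affordance(modes, rng):
    """Run the reverse process from pure noise and return a boolean mask."""
    x = rng.standard_normal(modes.shape[1])  # random labels at step 99
    for t in range(T - 1, -1, -1):
        eps_hat = toy_denoiser(x, t, modes)
        # Standard DDPM posterior mean given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is injected at the final step (t == 0).
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x > 0.0  # True = placeable, False = non-placeable


# Four hypothetical placeable regions over 40 container points
# (+1 inside the region, -1 outside).
modes = -np.ones((4, 40))
for k in range(4):
    modes[k, 10 * k:10 * (k + 1)] = 1.0

rng = np.random.default_rng(0)
masks = [sample_affordance(modes, rng) for _ in range(8)]
```

Each entry of `masks` is a boolean labeling of the 40 points. Because the stand-in denoiser always points toward one of the candidate masks, every sample ends on exactly one candidate region; a trained network would instead learn this behavior from demonstrations.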

Index terms

Deep Learning in Grasping and Manipulation; Learning from Demonstration; Representation Learning