ICRA 2026

PartPose: Attentive 6D Pose Estimation by Focusing on Graspable Parts of Multi-Part Deformable Objects

Ryo Okumura, Tadahiro Taniguchi

AI summary

PartPose substantially improves 6D pose estimation and robotic picking success for multi-part deformable objects by isolating and matching only their rigid, graspable components.

6D pose estimation; deformable objects; robotic picking; region of interest; semantic correspondence; Bayesian optimization

Problem

Standard 6D pose estimators assume rigid bodies and fail on multi-part deformable objects common in logistics, where flexible parts cause severe model-to-reality matching errors.

Approach

The method uses semantic keypoint transfer to locate a rigid region of interest, then applies Bayesian optimization to tune ROI parameters for a focused render-and-compare pose estimation pipeline.

Key results

98.2% translational and 96.4% rotational pose estimation success rates
87.2% robotic picking success rate versus 22.8% baseline
Zero-shot generalization to unseen objects within the same category
Automated ROI parameter optimization via Bayesian search

Why it matters

Provides a practical, training-free solution for automating warehouse logistics where robots must handle unpredictable, non-rigid items.

Abstract

This study tackles robotic picking of multi-part deformable objects (common in warehouses yet underexplored in the literature), such as cable-attached appliances and pouch drinks, which comprise both rigid and deformable components. Their deformability poses a challenge to model-based 6D pose estimators, such as FoundationPose, that assume rigid bodies. To address this, we present PartPose, which estimates the 6D pose of multi-part deformable objects by focusing on their rigid components. PartPose uses Bayesian optimization to select an appropriate region of interest (ROI) and then estimates its pose with a render-and-compare pipeline. We evaluate pose-estimation and picking success rates on nine multi-part deformable objects, counting a pose estimate as successful if the translational error is ≤30 mm and the rotational error is ≤0.3 radians. PartPose significantly outperforms a FoundationPose baseline, achieving success rates of 98.2% (translational), 96.4% (rotational), and 87.2% (picking), versus 47.9%, 35.9%, and 22.8%, respectively. Moreover, PartPose generalizes category-level semantic knowledge to new instances within the same category without performance degradation when those instances have semantically similar components. This capability is crucial for large logistics centers that handle diverse and novel objects.
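
The success criterion stated in the abstract (translational error ≤30 mm, rotational error ≤0.3 radians) can be illustrated with a minimal sketch. The abstract does not say how the two errors are computed, so the code below makes assumptions: translational error is taken as the Euclidean distance between estimated and ground-truth positions in millimetres, and rotational error as the geodesic angle between the two rotation matrices. The function names and the `rot_z` helper are hypothetical and are not taken from the paper.

```python
import numpy as np

# Thresholds quoted in the abstract.
TRANS_THRESH_MM = 30.0
ROT_THRESH_RAD = 0.3


def translation_error_mm(t_est, t_gt):
    """Euclidean distance between two positions (assumed metric, in mm)."""
    diff = np.asarray(t_est, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.linalg.norm(diff))


def rotation_error_rad(R_est, R_gt):
    """Geodesic angle between two 3x3 rotation matrices (assumed metric)."""
    R_rel = np.asarray(R_est, dtype=float).T @ np.asarray(R_gt, dtype=float)
    # For a rotation matrix, trace(R) = 1 + 2*cos(angle).
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def pose_success(R_est, t_est, R_gt, t_gt):
    """Return (translation_ok, rotation_ok) against the abstract's thresholds."""
    trans_ok = translation_error_mm(t_est, t_gt) <= TRANS_THRESH_MM
    rot_ok = rotation_error_rad(R_est, R_gt) <= ROT_THRESH_RAD
    return trans_ok, rot_ok


def rot_z(theta):
    """Hypothetical helper: rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


if __name__ == "__main__":
    # Estimate is 10 mm and 0.2 rad away from ground truth: both checks pass.
    print(pose_success(rot_z(0.2), [10.0, 0.0, 0.0], rot_z(0.0), [0.0, 0.0, 0.0]))
```

The two flags are returned separately because the abstract reports translational and rotational success rates as separate figures; averaging each flag over a set of trials would give the corresponding rate.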

Index terms

Perception for Grasping and Manipulation; AI and Machine Learning in Manufacturing and Logistics Systems; Computer Vision for Automation