ICRA 2026

SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation

Yufeng Jin, Niklas Wilhelm Funk, Vignesh Prasad, Zechu Li, Mathias Franzius, Jan Peters, Georgia Chalvatzaki

AI summary

A probabilistic SE(3) flow matching framework models full 6D pose distributions to handle ambiguity, enabling safer and more reliable robotic manipulation under occlusions and symmetries.
6D Pose Estimation, Flow Matching, SE(3) Manifold, Uncertainty Quantification, Robotic Manipulation, Probabilistic Modeling

Problem

Deterministic deep networks for 6D object pose estimation are often overconfident and fail to capture the multi-modal pose distributions caused by partial observability, occlusions, and object symmetries. Representing that ambiguity is critical for safe robotic manipulation.

Approach

The method uses flow matching on the SE(3) manifold with a masked cross-attention DiT module to generate sample-based estimates of the full pose distribution, allowing explicit reasoning about uncertainty in ambiguous scenarios.
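
To make the idea concrete, here is a minimal NumPy sketch of flow matching on SE(3), treated as SO(3) x R^3. It is an illustration under stated assumptions, not the authors' implementation: the geodesic interpolant, the Euler sampler, and the helper names (`so3_exp`, `so3_log`, `interpolate`, `integrate`, `oracle_velocity`) are all choices made here, and an oracle velocity field that points at a known target pose stands in for the learned masked cross-attention DiT network.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near identity
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def so3_log(R):
    """Rotation matrix -> axis-angle vector (valid for angles below pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    vee = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * vee
    return theta / (2.0 * np.sin(theta)) * vee

def interpolate(R0, p0, R1, p1, t):
    """Geodesic interpolant between a noise pose (R0, p0) and a data pose (R1, p1).

    Returns the intermediate pose and the constant target velocity
    (body-frame angular velocity, linear velocity) a network would regress.
    """
    omega = so3_log(R0.T @ R1)
    R_t = R0 @ so3_exp(t * omega)
    p_t = (1.0 - t) * p0 + t * p1
    return R_t, p_t, omega, p1 - p0

def integrate(velocity_fn, R, p, steps=20):
    """Euler integration on SO(3) x R^3 from t=0 to t=1."""
    dt = 1.0 / steps
    for k in range(steps):
        w, v = velocity_fn(R, p, k * dt)
        R = R @ so3_exp(dt * w)  # stay on the manifold via the exponential map
        p = p + dt * v
    return R, p

# Hypothetical target pose; in practice this is unknown and the velocity
# comes from the trained network conditioned on the observation.
R1 = so3_exp(np.array([0.3, -0.5, 0.8]))
p1 = np.array([0.1, 0.2, 0.3])

def oracle_velocity(R, p, t):
    """Stand-in for the learned field: head straight for the target."""
    return so3_log(R.T @ R1) / (1.0 - t), (p1 - p) / (1.0 - t)

R_sample, p_sample = integrate(oracle_velocity, np.eye(3), np.zeros(3))
print("rotation error:", np.linalg.norm(so3_log(R_sample.T @ R1)))
print("translation error:", np.linalg.norm(p_sample - p1))
```

In a typical flow matching setup, a network is trained to predict the velocities returned by `interpolate` from the intermediate pose, the time, and the observation. Integrating many random initial poses then yields the sample set that approximates the pose distribution.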

Key results

  • State-of-the-art accuracy on REAL275, YCB-V, and LM-O benchmarks
  • Novel SE(3) flow matching framework with adapted DiT blocks for robust distribution estimation
  • Effective model-free clustering and geometric scoring strategies for pose selection
  • Successful integration into active perception and uncertainty-aware grasp synthesis
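
The summary does not describe how the clustering or scoring works, so the following is only a generic sketch of how a set of sampled rotations could be grouped into modes without an object model. The greedy threshold scheme, the function names (`rotation_angle`, `rot_z`, `greedy_cluster`), the 0.3 rad threshold, and the hand-written sample angles are all assumptions made for illustration.

```python
import numpy as np

def rotation_angle(Ra, Rb):
    """Geodesic distance on SO(3): angle of the relative rotation, in radians."""
    cos_theta = np.clip((np.trace(Ra.T @ Rb) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def rot_z(angle):
    """Rotation about the z-axis, used here only to build toy samples."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def greedy_cluster(rotations, threshold_rad):
    """Assign each sample to the first cluster whose seed is within the threshold."""
    centers, members = [], []
    for i, R in enumerate(rotations):
        for center, idxs in zip(centers, members):
            if rotation_angle(center, R) < threshold_rad:
                idxs.append(i)
                break
        else:
            # No existing cluster is close enough: start a new one.
            centers.append(R)
            members.append([i])
    return centers, members

# Toy samples for an object with a two-fold symmetry about z:
# six hypotheses near 0 rad and four near pi rad.
angles = [0.00, 0.02, -0.03, 0.01, 0.04, -0.01, 3.10, 3.14, 3.12, 3.16]
samples = [rot_z(a) for a in angles]

centers, members = greedy_cluster(samples, threshold_rad=0.3)
weights = [len(m) / len(samples) for m in members]
best = int(np.argmax(weights))
print("number of modes:", len(centers))
print("mode weights:", weights)
print("selected mode seed index:", members[best][0])
```

Choosing the most populated cluster is the simplest selection rule; a geometric score, for example agreement with the observed depth, could replace the cluster size as the ranking criterion.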

Why it matters

It provides a scalable, uncertainty-aware foundation for safe robotic manipulation in real-world environments where object poses are ambiguous or partially observed.

Abstract

Object pose estimation is a fundamental problem in robotics and computer vision, yet it remains challenging due to partial observability, occlusions, and object symmetries, which inevitably lead to pose ambiguity and multiple hypotheses consistent with the same observation. While deterministic deep networks achieve impressive performance under well-constrained conditions, they are often overconfident and fail to capture the multi-modality of the underlying pose distribution. To address these challenges, we propose a probabilistic framework that leverages flow matching on the SE(3) manifold for estimating 6D object pose distributions. Unlike existing methods that regress a single deterministic output, our approach models the full pose distribution with a sample-based estimate and enables reasoning about uncertainty in ambiguous cases such as symmetric objects or severe occlusions. We achieve state-of-the-art results on REAL275, YCB-V and LM-O, and demonstrate how our sample-based pose estimates can be leveraged in downstream robotic manipulation tasks such as active perception for disambiguating uncertain viewpoints, or guiding grasp synthesis in an uncertainty-aware manner.
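
As a rough illustration of how sample-based estimates could drive active perception, the sketch below measures the rotational spread of a sample set and requests another viewpoint when the spread is large. The dispersion measure (mean pairwise geodesic angle), the 0.2 rad threshold, and the names `rotational_spread` and `needs_new_view` are assumptions for this example; the abstract does not specify the criterion the authors use.

```python
import numpy as np

def geodesic_angle(Ra, Rb):
    """Angle of the relative rotation between two rotation matrices, in radians."""
    cos_theta = np.clip((np.trace(Ra.T @ Rb) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))

def rot_z(angle):
    """Rotation about the z-axis, used here only to build toy samples."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotational_spread(rotations):
    """Mean pairwise geodesic angle over all sample pairs."""
    n = len(rotations)
    angles = [geodesic_angle(rotations[i], rotations[j])
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(angles))

def needs_new_view(rotations, max_spread_rad=0.2):
    """Hypothetical decision rule: move the camera if the samples disagree."""
    return rotational_spread(rotations) > max_spread_rad

# Toy sample sets: one tightly concentrated, one split across a symmetry.
confident = [rot_z(a) for a in [0.00, 0.02, -0.01, 0.03]]
ambiguous = [rot_z(a) for a in [0.00, 0.02, 3.14, 3.12]]

print("confident spread:", rotational_spread(confident))
print("ambiguous spread:", rotational_spread(ambiguous))
print("new view for confident set:", needs_new_view(confident))
print("new view for ambiguous set:", needs_new_view(ambiguous))
```

The same spread value could also down-weight grasp candidates whose success depends on the ambiguous degrees of freedom.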

Index terms

Deep Learning for Visual Perception, RGB-D Perception, Perception for Grasping and Manipulation
