ICRA 2026

DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects

Jiahong Chen, JingHao Wang, Zi Wang, Ziwen Wang, Banglei Guan, Qifeng Yu


AI summary


DKPMV achieves state-of-the-art 6D pose estimation for textureless objects using only multi-view RGB images by fusing dense keypoints and leveraging symmetry-aware training.

6D pose estimation, textureless objects, multi-view RGB, dense keypoints, symmetry-aware training, robotic perception

Problem

6D pose estimation for textureless objects is hindered by unreliable depth data on reflective surfaces and the scale/occlusion limitations of single-view RGB methods. Existing multi-view approaches either depend on degraded depth inputs or fail to fully exploit cross-view geometric consistency at the keypoint level.

Approach

DKPMV predicts dense keypoints from multiple RGB views, enhances them with attentional aggregation and symmetry-aware training, and fuses them via a three-stage progressive optimization pipeline to recover accurate 6D poses.
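The summary does not say how the fusion stages work internally. As a hedged illustration of the basic geometry that keypoint-level multi-view fusion builds on, the sketch below triangulates each keypoint from calibrated views with linear DLT and then recovers a rigid 6D pose by Kabsch alignment against the object model's keypoints. This is a textbook baseline, not DKPMV's three-stage optimizer. It assumes known camera intrinsics and extrinsics and noise-free 2D detections; the function names, camera rig, and pose values are hypothetical.

```python
import numpy as np


def triangulate_dlt(projections, pixels):
    """Linear (DLT) triangulation of one keypoint observed in several views.

    projections: list of 3x4 camera matrices P_i = K_i [R_i | t_i]
    pixels:      list of (u, v) detections of the same keypoint, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view adds two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def kabsch(model_pts, world_pts):
    """Least-squares rigid transform (R, t) such that world ~= R @ model + t."""
    mu_m, mu_w = model_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (world_pts - mu_w)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_w - R @ mu_m


def rot_z(deg):
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical rig: two pinhole cameras with a 0.5 m horizontal baseline.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# Hypothetical object keypoints (object frame) and ground-truth pose.
model = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])
R_true, t_true = rot_z(30.0), np.array([0.1, -0.2, 2.0])
world = (R_true @ model.T).T + t_true

# Fuse per-view 2D keypoints into 3D, then align the model to the fused points.
fused = np.array([triangulate_dlt([P1, P2], [project(P1, X), project(P2, X)]) for X in world])
R_est, t_est = kabsch(model, fused)
```

With real network predictions the detections are noisy, so a pipeline would typically refine this closed-form estimate with a nonlinear optimizer rather than use it directly.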

Key results

Dense keypoint-level fusion using only multi-view RGB inputs
Symmetry-aware training resolves pose ambiguities on symmetric objects
Attentional aggregation improves keypoint localization and fusion stability
Surpasses state-of-the-art multi-view RGB and RGB-D methods on the ROBI dataset
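The summary does not give DKPMV's symmetry-aware training objective. One common way to remove symmetry ambiguity from keypoint supervision is a min-over-symmetries loss: compare the prediction with the ground truth transformed by every symmetry of the object and keep the smallest error. The sketch below assumes a known, finite list of symmetry rotations and is written in NumPy rather than a training framework for brevity; the function and variable names and the example object are hypothetical.

```python
import numpy as np


def rot_z(deg):
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def symmetric_keypoint_loss(pred_kpts, model_kpts, R_gt, t_gt, symmetries):
    """Mean keypoint distance, minimized over an object's discrete symmetries.

    pred_kpts:  (N, 3) predicted keypoints in the camera frame
    model_kpts: (N, 3) keypoints in the object frame
    R_gt, t_gt: ground-truth rotation (3x3) and translation (3,)
    symmetries: list of 3x3 rotations S under which the object looks identical
    """
    losses = []
    for S in symmetries:
        # R_gt @ S is an equally valid ground-truth rotation for a symmetric object.
        target = (R_gt @ S @ model_kpts.T).T + t_gt
        losses.append(np.mean(np.linalg.norm(pred_kpts - target, axis=1)))
    # Supervise only against the closest equivalent pose.
    return min(losses)


# Hypothetical object with a 180-degree symmetry about its z-axis.
model = np.array([[0.1, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.2]])
R_gt, t_gt = np.eye(3), np.array([0.0, 0.0, 1.0])
# A prediction that matches the symmetric twin of the ground-truth pose.
pred = (rot_z(180.0) @ model.T).T + t_gt

naive = symmetric_keypoint_loss(pred, model, R_gt, t_gt, [np.eye(3)])
aware = symmetric_keypoint_loss(pred, model, R_gt, t_gt, [np.eye(3), rot_z(180.0)])
```

Here the naive loss penalizes a visually correct prediction (mean error 0.1), while the symmetry-aware loss is zero. Because the minimum is taken per sample, training is pushed toward the nearest indistinguishable pose instead of an average of several.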

Why it matters

Provides a robust, cost-effective, RGB-only solution for industrial robotic perception where depth sensors fail or are impractical.

Abstract

6D pose estimation of textureless objects is valuable for industrial robotic applications, yet remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry information. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB and RGB-D approaches. The code will be available at https://github.com/chenjiahongbq/DKPMV.
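The abstract does not detail the three optimization stages. A common objective for pose optimization over multi-view keypoint geometry is to reproject the object's 3D keypoints into every calibrated view under a candidate pose and compare them with the network's 2D predictions. The sketch below computes that mean multi-view reprojection error. It is an assumed, generic objective rather than the authors' formulation, and all names, the two-camera rig, and the synthetic detections are hypothetical.

```python
import numpy as np


def project_keypoints(R, t, model_kpts, camera):
    """Project object-frame keypoints into one view under pose (R, t)."""
    K, R_c, t_c = camera
    world = (R @ model_kpts.T).T + t       # object frame -> world frame
    cam = (R_c @ world.T).T + t_c          # world frame -> camera frame
    proj = (K @ cam.T).T
    return proj[:, :2] / proj[:, 2:3]      # perspective division


def multiview_reprojection_error(R, t, model_kpts, cameras, detections):
    """Mean pixel error of a candidate pose across all calibrated views.

    cameras:    list of (K, R_c, t_c) with world-to-camera extrinsics per view
    detections: list of (N, 2) predicted 2D keypoints, one array per view
    """
    residuals = [np.linalg.norm(project_keypoints(R, t, model_kpts, c) - uv, axis=1)
                 for c, uv in zip(cameras, detections)]
    return float(np.mean(np.concatenate(residuals)))


# Hypothetical two-view rig and synthetic detections generated from a known pose.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cameras = [(K, np.eye(3), np.zeros(3)), (K, np.eye(3), np.array([-0.5, 0.0, 0.0]))]
model = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 2.0])
detections = [project_keypoints(R_true, t_true, model, c) for c in cameras]

err_true = multiview_reprojection_error(R_true, t_true, model, cameras, detections)
err_shifted = multiview_reprojection_error(
    R_true, t_true + np.array([0.01, 0.0, 0.0]), model, cameras, detections)
```

Shifting the pose by 1 cm at roughly 2 m depth raises the error from zero to about 2.5 px. A nonlinear least-squares solver (for example Gauss-Newton or Levenberg-Marquardt) would minimize the per-keypoint residuals over R and t; only the objective is shown here.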

Index terms

Deep Learning for Visual Perception; Recognition; Computer Vision for Automation