ICRA 2026

HIPPo: Harnessing Image-To-3D Priors for Model-Free Zero-Shot 6D Pose Estimation

Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan

AI summary

HIPPo achieves instant, model-free 6D pose estimation for novel objects by generating a 3D mesh from a single image and continuously refining it, outperforming state-of-the-art methods when reference images are limited.

6D pose estimation, diffusion models, image-to-3D, model-free, zero-shot, robotic perception

Problem

Existing 6D pose estimation methods require pre-curated CAD models or dense reference images, which are labor-intensive to prepare and often unavailable in real-world settings where instant robotic reaction is needed.

Approach

HIPPo instantly generates a physically scaled 3D mesh from a single image using diffusion priors, then continuously refines the mesh's geometry and appearance online, gradually replacing unreliable diffusion predictions with actual sensor measurements as the robot observes the object.
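The summary does not say how the generated mesh acquires physical scale. One common way to give metric scale to a reconstruction whose scale is arbitrary, assumed here purely for illustration, is a closed-form least-squares fit between depths rendered from the mesh and depths measured by an RGB-D sensor: minimizing the sum of (s·p − d)² over valid pixels gives s = Σpd / Σp². The function names `estimate_metric_scale` and `rescale_vertices` are hypothetical, and this is a sketch under that assumption, not HIPPo's published procedure.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def estimate_metric_scale(pred_depth: Sequence[float], sensor_depth: Sequence[float]) -> float:
    """Closed-form least-squares scale s minimizing sum((s * p - d) ** 2).

    Illustrative assumption: pred_depth holds per-pixel depths rendered from
    the up-to-scale mesh, sensor_depth holds measured depths for the same
    pixels, and non-positive values mark missing data.
    """
    num = 0.0  # accumulates sum of p * d
    den = 0.0  # accumulates sum of p * p
    for p, d in zip(pred_depth, sensor_depth):
        if p > 0 and d > 0:  # use only pixels valid in both depth maps
            num += p * d
            den += p * p
    if den == 0.0:
        raise ValueError("no valid pixel pairs to estimate scale from")
    return num / den


def rescale_vertices(vertices: Sequence[Vertex], s: float) -> List[Vertex]:
    """Scale camera-frame vertices by s, which scales every rendered depth by s."""
    return [(s * x, s * y, s * z) for x, y, z in vertices]


if __name__ == "__main__":
    s = estimate_metric_scale([1.0, 2.0, 0.0], [0.5, 1.0, 0.7])
    print(s)  # 0.5
    print(rescale_vertices([(1.0, 2.0, 4.0)], s))  # [(0.5, 1.0, 2.0)]
```

Scaling points expressed in the camera frame leaves their pinhole projection unchanged while multiplying their depth by s, which is why a single scalar suffices under this assumption.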

Key results

3D mesh generation from a single image in a few seconds
Online mesh refinement via measurement-guided optimization
Superior 6D pose estimation accuracy with limited reference images
Complete mesh maintained from the first glance for immediate robotic use

Why it matters

This approach allows robots to immediately estimate the pose of, and interact with, novel objects in unstructured environments without prior 3D scanning or reference data.

Abstract

This work focuses on the problem of 6D pose estimation for novel objects when a reference 3D model or posed reference images are not available. While existing methods can estimate the precise 6D pose of objects, they heavily rely on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance and instant robot reaction is desired. In this work, we propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images by harnessing image-to-3D priors from Diffusion Models, enabling model-free zero-shot 6D pose estimation. Specifically, we construct HIPPo Dreamer, a rapid image-to-mesh model built on a multiview Diffusion Model and a 3D reconstruction foundation model. Our HIPPo Dreamer can generate a 3D mesh of any unseen object from a single glance in just a few seconds. Then, as more observations are acquired, we propose to continuously refine the diffusion prior mesh model through joint optimization of object geometry and appearance. This is achieved by a measurement-guided scheme that gradually replaces the plausible diffusion priors with more reliable online observations. Consequently, HIPPo can instantly estimate and track the 6D pose of a novel object and maintain a complete mesh for immediate robotic applications. Thorough experiments on various benchmarks show that HIPPo outperforms state-of-the-art methods in 6D object pose estimation when prior reference images are limited.
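The measurement-guided scheme can be pictured, under simplifying assumptions, as weighted fusion: each mesh quantity (for example a vertex offset or a color channel) starts at its diffusion-prior value with a fixed prior weight, and every new observation adds weight, so the prior's share of the estimate shrinks as evidence accumulates. The sketch below uses a hypothetical `FusedValue` class and plain running weighted averaging, a technique familiar from TSDF fusion. It is not the paper's joint optimization of geometry and appearance, only an illustration of how priors can be gradually replaced by measurements.

```python
from dataclasses import dataclass


@dataclass
class FusedValue:
    """Hypothetical per-element fusion of a diffusion prior with observations.

    Illustrative assumption only: a running weighted average in which the
    prior keeps a fixed weight and observations accumulate weight over time.
    """

    prior: float             # value predicted by the image-to-3D prior
    prior_weight: float      # fixed confidence assigned to the prior (> 0)
    obs_sum: float = 0.0     # weighted sum of observed values so far
    obs_weight: float = 0.0  # total weight of observations so far

    def __post_init__(self) -> None:
        if self.prior_weight <= 0:
            raise ValueError("prior_weight must be positive")

    def add_observation(self, value: float, weight: float = 1.0) -> None:
        """Fold in one sensor measurement with the given confidence weight."""
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self.obs_sum += weight * value
        self.obs_weight += weight

    @property
    def estimate(self) -> float:
        """Current fused value: weighted mean of the prior and all observations."""
        total = self.prior_weight + self.obs_weight
        return (self.prior_weight * self.prior + self.obs_sum) / total

    @property
    def prior_share(self) -> float:
        """Fraction of the total weight still held by the prior."""
        return self.prior_weight / (self.prior_weight + self.obs_weight)


if __name__ == "__main__":
    v = FusedValue(prior=1.0, prior_weight=1.0)
    for _ in range(3):
        v.add_observation(2.0)
    print(v.estimate, v.prior_share)  # 1.75 0.25
```

With a prior of 1.0 (weight 1.0) and three observations of 2.0, the estimate is (1.0 + 6.0) / 4 = 1.75 and the prior holds a quarter of the total weight. Because the prior weight is fixed, the prior never vanishes entirely but becomes negligible as observations accumulate.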

Index terms

AI-Based Methods; Computer Vision for Automation; AI-Enabled Robotics