HIPPo: Harnessing Image-To-3D Priors for Model-Free Zero-Shot 6D Pose Estimation
Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan
AI summary
Problem
Existing 6D pose estimation methods require pre-curated CAD models or dense reference images, which are labor-intensive to prepare and often unavailable in real-world settings where instant robotic reaction is needed.
Approach
HIPPo instantly generates a physically scaled 3D mesh from a single image using diffusion priors, then continuously refines the mesh online by replacing unreliable diffusion predictions with actual sensor measurements as the robot observes the object.
Key results
- Instant 3D mesh generation from a single image in seconds
- Online mesh refinement via measurement-guided optimization
- Outperforms state-of-the-art methods in 6D pose estimation when prior reference images are limited
- Complete mesh maintained from the first glance for immediate robotic use
Why it matters
This approach allows robots to instantly recognize and interact with novel objects in unstructured environments without prior 3D scanning or reference data.
Abstract
This work focuses on the problem of 6D pose estimation for novel objects when a reference 3D model or posed reference images are not available. While existing methods can estimate the precise 6D pose of objects, they heavily rely on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance and instant robot reaction is desired. In this work, we propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images by harnessing image-to-3D priors from Diffusion Models, enabling model-free zero-shot 6D pose estimation. Specifically, we construct HIPPo Dreamer, a rapid image-to-mesh model built on a multiview Diffusion Model and a 3D reconstruction foundation model. Our HIPPo Dreamer can generate a 3D mesh of any unseen object from a single glance in just a few seconds. Then, as more observations are acquired, we propose to continuously refine the diffusion prior mesh model by joint optimization of object geometry and appearance. This is achieved by a measurement-guided scheme that gradually replaces the plausible diffusion priors with more reliable online observations. Consequently, HIPPo can instantly estimate and track the 6D pose of a novel object and maintain a complete mesh for immediate robotic applications. Thorough experiments on various benchmarks show that HIPPo outperforms state-of-the-art methods in 6D object pose estimation when prior reference images are limited.
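The idea of gradually replacing prior predictions with online observations can be illustrated with a toy example. The sketch below is not the HIPPo optimization, which the abstract describes only as a joint optimization of geometry and appearance. It is a minimal per-vertex blend under assumed inputs: the function name `measurement_guided_update`, the weighting rule `n / (n + tau)`, and the constant `tau` are all hypothetical choices made for illustration.

```python
import numpy as np


def measurement_guided_update(prior, measured, obs_count, tau=3.0):
    """Blend prior vertex positions toward sensor measurements (illustrative only).

    prior:     (N, 3) vertex positions predicted by the generative prior
    measured:  (N, 3) per-vertex measured positions (may be NaN where unobserved)
    obs_count: (N,) number of times each vertex has been observed
    tau:       hypothetical constant controlling how quickly trust shifts
               from the prior to the measurements
    """
    prior = np.asarray(prior, dtype=float)
    measured = np.asarray(measured, dtype=float)
    n = np.asarray(obs_count, dtype=float)

    # Weight is 0 for unobserved vertices and approaches 1 as observations accumulate.
    w = (n / (n + tau))[:, None]
    blended = (1.0 - w) * prior + w * measured

    # Unobserved vertices keep the prior unchanged, so the mesh stays complete.
    return np.where(n[:, None] > 0, blended, prior)
```

With `tau=3.0`, a vertex observed three times ends up halfway between its prior and measured positions, while a vertex that has never been observed stays at its prior position. This mirrors the abstract's point that a complete mesh is available from the first glance and becomes more measurement-driven over time.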