InstantPose: Zero-Shot Instance-Level 6D Pose Estimation from a Single View
Francesco Di Felice, Alberto Remus, Stefano Gasperini, Benjamin Busam, Lionel Ott, Stefan Thalhammer, Federico Tombari, Carlo Alberto Avizzano
AI summary
Problem
Current instance-level pose estimation methods depend on costly 3D CAD models or multiple posed reference images, which restricts their use in real-world robotic applications involving novel objects.
Approach
The method feeds a single RGB reference into a Large Reconstruction Model to generate a coarse 3D mesh, then aligns it to a query RGB-D view using semantic feature matching and refines the pose through an online optimization process that corrects for geometric inaccuracies.
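To make the alignment step concrete: once semantic feature matching has paired 3D points on the reconstructed mesh with 3D points from the query view, a rigid pose can be recovered in closed form. The sketch below uses the standard Kabsch (SVD-based) least-squares solution. This is an illustrative assumption, not the solver confirmed by the paper, and the function name `kabsch_pose` and its inputs are hypothetical.

```python
import numpy as np

def kabsch_pose(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3, not collinear).
    Hypothetical helper: src would be matched points on the coarse mesh,
    dst the corresponding points observed in the query camera frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage: recover a known pose from synthetic, noise-free correspondences.
rng = np.random.default_rng(0)
mesh_pts = rng.normal(size=(20, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])
query_pts = mesh_pts @ R_true.T + t_true
R_est, t_est = kabsch_pose(mesh_pts, query_pts)
```

With real feature matches, a closed-form fit like this is typically wrapped in an outlier-rejection loop and followed by refinement, which is the role the summary assigns to the online optimization stage.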
Key results
- Surpasses the accuracy of methods that require perfect 3D models on the YCB-V dataset
- Enables successful robotic grasping from single-view pose estimates
- Eliminates dependency on posed reference images or manual 3D scans
- Provides a training-free pipeline for zero-shot instance-level pose estimation
Why it matters
It offers a practical, real-time solution for robotic manipulation in unstructured environments where obtaining 3D object models is infeasible.
Abstract
Object pose estimation using visual data is crucial for robotic interaction with the environment. Many existing instance-level methods require 3D CAD models or multiple object views, which limits their flexibility and generalizability. Overcoming this limitation is critical to enhancing the adaptability of pose estimation systems. In this work, a novel pipeline that leverages recent advances in reconstruction techniques is presented to address these challenges. Large Reconstruction Models (LRMs) are advanced neural architectures capable of generating 3D object models from a limited set of views. Nevertheless, the resulting 3D models often lack relevant geometric and texture details due to insufficient input information. This research presents InstantPose, a zero-shot instance-level pose estimation method that, building upon LRMs, can determine the pose of unseen objects using as little as a single unposed RGB reference image and RGB-D query images. Extensive experiments demonstrate that InstantPose achieves remarkable performance in object pose estimation on the YCB-V dataset, compared to methods designed to rely on a geometrically perfect object model. Furthermore, the 6D pose provided by the presented approach facilitates successful object grasping, highlighting its practical utility in robotic manipulation tasks.
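The RGB-D query is what ties the reconstructed mesh to metric camera space: given depth and camera intrinsics, each pixel can be lifted to a 3D point. A minimal sketch of this standard pinhole back-projection follows. The function name `backproject_depth` and the intrinsics parameters (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth image (H, W) to camera-frame points of shape (H, W, 3).

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy) in pixels. Output units match the depth units.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Usage: a tiny 2x3 depth map with every pixel 2 units from the camera.
points = backproject_depth(np.full((2, 3), 2.0), fx=1.0, fy=1.0, cx=1.0, cy=0.0)
```

Points produced this way, restricted to the object's pixels, are the kind of query-side 3D data that a mesh could then be aligned against.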