GFreeDet2: Exploiting Gaussian Splatting and Foundation Models for RGB-Based Model-Free 2D and 6D Detection of Unseen Objects
Gu Wang, Xingyu Liu, Jingyi Tang, Chengxi Li, Yingyue Li, Ziqin Huang, Xiangyang Ji
AI summary
Problem
Existing unseen object detection methods rely on costly CAD models, require depth sensors, or are limited to specific camera types, hindering scalability in open-world robotics and mixed reality.
Approach
The method reconstructs 3D Gaussian object models from multi-view RGB references using projection-aware perspective cropping to handle both pinhole and fisheye cameras, then integrates these models into foundation model-driven detection pipelines.
Key results
- First complete model-free RGB-based 2D and 6D detection on the BOP-H3 benchmark
- State-of-the-art performance across pinhole and fisheye datasets
- Minimal-modification integration of Gaussian models into foundation model pipelines
- Unified pinhole and fisheye handling via projection-aware perspective cropping
Why it matters
It provides a scalable, cost-effective alternative to CAD-dependent methods, enabling robust object detection in open-world robotics and mixed reality applications using only RGB images.
Abstract
We introduce GFreeDet2, which leverages Gaussian Splatting and foundation models to address RGB-based model-free 2D detection and 6D detection of unseen objects. GFreeDet2 reconstructs 3D Gaussian object models from multi-view RGB references, enabling efficient model-free detection without relying on CAD models. To accelerate reconstruction and consistently handle both pinhole and fisheye cameras, we propose projection-aware perspective cropping (PAPC) with visual hull initialization. PAPC further improves coarse 6D detection by accurately extracting pinhole crops from fisheye query images. The Gaussian objects enable rendering in place of CAD models within foundation model-driven pipelines, allowing existing state-of-the-art RGB-based methods for unseen 2D and 6D detection to be extended to the model-free setting with minimal modifications. Extensive experiments on all three BOP-H3 datasets demonstrate that GFreeDet2 achieves state-of-the-art performance and establishes a strong baseline for RGB-based, model-free 2D and 6D unseen object detection. The code is publicly available at github.com/wangg12/GFreeDet2.git.
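The idea of extracting a pinhole crop from a fisheye image can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fisheye model, so an ideal equidistant projection (r = f * theta) is assumed here, and the function names (look_at_rotation, project_equidistant, perspective_crop), the parameters, and the nearest-neighbour sampling are all hypothetical choices made for illustration. The sketch points a virtual pinhole camera at an approximate object centre, casts one ray per crop pixel, and looks up where each ray lands in the fisheye image.

```python
import numpy as np


def look_at_rotation(center):
    """Rotation whose third column (virtual optical axis) points at `center`.

    Columns are the virtual camera's x, y, z axes expressed in the
    source (fisheye) camera frame. Hypothetical helper for illustration.
    """
    z = center / np.linalg.norm(center)
    up = np.array([0.0, 1.0, 0.0])
    if abs(z @ up) > 0.999:  # avoid a degenerate cross product
        up = np.array([1.0, 0.0, 0.0])
    x = np.cross(up, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)


def project_equidistant(points, f, cx, cy):
    """Project 3D points with an ideal equidistant fisheye model (assumed)."""
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    rho = np.hypot(x, y)
    theta = np.arctan2(rho, z)  # angle between the ray and the optical axis
    scale = np.where(rho > 1e-12, f * theta / np.maximum(rho, 1e-12), 0.0)
    return np.stack([cx + scale * x, cy + scale * y], axis=-1)


def perspective_crop(image, center, f_fish, cx, cy, crop_size=64, crop_focal=100.0):
    """Resample a fisheye image into a pinhole crop centred on `center`.

    `center` is an approximate 3D object centre in the fisheye camera frame.
    Returns the crop and the per-pixel fisheye coordinates used to build it.
    """
    R = look_at_rotation(np.asarray(center, dtype=float))
    c = (crop_size - 1) / 2.0  # principal point of the virtual pinhole camera
    v, u = np.mgrid[0:crop_size, 0:crop_size]
    # One ray per crop pixel, expressed in the virtual pinhole camera frame.
    rays = np.stack(
        [(u - c) / crop_focal, (v - c) / crop_focal, np.ones((crop_size, crop_size))],
        axis=-1,
    )
    rays_fish = rays @ R.T  # rotate the rays into the fisheye camera frame
    uv = project_equidistant(rays_fish, f_fish, cx, cy)
    # Nearest-neighbour lookup keeps the sketch dependency-free.
    cols = np.rint(uv[..., 0]).astype(int)
    rows = np.rint(uv[..., 1]).astype(int)
    h, w = image.shape[:2]
    valid = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    crop = np.zeros((crop_size, crop_size) + image.shape[2:], dtype=image.dtype)
    crop[valid] = image[rows[valid], cols[valid]]
    return crop, uv
```

Because the crop is defined by rays rather than by a rectangle in the source image, the same routine would work for a pinhole source by swapping project_equidistant for a pinhole projection, which is one plausible reading of how a projection-aware crop can treat both camera types uniformly. A real pipeline would likely use the calibrated camera model and bilinear interpolation instead of the simplifications assumed here.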