Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Ayca Takmaz, Alexandros Delitzas, Robert W. Sumner, Francis Engelmann, Johanna Wald, Federico Tombari
AI summary
Problem
Existing open-vocabulary 3D segmentation methods either stop at object-level instances or rely on noisy, memory-intensive point-level features. Neither approach robustly segments finer-grained scene entities such as object parts or attribute-based regions.
Approach
Search3D constructs a hierarchical tree representation of a 3D scene: class-agnostic object masks form the object level, and geometric over-segmentation supplies the parts. Both levels are enriched with pixel-aligned SigLIP features fused across multiple views, which enables open-vocabulary text queries.
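The two-level structure described above can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: the names `Node`, `fuse_views`, and `search` are hypothetical, point masks are omitted, multi-view fusion is assumed to be a plain average of unit-normalized features (the summary does not specify the fusion rule), and hand-made 4-dimensional vectors stand in for SigLIP image and text embeddings.

```python
from dataclasses import dataclass, field

import numpy as np


def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (epsilon guards against zero vectors)."""
    return v / (np.linalg.norm(v) + 1e-12)


def fuse_views(view_features: list) -> np.ndarray:
    """ASSUMPTION: fuse per-view features by averaging unit vectors."""
    stacked = np.stack([normalize(f) for f in view_features])
    return normalize(stacked.mean(axis=0))


@dataclass
class Node:
    """One tree node: an object, or a part stored under an object."""
    name: str
    feature: np.ndarray  # fused, unit-length feature for this segment
    children: list = field(default_factory=list)


def search(objects: list, text_feature: np.ndarray, level: str) -> list:
    """Rank nodes at one level ("object" or "part") by cosine similarity."""
    if level not in ("object", "part"):
        raise ValueError("level must be 'object' or 'part'")
    if level == "object":
        candidates = objects
    else:
        candidates = [part for obj in objects for part in obj.children]
    query = normalize(text_feature)
    scored = [(float(node.feature @ query), node.name) for node in candidates]
    return sorted(scored, reverse=True)


# Toy stand-ins for embeddings: each axis plays the role of one concept.
e = np.eye(4)  # e[0]=chair, e[1]=leg, e[2]=seat, e[3]=table (hypothetical)

chair = Node(
    "chair",
    fuse_views([e[0] + 0.1 * e[1], e[0] + 0.1 * e[2]]),
    children=[
        Node("chair/leg", fuse_views([e[1] + 0.05 * e[0], e[1]])),
        Node("chair/seat", fuse_views([e[2], e[2] + 0.05 * e[0]])),
    ],
)
table = Node(
    "table",
    fuse_views([e[3], e[3]]),
    children=[Node("table/top", fuse_views([e[3] + 0.05 * e[2], e[3]]))],
)
scene = [chair, table]

best_part = search(scene, e[1], level="part")[0]      # query stands in for "leg"
best_object = search(scene, e[0], level="object")[0]  # query stands in for "chair"
```

In this toy scene the part-level query ranks `chair/leg` first and the object-level query ranks `chair` first. Keeping one fused feature per segment, rather than one per point, is what lets the same query function run at either level of the tree.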
Key results
- Hierarchical open-vocabulary 3D segmentation method for objects and parts
- Scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan
- Open-vocabulary hierarchical part annotations for ScanNet++ scenes
- Superior performance over baselines in object, part, and material segmentation
Why it matters
Enables robots and assistive systems to interact with complex 3D environments by understanding and querying fine-grained scene elements beyond basic object boundaries.
Abstract
Open-vocabulary 3D segmentation enables exploration of 3D spaces using free-form text descriptions. Existing methods for open-vocabulary 3D instance segmentation primarily focus on identifying object-level instances but struggle with finer-grained scene entities such as object parts, or regions described by generic attributes. In this work, we introduce Search3D, an approach to construct hierarchical open-vocabulary 3D scene representations, enabling 3D search at multiple levels of granularity: fine-grained object parts, entire objects, or regions described by attributes like materials. Unlike prior methods, Search3D shifts towards a more flexible open-vocabulary 3D search paradigm, moving beyond explicit object-centric queries. For systematic evaluation, we further contribute a scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan, along with a set of open-vocabulary fine-grained part annotations on ScanNet++. Search3D outperforms baselines in scene-scale open-vocabulary 3D part segmentation, while maintaining strong performance in segmenting 3D objects and materials. Our project page is search3d-segmentation.github.io.