PinPoint3D: Fine-Grained 3D Part Segmentation from a Few Clicks
Zhengyu Lin, Zhenhong Guo, Feng Zheng∗
AI summary
Problem
Existing interactive 3D segmentation methods focus on coarse instance-level targets and struggle with sparse real-world scans, while non-interactive approaches lack annotated data and perform poorly on noisy point clouds, hindering fine-grained part-level understanding for embodied AI.
Approach
The authors introduce a novel interactive framework that uses a dual-level transformer decoder with targeted attention masking to generate precise object and part masks from sparse point clouds guided by minimal user clicks, supported by a new 3D data synthesis pipeline for training.
Key results
- 55.8% average IoU per object part with a single click, rising above 71.3% with a few additional clicks
- Large-scale scene-level dataset with dense part annotations via novel synthesis pipeline
- Up to 16% IoU and precision improvement over interactive baselines
- Strong cross-domain generalization on MultiScan with reduced click requirements
Why it matters
Enables embodied AI and robotic systems to interact precisely with complex 3D environments by providing a highly efficient, low-effort interactive segmentation tool for fine-grained part manipulation.
Abstract
Fine-grained 3D part segmentation is crucial for enabling embodied AI systems to perform complex manipulation tasks, such as interacting with specific functional components of an object. However, existing interactive segmentation methods are largely confined to coarse, instance-level targets, while non-interactive approaches struggle with sparse, real-world scans and suffer from a severe lack of annotated data. To address these limitations, we introduce PinPoint3D, a novel interactive framework for fine-grained, multi-granularity 3D segmentation, capable of generating precise part-level masks from only a few user point clicks. A key component of our work is a new 3D data synthesis pipeline that we developed to create a large-scale, scene-level dataset with dense part annotations, overcoming a critical bottleneck that has hindered progress in this field. Through comprehensive experiments, we demonstrate that our method significantly outperforms existing approaches, achieving an average IoU of 55.8% on each object part with only one click and surpassing 71.3% IoU with a few additional click queries. Compared to current state-of-the-art baselines, PinPoint3D yields up to a 16% improvement in IoU and precision, highlighting its effectiveness and high efficiency on challenging, sparse point clouds. Our work represents a significant step towards more nuanced and precise machine perception and interaction in complex 3D environments. Our code, checkpoints and datasets can be found at the project website https://pinpoint3d.online.
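For readers unfamiliar with the metric, the per-part IoU figures quoted above can be illustrated with a minimal sketch. It assumes each part is represented as a boolean mask over the points of a scan and that the reported average is a plain mean over parts; the exact evaluation protocol (how results are averaged across parts, objects, and click budgets) is not stated in this section, and the function names `part_iou` and `mean_part_iou` are hypothetical, not taken from the released code.

```python
import numpy as np


def part_iou(pred_mask, gt_mask):
    """Intersection over union of two boolean per-point masks.

    Both masks are assumed to label the same points in the same order.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must cover the same points")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_part_iou(pred_masks, gt_masks):
    """Unweighted mean IoU over paired part masks (assumed averaging scheme)."""
    return float(np.mean([part_iou(p, g) for p, g in zip(pred_masks, gt_masks)]))


# Toy example on an 8-point cloud: 3 points agree, 5 points lie in the union.
gt = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
pred = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
iou = part_iou(pred, gt)  # 3 / 5 = 0.6
```

Under this reading, an "average IoU of 55.8% with one click" would correspond to `mean_part_iou` evaluated on the masks predicted after a single click per part.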