EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation
Louis Geist, Loic Landrieu, Damien Robert
AI summary
Problem
Superpoint-based 3D segmentation pipelines are bottlenecked by slow, CPU-bound partitioning steps that rely on handcrafted features and complex optimization, limiting their scalability and real-time deployment.
Approach
A lightweight, fully GPU-based pipeline that learns point embeddings to detect semantic boundaries, then uses a parallel combinatorial clustering algorithm to rapidly partition points into superpoints for downstream classification.
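The embed-then-cluster idea above can be illustrated with a minimal sketch. It is not the authors' parallel GPU algorithm: it swaps in a simple sequential stand-in (cut neighborhood-graph edges whose endpoint embeddings differ by more than a threshold, then take connected components with union-find). The function name `partition_superpoints`, the `threshold` parameter, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
from math import dist


def partition_superpoints(embeddings, edges, threshold):
    """Toy stand-in: cut edges across likely boundaries, then label components.

    embeddings: per-point embedding vectors (hypothetical learned features)
    edges: (i, j) index pairs from a neighborhood graph
    threshold: hypothetical cut-off; edges with a larger embedding distance are cut
    """
    parent = list(range(len(embeddings)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        # Keep the edge only if both endpoints look alike in embedding space.
        if dist(embeddings[i], embeddings[j]) <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    # Relabel component roots as consecutive superpoint ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(embeddings))]


# Two well-separated groups in embedding space, chained by one boundary edge.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = partition_superpoints(embeddings, edges, threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1]
```

The edge (2, 3) spans the two groups and is cut, so the five points fall into two superpoints. A GPU implementation would need a parallel formulation of this step, which the sketch does not attempt.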
Key results
- 13× faster superpoint partitioning than prior graph-based methods
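- 72× faster end-to-end inference and 120× fewer parameters than point-based SOTA models, at matching accuracy across indoor (S3DIS), driving (KITTI-360), and aerial (DALES) datasets
- Full pipeline fits in under 2 MB of VRAM; the partitioning module has under 60k parameters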
- Real-time processing of multi-million-point scenes on a single consumer GPU
Why it matters
Enables real-time, large-scale 3D semantic segmentation for resource-constrained robotic, autonomous driving, and AR/VR applications without sacrificing accuracy.
Abstract
Superpoint-based pipelines provide an efficient alternative to point- or voxel-based 3D semantic segmentation, but are often bottlenecked by their CPU-bound partition step. We propose a learnable, fully GPU-based partitioning algorithm that generates geometrically and semantically coherent superpoints 13× faster than prior methods. Our module is compact (under 60k parameters), trains in under 20 minutes with a differentiable surrogate loss, and requires no handcrafted features. Combined with a lightweight superpoint classifier, the full pipeline fits in <2 MB of VRAM, scales to multi-million-point scenes, and supports real-time inference. With 72× faster inference and 120× fewer parameters, EZ-SP matches the accuracy of point-based SOTA models across three domains: indoor scans (S3DIS), autonomous driving (KITTI-360), and aerial LiDAR (DALES). Code and pretrained models are accessible at github.com/drprojects/superpoint_transformer.