ICRA 2026

EZ-SP: Fast and Lightweight Superpoint-Based 3D Segmentation

Louis Geist, Loic Landrieu, Damien Robert

AI summary

EZ-SP matches the accuracy of state-of-the-art point-based 3D semantic segmentation models while fitting in under 2 MB of VRAM and running inference 72× faster. It achieves this by replacing CPU-bound partitioning with a fully GPU-based, learnable superpoint partitioning module.

3D semantic segmentation, superpoint partitioning, GPU acceleration, lightweight deep learning, point cloud processing, real-time perception

Problem

Superpoint-based 3D segmentation pipelines are bottlenecked by slow, CPU-bound partitioning steps that rely on handcrafted features and complex optimization, limiting their scalability and real-time deployment.

Approach

A lightweight, fully GPU-based pipeline that learns point embeddings to detect semantic boundaries, then uses a parallel combinatorial clustering algorithm to rapidly partition points into superpoints for downstream classification.
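The idea of cutting the point cloud wherever learned embeddings change sharply can be illustrated with a minimal sequential sketch, assuming the point embeddings have already been predicted. The paper uses a parallel combinatorial clustering algorithm on the GPU. This stand-in is simpler: it builds a brute-force k-nearest-neighbor graph, keeps only edges whose embedding distance is below a hypothetical threshold `tau`, and takes connected components with union-find. The helpers `knn_edges`, `partition`, and `superpoint_mean` are illustrative names, not functions from the EZ-SP code.

```python
import numpy as np

def knn_edges(xyz, k):
    """Directed (i, j) edges from each point to its k nearest neighbors (brute force, O(N^2))."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # never pick a point as its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    src = np.repeat(np.arange(len(xyz)), k)
    return np.stack([src, nbrs.ravel()], axis=1)

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def partition(xyz, emb, k=2, tau=0.5):
    """Hypothetical partition: merge kNN neighbors whose embeddings are closer than tau."""
    parent = list(range(len(xyz)))
    for i, j in knn_edges(xyz, k):
        if np.linalg.norm(emb[i] - emb[j]) < tau:  # small jump: same superpoint
            ri, rj = find(parent, int(i)), find(parent, int(j))
            if ri != rj:
                parent[ri] = rj
    roots = np.array([find(parent, i) for i in range(len(xyz))])
    _, labels = np.unique(roots, return_inverse=True)  # relabel roots as 0..S-1
    return labels

def superpoint_mean(feats, labels):
    """Average point features within each superpoint (scatter-mean)."""
    n_sp = labels.max() + 1
    sums = np.zeros((n_sp, feats.shape[1]))
    np.add.at(sums, labels, feats)
    counts = np.bincount(labels, minlength=n_sp)[:, None]
    return sums / counts

# Six collinear points; the toy embedding jumps between points 2 and 3.
xyz = np.array([[float(x), 0.0, 0.0] for x in range(6)])
emb = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
labels = partition(xyz, emb)
sp_feats = superpoint_mean(xyz, labels)
print(labels)          # [0 0 0 1 1 1]
print(sp_feats[:, 0])  # [1. 4.]
```

Although the six points form one spatially contiguous line, the sketch splits it at the embedding jump into two superpoints. `superpoint_mean` then pools per-superpoint features, as a downstream classifier might. Connected components over thresholded edges is a much cruder criterion than the paper's combinatorial algorithm and is used here only because it is short.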

Key results

13× faster superpoint partitioning than prior graph-based methods
72× faster inference than point-based SOTA models, with matching accuracy on indoor (S3DIS), driving (KITTI-360), and aerial (DALES) datasets
Full pipeline fits in under 2 MB of VRAM; the partitioning module has under 60k parameters
Real-time processing of multi-million-point scenes on a single consumer GPU
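The following calculation gives a sense of scale. It assumes the weights are stored as 32-bit floats, which the summary does not state. Under that assumption, a module with 60k parameters needs at most about:

```latex
60{,}000 \times 4\,\text{B} = 240{,}000\,\text{B} \approx 0.24\,\text{MB}
```

This is a small share of the 2 MB figure reported for the whole pipeline.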

Why it matters

Enables real-time, large-scale 3D semantic segmentation for resource-constrained robotic, autonomous driving, and AR/VR applications without sacrificing accuracy.

Abstract

Superpoint-based pipelines provide an efficient alternative to point- or voxel-based 3D semantic segmentation, but are often bottlenecked by their CPU-bound partition step. We propose a learnable, fully GPU partitioning algorithm that generates geometrically and semantically coherent superpoints 13× faster than prior methods. Our module is compact (under 60k parameters), trains in under 20 minutes with a differentiable surrogate loss, and requires no handcrafted features. Combined with a lightweight superpoint classifier, the full pipeline fits in <2 MB of VRAM, scales to multi-million-point scenes, and supports real-time inference. With 72× faster inference and 120× fewer parameters, EZ-SP matches the accuracy of point-based SOTA models across three domains: indoor scans (S3DIS), autonomous driving (KITTI-360), and aerial LiDAR (DALES). Code and pretrained models are accessible at github.com/drprojects/superpoint_transformer.
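The abstract mentions training the partition module with a differentiable surrogate loss but does not give its form. The sketch below is a hedged illustration only. It swaps in a standard contrastive loss on neighbor edges, which is not necessarily EZ-SP's loss. Neighbors with the same semantic label are pulled together, and neighbors with different labels are pushed at least `margin` apart, so that large embedding jumps line up with semantic boundaries. The function name `edge_contrastive_loss` and the `margin` parameter are hypothetical. The code uses NumPy for readability; in practice the same expression would be written in an autodiff framework to obtain gradients.

```python
import numpy as np

def edge_contrastive_loss(emb, edges, labels, margin=1.0):
    """Mean contrastive loss over (i, j) neighbor edges (illustrative, not the paper's loss).

    Same-label pairs pay their squared embedding distance; different-label
    pairs pay a squared hinge when they are closer than `margin`.
    """
    i, j = edges[:, 0], edges[:, 1]
    d = np.linalg.norm(emb[i] - emb[j], axis=1)
    same = labels[i] == labels[j]
    pull = d ** 2                            # attract within a class
    push = np.maximum(0.0, margin - d) ** 2  # repel across a boundary
    return float(np.where(same, pull, push).mean())

# Four points on a chain; the semantic boundary lies on edge (1, 2).
edges = np.array([[0, 1], [1, 2], [2, 3]])
labels = np.array([0, 0, 1, 1])
good = np.array([[0.0], [0.0], [1.0], [1.0]])  # jump exactly at the boundary
flat = np.zeros((4, 1))                        # no jump anywhere
print(edge_contrastive_loss(good, edges, labels))  # 0.0
print(edge_contrastive_loss(flat, edges, labels))  # 0.3333333333333333
```

The loss is zero for `good` and positive for `flat`. This matches the intended behavior: embeddings should be smooth inside a class and jump at class boundaries, which a threshold-based partition step can then exploit.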

Index terms

Semantic Scene Understanding; Deep Learning for Visual Perception; Object Detection, Segmentation and Categorization