ICRA 2026

SLNet: A Super-Lightweight Geometry-Adaptive Network for 3D Point Cloud Recognition

Saeid Mohammad, Amir Salarpour, Pedram MohajerAnsari, Mert D. Pesé

AI summary

SLNet delivers a strong accuracy-efficiency balance for 3D point cloud recognition by pairing ultra-lightweight, non-parametric geometric encoding with a minimal set of learnable parameters.

3D point cloud, lightweight network, edge deployment, non-parametric encoding, geometric modeling, efficiency-accuracy trade-off

Problem

Existing 3D point cloud backbones are either too computationally expensive for edge deployment or lack accuracy when stripped down to ultra-compact sizes, creating a gap in efficient, high-performance models for resource-constrained real-time applications.

Approach

The authors introduce SLNet, a four-stage hierarchical backbone built from two components: NAPE (Nonparametric Adaptive Point Embedding), a parameter-free embedding that combines Gaussian RBF and cosine bases with input-adaptive bandwidth and blending, and GMU (Geometric Modulation Unit), a lightweight per-channel affine modulator. Together they provide strong geometric modeling with few parameters and FLOPs.
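
As a rough illustration of these two ideas, the Python sketch below blends a Gaussian RBF response with a cosine response for a scalar offset, then applies a per-channel scale and shift. It is not the authors' implementation. The function names (`rbf_cosine_embed`, `adaptive_bandwidth`, `gmu`), the use of the standard deviation as the input-adaptive bandwidth, and the fixed `blend` weight are all assumptions made for illustration, since the exact formulation is not given here.

```python
import math


def adaptive_bandwidth(offsets, eps=1e-6):
    """Assumed bandwidth rule: standard deviation of the input offsets."""
    mean = sum(offsets) / len(offsets)
    var = sum((o - mean) ** 2 for o in offsets) / len(offsets)
    return math.sqrt(var) + eps


def rbf_cosine_embed(offset, centers, bandwidth, blend):
    """Embed a scalar offset with one blended Gaussian/cosine feature per center.

    No learnable parameters are involved: `centers` are fixed, and
    `bandwidth` and `blend` are derived from or supplied with the input.
    """
    feats = []
    for c in centers:
        d = offset - c
        gauss = math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        cosine = math.cos(d / bandwidth)
        feats.append(blend * gauss + (1.0 - blend) * cosine)
    return feats


def gmu(features, scale, shift):
    """Per-channel affine modulation: one scale and one shift per channel."""
    return [s * f + b for f, s, b in zip(features, scale, shift)]


offsets = [-0.2, 0.0, 0.3]          # hypothetical neighbor offsets along one axis
centers = [-1.0, 0.0, 1.0]          # hypothetical fixed basis centers
sigma = adaptive_bandwidth(offsets)
embedding = rbf_cosine_embed(0.0, centers, sigma, blend=0.5)

scale = [1.0] * len(embedding)      # identity initialization (assumed)
shift = [0.0] * len(embedding)
modulated = gmu(embedding, scale, shift)
```

With D channels, the `scale` and `shift` lists hold 2·D values in total, which is where a per-channel affine modulator gets its small parameter count.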

Key results

SLNet-S achieves 93.64% overall accuracy on ModelNet40 with 0.14M parameters and 0.31 GFLOPs
SLNet-M reaches 93.92% on ModelNet40, exceeding PointMLP with 24× fewer parameters
SLNet-T reaches 58.2% mIoU on S3DIS Area 5 with 2.5M parameters, more than 17× fewer than Point Transformer V3
NetScore+ metric strongly correlates with real-world hardware throughput
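
For context on the metric in the last item, the sketch below computes the original NetScore as it is commonly stated, 20·log10(a^α / (p^β · m^γ)) with α = 2 and β = γ = 0.5. The exact form of the NetScore+ extension is not given here, so latency and peak memory are deliberately left out. Two assumptions to note: parameters are expressed in millions, and the reported GFLOPs are used in place of the multiply-accumulate count of the original definition.

```python
import math


def netscore(accuracy_pct, params_m, flops_g, alpha=2.0, beta=0.5, gamma=0.5):
    """Base NetScore (not NetScore+): rewards accuracy, penalizes size and compute.

    accuracy_pct: accuracy in percent
    params_m:     parameters in millions (assumed unit)
    flops_g:      compute in GFLOPs (assumed stand-in for the MAC count)
    """
    ratio = accuracy_pct ** alpha / (params_m ** beta * flops_g ** gamma)
    return 20.0 * math.log10(ratio)


# Figures reported for ModelNet40.
slnet_s = netscore(93.64, 0.14, 0.31)
slnet_m = netscore(93.92, 0.55, 1.22)
```

Under these assumptions the smaller SLNet-S scores higher than SLNet-M, because its 0.28-point accuracy deficit is outweighed by roughly 4× fewer parameters and FLOPs.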

Why it matters

It enables high-performance 3D perception on edge devices and resource-constrained hardware without sacrificing accuracy, making it highly relevant for autonomous systems, robotics, and AR/VR applications.

Abstract

We present SLNet, a lightweight backbone for 3D point cloud recognition designed to achieve strong performance without the computational cost of many recent attention-, graph-, and deep-MLP-based models. The model is built on two simple ideas: NAPE (Nonparametric Adaptive Point Embedding), which captures spatial structure using a combination of Gaussian RBF and cosine bases with input-adaptive bandwidth and blending, and GMU (Geometric Modulation Unit), a per-channel affine modulator that adds only 2D learnable parameters (a scale and a shift for each of the D channels). These components are used within a four-stage hierarchical encoder with FPS+kNN grouping, nonparametric normalization, and shared residual MLPs. In experiments, SLNet shows that a very small model can remain highly competitive across several 3D recognition tasks. On ModelNet40, SLNet-S with 0.14M parameters and 0.31 GFLOPs achieves 93.64% overall accuracy, outperforming PointMLP-elite with 5× fewer parameters, while SLNet-M with 0.55M parameters and 1.22 GFLOPs reaches 93.92%, exceeding PointMLP with 24× fewer parameters. On ScanObjectNN, SLNet-M achieves 84.25% overall accuracy, within 1.2 percentage points of PointMLP, while using 28× fewer parameters. For large-scale scene segmentation, SLNet-T extends the backbone with local Point Transformer attention and reaches 58.2% mIoU on S3DIS Area 5 with only 2.5M parameters, more than 17× fewer than Point Transformer V3. We also introduce NetScore+, which extends NetScore by incorporating latency and peak memory so that efficiency can be evaluated in a more deployment-oriented way. Across multiple benchmarks and hardware settings, SLNet delivers a strong overall balance between accuracy and efficiency. Code is available at: https://github.com/m-saeid/SLNet.
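
The FPS+kNN grouping mentioned in the abstract can be sketched in plain Python as follows. This is a generic illustration of farthest point sampling followed by k-nearest-neighbor grouping, not code from the SLNet repository. The function names, the choice of index 0 as the starting point, and the toy point set are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def farthest_point_sampling(points, n_samples, start=0):
    """Greedily pick indices so each new point is farthest from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < n_samples:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn_group(points, center_ids, k):
    """For each sampled center, return the indices of its k nearest points (itself included)."""
    groups = []
    for c in center_ids:
        order = sorted(range(len(points)),
                       key=lambda i: math.dist(points[i], points[c]))
        groups.append(order[:k])
    return groups


# Toy cloud: three well-separated pairs of nearby points.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
    (1.0, 0.0, 0.0), (1.0, 0.1, 0.0),
    (0.0, 1.0, 0.0), (0.0, 1.1, 0.0),
]
centers = farthest_point_sampling(points, n_samples=3)
groups = knn_group(points, centers, k=2)
```

On this toy input the sampler picks one center from each pair, and each group contains the center plus its close neighbor. This brute-force version costs O(N·M) distance evaluations for sampling and an O(N log N) sort per center, so real pipelines typically run batched GPU kernels instead.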

Index terms

Deep Learning for Visual Perception, RGB-D Perception, Vision-Based Navigation