ICRA 2026

Memory Efficient Point Cloud Registration Accelerator on FPGA

Chang Qiong, Dongqi Cai, Ran Dong, Junpei Zhong

AI summary

An FPGA-based ICP accelerator achieves over 1.5× speedup and 99% memory reduction compared to mobile GPUs, enabling real-time point cloud registration on resource-constrained edge devices.

Point cloud registration; FPGA accelerator; ICP algorithm; Memory efficiency; Edge computing; Real-time vision

Problem

Existing GPU-based ICP registration methods demand excessive memory (~2GB) and power, making them unsuitable for deployment on memory-limited mobile and edge platforms.

Approach

The authors design a memory-efficient FPGA accelerator for the VAN-ICP algorithm that uses a pre-traversal technique for dynamic voxel memory allocation and a custom hardware SVD module to accelerate rigid transformation computation.
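The summary does not spell out how the pre-traversal works, so the following is only a minimal software sketch of one plausible reading: a first pass counts the points that fall in each voxel, a prefix sum turns the counts into offsets, and a second pass writes points into one exactly sized flat buffer. The names `voxel_key` and `build_voxel_table`, the dictionary-based bookkeeping, and the choice of Python are illustrative assumptions, not the paper's hardware design.

```python
from collections import Counter


def voxel_key(point, voxel_size):
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(int(c // voxel_size) for c in point)


def build_voxel_table(points, voxel_size):
    """Two-pass voxel packing (hypothetical sketch, not the paper's design).

    Returns (buffer, offsets, counts): points grouped by voxel in one flat
    buffer, the start offset of each voxel's slice, and its point count.
    """
    # Pass 1 (pre-traversal): count points per voxel without storing any.
    counts = Counter(voxel_key(p, voxel_size) for p in points)

    # Prefix sum over the counts gives each voxel an exact slice of the buffer,
    # so no voxel reserves more storage than it actually needs.
    offsets = {}
    total = 0
    for key in sorted(counts):
        offsets[key] = total
        total += counts[key]

    # Pass 2: write every point into the next free slot of its voxel's slice.
    buffer = [None] * total
    cursor = dict(offsets)
    for p in points:
        key = voxel_key(p, voxel_size)
        buffer[cursor[key]] = p
        cursor[key] += 1
    return buffer, offsets, counts
```

Compared with reserving a fixed worst-case capacity for every voxel, sizing the buffer from the counts is one way such a scheme could reduce memory, at the cost of reading the input twice.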

Key results

Real-time FPGA-based ICP framework for resource-constrained environments
Pre-traversal technique for dynamic on-chip memory allocation
Custom FPGA-optimized SVD module for low-latency transformation
>1.5× speedup and 99% memory reduction over mobile GPU implementations
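For context on the SVD step named above, here is a minimal NumPy sketch of the standard SVD-based (Kabsch) solution for the rigid transform between matched point pairs, which is the usual closed-form update inside an ICP iteration. It is a software illustration only: the function name `estimate_rigid_transform` is hypothetical, and nothing here reflects how the authors' FPGA module is structured.

```python
import numpy as np


def estimate_rigid_transform(source, target):
    """Return (R, t) minimizing sum_i ||R @ source[i] + t - target[i]||^2.

    source and target are (N, 3) arrays of already-matched points.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)

    # 3x3 cross-covariance of the centered correspondences.
    H = (source - src_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_centroid - R @ src_centroid
    return R, t
```

Because the matrix being decomposed is always 3×3 regardless of the number of points, this step has a fixed, small size, which is presumably what makes a dedicated hardware module attractive.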

Why it matters

Enables high-performance, real-time 3D perception and robotic vision on edge devices where power and memory are strictly limited.

Abstract

Point cloud registration, which aligns multiple datasets into a unified coordinate system, is critical for mobile applications such as 3D SLAM and autonomous driving. Among existing methods, Iterative Closest Point (ICP) remains a widely used method for rigid registration due to its robustness and simplicity. However, its performance on mobile platforms is hindered by iterative computations and limited memory resources. This paper proposes a high-performance ICP registration framework implemented on FPGA. Building upon an efficient GPU-based method named VAN-ICP, our FPGA-based ICP accelerator achieves greater memory efficiency and faster processing speed, making it ideal for resource-constrained mobile platforms. Experimental results demonstrate a speedup of over 1.5× compared to mobile GPU-based implementations and a 99% reduction in memory usage, validating the effectiveness of the proposed approach for real-world point cloud registration on edge platforms. Beyond these improvements, the proposed framework also facilitates advancements in robotic vision technologies by enabling more accurate and efficient perception under stringent hardware constraints.

Index terms

Computer Architecture for Robotic and Automation; Hardware-Software Integration in Robotics; Embedded Systems for Robotic and Automation