ICRA 2026

Graphite: A GPU-Accelerated Mixed-Precision Graph Optimization Framework

Shishir Gopinath, Karthik Dantu, Steven Ko

AI summary

Graphite delivers speedups of up to 59× over a CPU baseline for visual-inertial bundle adjustment in SLAM, and uses up to 78% less GPU memory than specialized solvers such as MegBA.

GPU acceleration, mixed-precision optimization, bundle adjustment, SLAM, nonlinear least squares, graph optimization

Problem

Existing GPU-accelerated optimizers struggle with complex, user-defined data types common in SLAM, require cumbersome language interoperation, or consume excessive GPU memory, hindering real-time deployment on resource-constrained devices.

Approach

Graphite introduces a CUDA C++ framework that uses a descriptor-based batching model to process identical graph elements in parallel, supporting mixed-precision solving and in-place optimization to minimize memory overhead and data transfer.
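The batching idea can be illustrated with a small host-only C++ sketch. This is not Graphite's API: the names `ConstraintDescriptor`, `PriorResidual`, `add`, and `evaluate` are hypothetical, and the scalar residual is a stand-in for a real SLAM constraint. The sketch only shows the general pattern of grouping constraints of one type into parallel arrays so that every element runs the same code path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical residual type: every constraint in a batch shares this code path.
struct PriorResidual {
    static double eval(double estimate, double measurement) {
        return estimate - measurement;
    }
};

// Hypothetical descriptor: a homogeneous batch of constraints of one type,
// stored as parallel arrays (structure-of-arrays layout).
template <typename Residual>
struct ConstraintDescriptor {
    std::vector<std::size_t> vertex_index;  // vertex touched by each constraint
    std::vector<double> measurement;        // observed value for each constraint

    void add(std::size_t vertex, double z) {
        vertex_index.push_back(vertex);
        measurement.push_back(z);
    }

    // Evaluates all residuals in the batch. The loop body is identical for
    // every element, so on a GPU each iteration could map to one thread
    // without any per-element branching on the constraint type.
    std::vector<double> evaluate(const std::vector<double>& vertices) const {
        std::vector<double> residuals(vertex_index.size());
        for (std::size_t i = 0; i < vertex_index.size(); ++i) {
            assert(vertex_index[i] < vertices.size());
            residuals[i] = Residual::eval(vertices[vertex_index[i]], measurement[i]);
        }
        return residuals;
    }
};
```

A graph with several constraint types would hold one such descriptor per type and process them one batch at a time, which is one plausible way to keep each batch free of type dispatch.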

Key results

General mixed-precision framework supporting 64-bit, 32-bit, and 16-bit floating-point types
Descriptor batching model that eliminates GPU thread branching for identical vertices and constraints
Up to 59× speedup over CPU baselines for global visual-inertial bundle adjustment in ORB-SLAM3
Up to 78% reduction in GPU memory usage compared to specialized solvers like MegBA
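As a rough illustration of what mixed precision means in practice, the sketch below stores values in a narrow type and accumulates in a wider one. The function name `mixed_dot` and its template parameters are hypothetical and not taken from Graphite, and only `float` and `double` are used because standard C++ has no portable 16-bit floating-point type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product with values stored as `Storage`
// (e.g. float) and the running sum kept in `Accum` (e.g. double).
template <typename Storage, typename Accum>
Accum mixed_dot(const std::vector<Storage>& a, const std::vector<Storage>& b) {
    assert(a.size() == b.size());
    Accum total = Accum(0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Widen each operand before multiplying so the product and the
        // sum are both computed in the accumulation type.
        total += static_cast<Accum>(a[i]) * static_cast<Accum>(b[i]);
    }
    return total;
}
```

For the inputs `{4096.0f, 1.0f}` and `{4096.0f, 1.0f}`, `mixed_dot<float, double>` returns 16777217, whereas the value 16777217 is not representable in IEEE 754 single precision, so a `float` accumulator would lose the final unit. The storage cost is still that of `float`.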

Why it matters

Enables efficient, large-scale nonlinear optimization for real-time SLAM and robotics applications on both desktop and embedded hardware.

Abstract

We present Graphite, a GPU-accelerated nonlinear least squares graph optimization framework. It provides a CUDA C++ interface to enable the sharing of code between a real-time application, such as a SLAM system, and its optimization tasks. The framework supports techniques to reduce memory usage, including in-place optimization, support for multiple floating-point types and mixed-precision modes, and dynamically computed Jacobians. We evaluate Graphite on well-known bundle adjustment problems and find that it achieves similar performance to MegBA, a solver specialized for bundle adjustment, while maintaining generality and using less memory. We also apply Graphite to global visual-inertial bundle adjustment on maps generated from stereo-inertial SLAM datasets, and observe speed-ups of up to 59× compared to a CPU baseline. Our results indicate that our framework enables faster large-scale optimization on both desktop and resource-constrained devices.
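Two of the memory-saving ideas named in the abstract, in-place optimization and dynamically computed Jacobians, can be sketched on a toy scalar problem. The code below is an assumption-laden illustration, not the paper's implementation: the function `gauss_newton_in_place` and the problem (find x minimizing the sum of (x² − zᵢ)²) are invented for this example. It uses plain Gauss-Newton, recomputes each Jacobian entry when it is needed instead of storing it, and writes the update directly into the caller's estimate.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical toy problem: find x minimizing sum_i (x*x - z_i)^2.
// Runs Gauss-Newton, updating the estimate in place through the reference.
// Returns the number of iterations performed.
int gauss_newton_in_place(double& x, const std::vector<double>& z,
                          int max_iterations = 50, double tolerance = 1e-12) {
    assert(!z.empty());
    for (int it = 0; it < max_iterations; ++it) {
        double hessian = 0.0;   // accumulates J^T J
        double gradient = 0.0;  // accumulates J^T r
        for (double zi : z) {
            const double r = x * x - zi;  // residual for this measurement
            const double j = 2.0 * x;     // Jacobian, recomputed here, never stored
            hessian += j * j;
            gradient += j * r;
        }
        if (hessian == 0.0) return it;    // singular system (x == 0): stop
        const double step = gradient / hessian;
        x -= step;                        // in-place update, no copy of the state
        if (std::abs(step) < tolerance) return it + 1;
    }
    return max_iterations;
}
```

Starting from `x = 3.0` with measurements `{4.0, 4.0}`, the estimate converges toward 2. Only two scalars are accumulated per iteration here; in a real graph problem the corresponding saving would come from not materializing the full Jacobian matrix.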

Index terms

Visual-Inertial SLAM; Mapping; Performance Evaluation and Benchmarking