ICRA 2026

Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering

Emil Martens, Aaron Miller, Matias Varnum, Annette Stahl

AI summary

Caspar automatically compiles symbolic robotics residuals into optimized CUDA kernels, delivering a nonlinear solver that is 5–20× faster than state-of-the-art alternatives with lower memory usage and comparable accuracy.

GPU acceleration, symbolic programming, nonlinear optimization, CUDA, robotics, bundle adjustment

Problem

Robotics estimation tasks require solving large, sparse nonlinear least-squares problems efficiently, but existing GPU solvers are rigid and difficult to customize for new symbolic expressions without sacrificing performance.
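
To make "nonlinear least squares" concrete, here is a minimal sketch of a Gauss-Newton iteration on a toy one-parameter curve fit. It is not Caspar's solver and uses no GPU. The model y = exp(a * x), the synthetic data, and the function name gauss_newton are assumptions chosen for illustration, and the derivative is written by hand where a symbolic tool would derive it automatically.

```python
import math


def gauss_newton(xs, ys, a0, iterations=20):
    """Fit y = exp(a * x) by minimizing the sum of squared residuals over a."""
    a = a0
    for _ in range(iterations):
        # Residuals r_i = exp(a * x_i) - y_i and their derivatives dr_i/da.
        r = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        J = [x * math.exp(a * x) for x in xs]
        # Scalar normal equation: (J^T J) * delta = -J^T r.
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        if JtJ == 0.0:
            break  # no curvature information, so stop rather than divide by zero
        a -= Jtr / JtJ
    return a


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data with a = 0.5
a_hat = gauss_newton(xs, ys, a0=0.0)
print(a_hat)
```

Problems such as bundle adjustment have the same structure, but with many thousands of parameters and a sparse Jacobian, which is what makes an efficient solver hard to write by hand.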

Approach

Caspar converts user-defined symbolic expressions into a Directed Acyclic Bipartite Symbolic Expression Graph (DABSEG), then applies adaptive register reordering, partial subexpression elimination, and custom memory accessors to generate highly optimized, problem-specific CUDA kernels.
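
The idea of sharing work across a symbolic expression graph can be sketched with plain common subexpression elimination. The code below is a generic illustration, not Caspar's DABSEG, its partial subexpression elimination, or its register reordering. The Expr type, the helper constructors, and emit are hypothetical names. The sketch assumes variable names do not clash with the generated temporaries t0, t1, and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A node in a small expression graph (hypothetical type for illustration)."""
    op: str           # "var", "add", "mul", or "sin"
    args: tuple = ()  # child Expr nodes, or (name,) for a variable


def var(name):
    return Expr("var", (name,))


def add(a, b):
    return Expr("add", (a, b))


def mul(a, b):
    return Expr("mul", (a, b))


def sin(a):
    return Expr("sin", (a,))


def emit(outputs):
    """Emit straight-line statements, computing each distinct subexpression once."""
    names = {}  # Expr -> name of the variable or temporary holding its value
    lines = []

    def visit(e):
        if e in names:  # structurally equal subtree already emitted: reuse it
            return names[e]
        if e.op == "var":
            names[e] = e.args[0]
            return names[e]
        operands = [visit(child) for child in e.args]
        if e.op == "add":
            rhs = f"{operands[0]} + {operands[1]}"
        elif e.op == "mul":
            rhs = f"{operands[0]} * {operands[1]}"
        elif e.op == "sin":
            rhs = f"math.sin({operands[0]})"
        else:
            raise ValueError(f"unknown op: {e.op}")
        tmp = f"t{len(lines)}"
        lines.append(f"{tmp} = {rhs}")
        names[e] = tmp
        return tmp

    results = [visit(out) for out in outputs]
    return lines, results


x, y = var("x"), var("y")
s = sin(mul(x, y))
lines, results = emit([add(s, x), mul(s, s)])
print("\n".join(lines))
```

Although sin(x * y) appears three times across the two outputs, it is emitted only once and reused through a temporary. A kernel generator applies the same principle at much larger scale, where the order of the emitted statements also determines how many values must be held in registers at once.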

Key results

5–20× speedup over leading bundle adjusters on the BAL dataset
Reduced memory footprint via optimized register allocation
Accuracy comparable to existing state-of-the-art solvers
Automatic kernel generation from symbolic residuals

Why it matters

Enables robotics developers to rapidly prototype and deploy high-performance GPU solvers for real-time state estimation, motion planning, and calibration without deep CUDA expertise.

Abstract

We present Caspar, a library that makes the power of modern GPUs more accessible in robotics and provides a state-of-the-art nonlinear GPU solver that can be applied to a wide range of different optimization problems. Caspar bridges the gap between expressive symbolic programming in Python and high-performance GPU runtimes in C++ by automatically generating optimized CUDA kernels from symbolic expressions. Building on the SymForce library, users can easily define and combine symbolic expressions, including Lie group operations, to generate custom CUDA kernels. To use Caspar as a solver, users need only define the symbolic residual functions; Caspar then uses symbolic differentiation to generate the necessary GPU kernels and interfaces to perform nonlinear optimization. In this paper, we present the core components of Caspar and showcase its performance by performing bundle adjustment on the Bundle Adjustment in the Large (BAL) dataset. We benchmark Caspar against other state-of-the-art bundle adjusters and show that it is 5 to 20 times faster than the best alternative, requires less memory, and achieves similar accuracy. This illustrates the benefit of our symbolic GPU programming approach. Caspar is released as part of SymForce and is freely available at https://github.com/symforce-org/symforce.

Index terms

Mapping; Optimization and Optimal Control; Performance Evaluation and Benchmarking