Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering
Emil Martens, Aaron Miller, Matias Varnum, Annette Stahl
AI summary
Problem
Robotics estimation tasks require solving large, sparse nonlinear least-squares problems efficiently, but existing GPU solvers are rigid and difficult to customize for new symbolic expressions without sacrificing performance.
Approach
Caspar converts user-defined symbolic expressions into a Directed Acyclic Bipartite Symbolic Expression Graph (DABSEG), then applies adaptive register reordering, partial subexpression elimination, and custom memory accessors to generate highly optimized, problem-specific CUDA kernels.
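To make the graph-based step concrete, here is a minimal sketch of how building an expression DAG with structural sharing removes repeated subexpressions before code is emitted. It is a generic hash-consing illustration, not Caspar's DABSEG or its partial subexpression elimination. The names `ExprGraph`, `node`, and `emit` are hypothetical and are not part of Caspar or SymForce.

```python
from dataclasses import dataclass, field


@dataclass
class ExprGraph:
    """Toy expression DAG with hash-consing (hypothetical, not Caspar's DABSEG)."""
    nodes: list = field(default_factory=list)  # node id -> (op, args)
    index: dict = field(default_factory=dict)  # (op, args) -> node id

    def node(self, op, *args):
        """Return the id for (op, args), creating the node only if it is new."""
        key = (op, args)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def emit(self, outputs):
        """Emit one statement per unique node; creation order is topological."""
        lines = []
        for i, (op, args) in enumerate(self.nodes):
            if op == "var":
                lines.append(f"t{i} = {args[0]}")
            else:
                lhs, rhs = args
                lines.append(f"t{i} = t{lhs} {op} t{rhs}")
        lines.append("return " + ", ".join(f"t{o}" for o in outputs))
        return lines


g = ExprGraph()
x, y = g.node("var", "x"), g.node("var", "y")
s1 = g.node("+", x, y)
s2 = g.node("+", x, y)     # structurally identical, so it maps to the same node
out = g.node("*", s1, s2)  # (x + y) * (x + y)
program = g.emit([out])
print("\n".join(program))
```

The sum `x + y` is computed once and reused. This sketch does not merge commutative variants such as `y + x`, and it says nothing about register ordering or memory access, which the summary lists as separate optimizations.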
Key results
- 5–20× speedup over leading bundle adjusters on the BAL dataset
- Reduced memory footprint via optimized register allocation
- Accuracy comparable to existing state-of-the-art solvers
- Automatic kernel generation from symbolic residuals
Why it matters
Enables robotics developers to rapidly prototype and deploy high-performance GPU solvers for real-time state estimation, motion planning, and calibration without deep CUDA expertise.
Abstract
We present Caspar, a library that makes the power of modern GPUs more accessible in robotics and provides a state-of-the-art nonlinear GPU solver that can be applied to a wide range of different optimization problems. Caspar bridges the gap between expressive symbolic programming in Python and high-performance GPU runtimes in C++ by automatically generating optimized CUDA kernels from symbolic expressions. Building on the SymForce library, users can easily define and combine symbolic expressions, including Lie group operations, to generate custom CUDA kernels. To use Caspar as a solver, users need only define the symbolic residual functions; Caspar then uses symbolic differentiation to generate the necessary GPU kernels and interfaces to perform nonlinear optimization. In this paper, we present the core components of Caspar and showcase its performance by performing bundle adjustment on the Bundle Adjustment in the Large (BAL) dataset. We benchmark Caspar against other state-of-the-art bundle adjusters and show that it is 5 to 20 times faster than the best alternative, requires less memory, and achieves similar accuracy. This illustrates the benefit of our symbolic GPU programming approach. Caspar is released as part of SymForce and is freely available at https://github.com/symforce-org/symforce.
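As a rough illustration of the "define only the residual" workflow, the sketch below derives the Jacobian entries of a toy residual by symbolic differentiation. It is a self-contained toy written for this purpose, not the SymForce or Caspar API. The tuple-based expression format and the helpers `diff` and `simplify` are assumptions made for the example, and the line-fit residual `a*x + b - y` stands in for a real bundle adjustment residual.

```python
def simplify(e):
    """Apply a few identities (0*u, 1*u, u+0, u-0) to a binary expression."""
    op, a, b = e
    if op == "*":
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    if op == "+":
        if a == 0:
            return b
        if b == 0:
            return a
    if op == "-" and b == 0:
        return a
    return e


def diff(e, v):
    """Differentiate e with respect to the variable named v.

    Expressions are numbers, variable names (str), or tuples (op, lhs, rhs)
    with op in {"+", "-", "*"}.
    """
    if isinstance(e, (int, float)):
        return 0
    if isinstance(e, str):
        return 1 if e == v else 0
    op, a, b = e
    da, db = diff(a, v), diff(b, v)
    if op in ("+", "-"):
        return simplify((op, da, db))
    if op == "*":  # product rule
        return simplify(("+", simplify(("*", da, b)), simplify(("*", a, db))))
    raise ValueError(f"unsupported operator: {op}")


# Toy residual r = a*x + b - y, optimized over the parameters a and b.
residual = ("-", ("+", ("*", "a", "x"), "b"), "y")
jacobian = [diff(residual, p) for p in ("a", "b")]
print(jacobian)
```

In a real pipeline the residual and its derivatives would then be lowered to GPU code. Here they are only kept as symbolic expressions to show that the user supplies the residual and the derivatives follow mechanically.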