GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
Alexander Du, Emre Adabag, Gabriel Bravo-Palacios, Brian Plancher
AI summary
Problem
Existing GPU-accelerated trajectory optimization solvers struggle to handle moderate batches (tens to low-hundreds of solves) in real time: they parallelize only single solves, handle large batches at sub-real-time rates, or give up model generality for speed.
Approach
GATO co-designs algorithm, software, and hardware to parallelize trajectory optimization across block, warp, and thread levels on the GPU, enabling efficient batched solves without sacrificing accuracy.
Key results
- 18–21× speedup over CPU baselines and up to 16× over GPU baselines
- kHz control rates for real-time batched trajectory optimization
- Faster convergence and improved disturbance rejection in MPC case studies
- Open-source implementation validated on a KUKA industrial manipulator
Why it matters
Enables real-time, scalable model predictive control for resource-constrained edge robotics applications requiring batched trajectory optimization.
Abstract
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed. This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18–21× over CPU baselines and 1.4–16× over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
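To make the idea of block-, warp-, and thread-level parallelism within and across solves more concrete, the following is a minimal CPU-side sketch in Python that emulates one possible three-level work split for a batch of trajectories. The mapping used here (one block per solve, one warp per knot point, one thread per state element) is an illustrative assumption and is not taken from the GATO implementation; the function name `batched_trajectory_cost` and the sum-of-squares cost are likewise hypothetical stand-ins for the real per-solve work.

```python
# Hypothetical CPU emulation of a three-level GPU work split.
# Assumed mapping (NOT confirmed by the paper):
#   block  -> one trajectory optimization solve in the batch
#   warp   -> one knot point of that trajectory
#   thread -> one state element at that knot point


def batched_trajectory_cost(batch):
    """Return the sum-of-squares cost of each trajectory in `batch`.

    batch[b][k][i] is state element i at knot point k of solve b.
    On a GPU, each loop level below would run in parallel; here the
    loops run sequentially to show the structure only.
    """
    costs = []
    for trajectory in batch:                      # block level: across solves
        block_sum = 0.0
        for knot in trajectory:                   # warp level: across knot points
            lane_values = [x * x for x in knot]   # thread level: per state element
            block_sum += sum(lane_values)         # reduction over the lanes of a warp
        costs.append(block_sum)                   # one result per solve (block)
    return costs


if __name__ == "__main__":
    # Two solves, each with two knot points and a two-element state.
    batch = [
        [[1.0, 2.0], [0.0, 1.0]],
        [[0.5, 0.5], [3.0, 0.0]],
    ]
    print(batched_trajectory_cost(batch))  # [6.0, 9.5]
```

The nesting is the point of the sketch: the outer loop captures parallelism across solves, while the two inner loops capture parallelism within a solve. A real kernel would replace the toy cost with the solver's linear algebra and would tune the mapping to the problem dimensions.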