ICRA 2026

GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

Alexander Du, Emre Adabag, Gabriel Bravo-Palacios, Brian Plancher

AI summary

GATO enables real-time, batched trajectory optimization on GPUs, delivering speedups of up to 21× over CPU baselines and up to 16× over existing GPU solvers at moderate batch sizes.

GPU acceleration, trajectory optimization, model predictive control, batched solving, edge robotics, real-time control

Problem

Existing GPU-accelerated trajectory optimization solvers struggle to handle moderate batches (tens to low hundreds of solves) in real time: they either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed.

Approach

GATO co-designs algorithm, software, and hardware to parallelize trajectory optimization across block, warp, and thread levels on the GPU, enabling efficient batched solves without sacrificing accuracy.
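To make the idea of a batch of independent solves concrete, here is a minimal sketch in plain Python. It is not GATO's algorithm: it swaps the nonlinear trajectory optimization problem for a toy scalar finite-horizon LQR problem, and the names Problem, solve_lqr, rollout_cost, and solve_batch are hypothetical. The sequential loop in solve_batch stands in for work that a GPU solver could distribute across solves, since no solve depends on another.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """One toy solve: scalar dynamics x[k+1] = a*x[k] + b*u[k] (illustrative only)."""
    a: float
    b: float
    x0: float


def solve_lqr(p, horizon=20, q=1.0, r=0.1, qf=1.0):
    """Backward Riccati recursion for a scalar finite-horizon LQR problem.

    Returns the feedback gains K[0..horizon-1] and the cost-to-go weight P at k=0,
    so that the optimal cost from x0 is P * x0**2.
    """
    P = qf
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        K = p.a * p.b * P / (r + p.b ** 2 * P)
        gains[k] = K
        P = q + p.a ** 2 * P - p.a * p.b * P * K
    return gains, P


def rollout_cost(p, gains, q=1.0, r=0.1, qf=1.0):
    """Simulate the closed loop u[k] = -K[k]*x[k] and accumulate the cost."""
    x = p.x0
    cost = 0.0
    for K in gains:
        u = -K * x
        cost += q * x * x + r * u * u
        x = p.a * x + p.b * u
    return cost + qf * x * x


def solve_batch(problems):
    """Solve every problem in the batch independently.

    Assumption: this sequential loop is a stand-in for parallel execution;
    each iteration touches only its own problem data.
    """
    return [solve_lqr(p) for p in problems]


if __name__ == "__main__":
    # A batch of model hypotheses that differ only in the dynamics parameter a.
    batch = [Problem(a=1.0 + 0.05 * i, b=0.5, x0=1.0) for i in range(8)]
    for p, (gains, P0) in zip(batch, solve_batch(batch)):
        print(f"a={p.a:.2f}  first gain={gains[0]:.4f}  optimal cost={P0 * p.x0 ** 2:.4f}")
```

In an MPC setting, a batch like this might represent different model hypotheses, initial guesses, or goals evaluated at the same control step; the sketch only shows the independence structure that makes batching attractive, not how work is split within a single solve.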

Key results

18–21× speedups over CPU baselines and 1.4–16× over GPU baselines as batch size increases
kHz control rates for real-time batched trajectory optimization
Faster convergence and improved disturbance rejection in MPC case studies
Open-source implementation validated on a KUKA industrial manipulator

Why it matters

Enables real-time, scalable model predictive control for resource-constrained edge robotics applications requiring batched trajectory optimization.

Abstract

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches either parallelize single solves, handle large batches at sub-real-time rates, or sacrifice model generality for speed. This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low hundreds of solves. As such, we present GATO, an open-source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18–21× over CPU baselines and 1.4–16× over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.

Index terms

Optimization and Optimal Control; Software Architecture for Robotic and Automation; Control Architectures and Programming