Infinite-Horizon Value Function Approximation for Model Predictive Control
Armand Jordana, Sebastien Kleff, Arthur Haffemayer, Joaquim Ortiz-Haro, Justin Carpentier, Nicolas Mansard, Ludovic Righetti
AI summary
Problem
Finite-horizon MPC lacks global stability guarantees and suffers from local minima, while offline learning methods struggle to enforce hard constraints safely. Practitioners must tediously design cost functions to approximate infinite-horizon behavior.
Approach
The method trains a neural network to approximate the infinite-horizon value function using value iteration and trajectory optimization, then deploys it as a terminal cost in a real-time constrained MPC loop.
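As a rough illustration of this pipeline, here is a minimal sketch of value iteration with a short-horizon trajectory optimizer, followed by use of the fitted value function as an MPC terminal cost. It is not the authors' implementation. The scalar linear system, the constants, and the names `solve_trajectory` and `fit_value_function` are hypothetical. The neural network is replaced by a one-parameter quadratic model V(x) = p x^2, and constraints are omitted to keep the example short.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): x_next = A*x + B*u,
# stage cost Q*x^2 + R*u^2, no constraints.
A, B, Q, R = 1.1, 0.5, 1.0, 0.1
H = 3  # short horizon used by the trajectory optimizer


def solve_trajectory(x0, p):
    """Minimize sum_{t<H} (Q x_t^2 + R u_t^2) + p x_H^2 over u_0..u_{H-1}.

    The dynamics are linear, so the problem is a linear least-squares
    problem in the control sequence. Returns (optimal cost, first control).
    """
    # States along the horizon: x_t = free[t] * x0 + G[t] @ u, t = 0..H
    free = A ** np.arange(H + 1)
    G = np.zeros((H + 1, H))
    for t in range(1, H + 1):
        for j in range(t):
            G[t, j] = A ** (t - 1 - j) * B
    # Residual weights: sqrt(Q) on x_0..x_{H-1}, sqrt(p) on the terminal state
    w = np.sqrt(np.r_[np.full(H, Q), p])
    M = np.vstack([w[:, None] * G, np.sqrt(R) * np.eye(H)])
    c = np.r_[w * free * x0, np.zeros(H)]
    u, *_ = np.linalg.lstsq(M, -c, rcond=None)
    cost = float(np.sum((M @ u + c) ** 2))
    return cost, float(u[0])


def fit_value_function(n_iters=50, n_samples=64, seed=0):
    """Value iteration: regress V onto trajectory-optimization targets."""
    rng = np.random.default_rng(seed)
    p = 0.0  # start from V = 0
    for _ in range(n_iters):
        xs = rng.uniform(-2.0, 2.0, size=n_samples)
        # Targets: optimal H-step cost with the current V as terminal cost
        targets = np.array([solve_trajectory(x, p)[0] for x in xs])
        # Least-squares fit of V(x) = p x^2 (stand-in for network training)
        p = float(np.sum(xs**2 * targets) / np.sum(xs**4))
    return p


if __name__ == "__main__":
    p_hat = fit_value_function()
    # Closed-loop MPC using the fitted value function as terminal cost
    x = 1.5
    for _ in range(10):
        _, u = solve_trajectory(x, p_hat)
        x = A * x + B * u
    print(f"fitted p = {p_hat:.4f}, state after 10 MPC steps = {x:.2e}")
```

In this unconstrained linear-quadratic setting the fitted coefficient can be checked against the fixed point of the scalar Riccati recursion. The constrained, nonlinear problems discussed here have no such closed form, which is where a learned approximator and a numerical solver come in.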
Key results
- Local gradient-based solvers successfully approximate infinite-horizon constrained value functions via value iteration
- Online trajectory optimization compensates for approximation errors, improving control optimality
- Conditioning the value function on goals and obstacles enables scalable real-time obstacle avoidance
- In experiments, the approach maintains hard constraint satisfaction and global stability, including outside the training distribution
Why it matters
Bridges offline learning and online optimization to provide a practical, safe, and globally stable control framework for real-world robotics applications.
Abstract
Model Predictive Control has emerged as a popular tool for robots to generate complex motions. However, the real-time requirement has limited the use of hard constraints and large preview horizons, which are necessary to ensure safety and stability. In practice, practitioners have to carefully design cost functions that can imitate an infinite-horizon formulation, which is tedious and often results in local minima. In this work, we study how to approximate the infinite-horizon value function of constrained optimal control problems with neural networks using value iteration and trajectory optimization. Furthermore, we experimentally demonstrate how using this value function approximation as a terminal cost provides global stability to the model predictive controller. The approach is validated on two toy problems and a real-world scenario with online obstacle avoidance on an industrial manipulator, where the value function is conditioned on the goal and obstacle.
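For readers who prefer a formula, the two objects mentioned in the abstract can be written in generic optimal-control notation. The symbols below (dynamics f, stage cost \ell, constraint function c, horizon T, network parameters \theta, measured state \hat{x}) are assumptions chosen for illustration and are not taken from the paper. The first line is the Bellman equation that the infinite-horizon value function satisfies; the second is a finite-horizon MPC problem that uses an approximation V_\theta as its terminal cost.

```latex
V(x) = \min_{u} \; \ell(x, u) + V\big(f(x, u)\big)
\quad \text{s.t.} \quad c(x, u) \le 0

\min_{u_0, \dots, u_{T-1}} \; \sum_{t=0}^{T-1} \ell(x_t, u_t) + V_\theta(x_T)
\quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \;\; c(x_t, u_t) \le 0, \;\; x_0 = \hat{x}
```

If V_\theta matched V exactly, the first control of the finite-horizon problem would coincide with the infinite-horizon optimal control regardless of T. With an imperfect approximation, the online optimization over the first T steps is what can compensate for the approximation error.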