ICRA 2026

TD-CD-MPPI: Temporal-Difference Constraint-Discounted Model Predictive Path Integral Control

Pietro Noah Crestaz, Ludovic De Matteïs, Elliot Chane-Sane, Nicolas Mansard, Andrea Del Prete

AI summary

Augmenting MPPI with an offline-learned value function and constraint-based discounting enables efficient, long-horizon reasoning and robust constraint handling using significantly shorter planning horizons.

Model Predictive Path Integral; Temporal-Difference Learning; Constraint Handling; Legged Locomotion; Real-Time Control; Value Function Approximation

Problem

Sampling-based control methods like MPPI suffer from computational costs that scale linearly with the planning horizon, limiting long-term reasoning, while constraint enforcement relies on brittle, handcrafted penalty functions that lack interpretability and scalability.

Approach

The method integrates a terminal value function learned offline via temporal-difference learning to approximate infinite-horizon costs, allowing shorter rollouts, and modulates trajectory discount factors based on constraint violations to replace traditional cost shaping.
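
The following is a minimal sketch of how such a return could be computed for a batch of sampled rollouts, not the authors' implementation. It assumes a reward-maximizing formulation, a violation signal normalized to [0, 1], and a per-step discount of the form gamma * (1 - violation); the function names, array shapes, and that specific discount rule are hypothetical and chosen only for illustration.

```python
import numpy as np


def constraint_discounted_returns(rewards, violations, terminal_values, gamma=0.99):
    """Score K sampled rollouts of length H (illustrative sketch).

    rewards:         (K, H) per-step rewards
    violations:      (K, H) violation levels in [0, 1], 0 means satisfied
    terminal_values: (K,)   value estimate at the final state of each rollout
    """
    rewards = np.asarray(rewards, dtype=float)
    violations = np.clip(np.asarray(violations, dtype=float), 0.0, 1.0)
    terminal_values = np.asarray(terminal_values, dtype=float)

    # Assumed rule: a violation at step t shrinks the discount applied
    # to everything that comes after step t.
    step_discount = gamma * (1.0 - violations)            # (K, H)
    cumulative = np.cumprod(step_discount, axis=1)        # product over steps <= t

    # Discount in effect when the reward of step t is collected
    # (product over steps < t, so the first reward is undiscounted).
    ones = np.ones((rewards.shape[0], 1))
    before_step = np.concatenate([ones, cumulative[:, :-1]], axis=1)

    # Short-horizon return plus the discounted terminal value estimate.
    return (before_step * rewards).sum(axis=1) + cumulative[:, -1] * terminal_values


def mppi_weights(returns, temperature=1.0):
    """Softmax weights over rollouts; higher return gives higher weight."""
    z = (np.asarray(returns, dtype=float) - np.max(returns)) / temperature
    w = np.exp(z)
    return w / w.sum()
```

In this sketch a rollout that violates a constraint early loses both its later rewards and its terminal value, so it receives a low weight without any hand-tuned penalty term. The control update would then be the weighted average of the sampled control perturbations, as in standard MPPI.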

Key results

Enables stable locomotion with significantly shorter MPC horizons (e.g., H=8 vs H≥10)
Provides a modular, interpretable mechanism for constraint-aware planning without penalty shaping
Reduces computational cost while maintaining or improving sample efficiency
Successfully transfers from simulation to real-world Solo12 quadruped hardware

Why it matters

Provides a practical, computationally efficient framework for real-time, constraint-aware locomotion control that bridges the gap between sampling-based optimization and learning-based long-horizon reasoning.

Abstract

Path Integral methods have demonstrated remarkable capabilities for solving non-linear stochastic optimal control problems through sampling-based optimization. However, their computational complexity grows linearly with the prediction horizon, limiting long-term reasoning, while constraints are merely enforced through handcrafted penalties. In this work, we propose a unified and efficient framework for enabling long-horizon reasoning and constraint enforcement within Model Predictive Path Integral (MPPI) control. First, we introduce a practical method to incorporate a terminal value function, learned offline via temporal-difference learning, to approximate the long-term cost-to-go. This allows for significantly shorter rollouts while enabling infinite-horizon reasoning, thereby improving computational efficiency and motion performance. Second, we propose a discount modulation strategy that adjusts the return of sampled trajectories based on constraint violations. This provides a more interpretable and effective mechanism for enforcing constraints compared to traditional cost shaping. Our formulation retains the flexibility and sampling efficiency of MPPI while supporting structured integration of long-term objectives and constraint handling. We validate our approach on both simulated and real-world robotic locomotion tasks, demonstrating improved performance, constraint-awareness, and generalization under reduced computational budgets.
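
As an illustration of the temporal-difference learning mentioned in the abstract, the sketch below fits a value function from logged transitions with semi-gradient TD(0). It is a generic textbook update under stated assumptions, not the paper's training procedure: the linear function approximator, the transition tuple layout, the helper names, and the toy three-state chain are all hypothetical.

```python
import numpy as np


def td0_fit(transitions, features, n_features, gamma=0.95, lr=0.1, epochs=200):
    """Semi-gradient TD(0) for a linear value function V(s) = w @ features(s).

    transitions: iterable of (state, reward, next_state, done) tuples
    features:    function mapping a state to a feature vector of size n_features
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for s, r, s_next, done in transitions:
            phi = features(s)
            # Bootstrap from the current estimate unless the episode ended.
            bootstrap = 0.0 if done else gamma * (w @ features(s_next))
            td_error = r + bootstrap - w @ phi
            w += lr * td_error * phi
    return w


def one_hot(state, n_states=3):
    """Tabular features for the toy chain (hypothetical example)."""
    phi = np.zeros(n_states)
    phi[state] = 1.0
    return phi


# Toy chain 0 -> 1 -> 2 (terminal), reward 1 per step.
# Fixed point of the TD update: V(1) = 1 and V(0) = 1 + gamma * V(1).
toy_transitions = [(0, 1.0, 1, False), (1, 1.0, 2, True)]
toy_weights = td0_fit(toy_transitions, one_hot, n_features=3)
```

Once fitted, such a value estimate can be queried at the final state of each short rollout, standing in for the rewards that a longer horizon would otherwise have to simulate.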

Index terms

Optimization and Optimal Control; Legged Robots; Whole-Body Motion Planning and Control