Safe and Efficient Quadrupedal Locomotion with a Chambolle-Pock Whole-Body Controller
Xu Yang, Run Wang, Yiwen Lu, Yilin Mo
AI summary
Problem
Existing quadrupedal control strategies struggle to balance the strict safety guarantees of model-based optimization with the scalable, robust policy learning of reinforcement learning, primarily due to the computational bottleneck of running optimization solvers in thousands of parallel simulation environments.
Approach
The authors propose a hierarchical architecture where a high-level RL policy generates reference trajectories and a low-level whole-body controller enforces hard constraints using a novel Chambolle–Pock convex QP solver, designed for massive GPU parallelization during training and fast CPU execution during deployment.
Key results
- Hierarchical RL-OC framework bridging robust policy learning with strict constraint satisfaction
- Chambolle–Pock-based convex QP solver enabling massive GPU parallelization and real-time CPU deployment
- Quantifiable improvements in energy efficiency, tracking accuracy, and constraint satisfaction across simulations and hardware
- Simplified policy learning by offloading constraint handling to the WBC, enabling cross-platform transferability
Why it matters
The work provides a scalable control paradigm with hard constraint enforcement for researchers and engineers developing agile, energy-efficient legged robots for real-world deployment.
Abstract
This article presents a hierarchical control framework for quadrupedal locomotion that unifies the complementary strengths of model-based optimization and reinforcement learning. We develop a convex quadratic programming (QP) solver based on the primal-dual Chambolle–Pock algorithm, enabling both massively parallel policy training and real-time deployment through efficient handling of constrained optimization problems. Our hierarchical framework employs learned policies for robust high-level control to handle real-world perturbations, while ensuring instantaneous constraint satisfaction and energy efficiency through a low-level whole-body controller powered by the proposed solver. Extensive benchmarks and experimental validation demonstrate quantifiable improvements in energy consumption, constraint satisfaction, and task transferability across simulated and real-world environments.
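To make the solver idea in the abstract concrete, the following is a minimal sketch of Chambolle–Pock (primal-dual hybrid gradient) iterations on a small convex QP. It is not the authors' implementation. It assumes a diagonal cost matrix so that the primal proximal step is elementwise, two-sided linear constraints lo <= A x <= hi, and fixed step sizes. The names `chambolle_pock_qp`, `p_diag`, `lo`, and `hi` are illustrative and do not come from the paper.

```python
def matvec(M, x):
    """Return M @ x for a dense matrix stored as a list of rows."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def matvec_t(M, y):
    """Return M^T @ y for a dense matrix stored as a list of rows."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]


def project_box(v, lo, hi):
    """Elementwise projection of v onto the box [lo, hi]."""
    return [min(max(vi, l), h) for vi, l, h in zip(v, lo, hi)]


def chambolle_pock_qp(p_diag, q, A, lo, hi, iters=500):
    """Approximately solve
           min_x 0.5 * sum_i p_diag[i] * x[i]**2 + q . x
           s.t.  lo <= A x <= hi
    with Chambolle-Pock iterations (assumption: diagonal cost, p_diag >= 0).
    """
    m, n = len(A), len(A[0])
    # The Frobenius norm upper-bounds the spectral norm of A, so this choice
    # satisfies the step-size condition tau * sigma * ||A||^2 < 1.
    norm_a = sum(a * a for row in A for a in row) ** 0.5
    tau = sigma = 0.9 / norm_a
    x, x_bar, y = [0.0] * n, [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Dual step: prox of the conjugate of the box indicator, obtained
        # from the box projection through the Moreau identity.
        v = [yi + sigma * ai for yi, ai in zip(y, matvec(A, x_bar))]
        proj = project_box([vi / sigma for vi in v], lo, hi)
        y = [vi - sigma * pi for vi, pi in zip(v, proj)]
        # Primal step: prox of the quadratic cost, elementwise because the
        # cost matrix is assumed diagonal.
        aty = matvec_t(A, y)
        x_new = [(xi - tau * (gi + qi)) / (1.0 + tau * pi)
                 for xi, gi, qi, pi in zip(x, aty, q, p_diag)]
        # Extrapolation with theta = 1.
        x_bar = [2.0 * xn - xo for xn, xo in zip(x_new, x)]
        x = x_new
    return x


# Toy problem: min 0.5*(x1^2 + x2^2) - x1 - x2  s.t.  0 <= x1 + x2 <= 1.
# The unconstrained minimizer (1, 1) is infeasible; by symmetry the
# constrained minimizer is (0.5, 0.5).
x_opt = chambolle_pock_qp([1.0, 1.0], [-1.0, -1.0], [[1.0, 1.0]], [0.0], [1.0])
print([round(v, 4) for v in x_opt])
```

Each iteration in this sketch uses only matrix-vector products and elementwise clipping, with no matrix factorization or branching on the active set. That structure is one plausible reason a method of this family suits batched execution across many simulation environments, although the paper's actual solver design may differ.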