Safe and Optimal Variable Impedance Control Via Certified Reinforcement Learning
Shreyas Kumar, Ravi Prakash
AI summary
Problem
Model-free reinforcement learning for variable impedance control often risks instability and unsafe exploration due to time-varying impedance gains, as existing methods lack built-in stability and actuator-limit awareness during policy search.
Approach
The authors introduce Certified Gaussian-Manifold Sampling (C-GMS), which restricts policy exploration to a mathematically defined manifold of stable gain schedules, ensuring every learned policy rollout is provably stable and physically realizable by construction.
Key results
- Guarantees Lyapunov stability and actuator feasibility during RL exploration without reward penalties or post-hoc filters
- Provides a theoretical proof of uniformly ultimately bounded tracking error under model uncertainties
- Demonstrates successful simulation and real-robot execution for a collaborative human-robot handover task
- Prevents instability and collisions that occur with unconstrained policy sampling
Why it matters
Enables reliable, safe, and optimal physical interaction for robots operating in unstructured, dynamic environments.
Abstract
Reinforcement learning (RL) offers a powerful approach for robots to learn complex, collaborative skills by combining Dynamic Movement Primitives (DMPs) for motion and Variable Impedance Control (VIC) for compliant interaction. However, this model-free paradigm often risks instability and unsafe exploration due to the time-varying nature of impedance gains. This work introduces Certified Gaussian-Manifold Sampling (C-GMS), a novel trajectory-centric RL framework that learns combined DMP and VIC policies while guaranteeing Lyapunov stability and actuator feasibility by construction. Our approach reframes policy exploration as sampling from a mathematically defined manifold of stable gain schedules. This ensures every policy rollout is guaranteed to be stable and physically realizable, thereby eliminating the need for reward penalties or post-hoc validation. Furthermore, we provide a theoretical guarantee that our approach ensures bounded tracking error even in the presence of bounded model errors and deployment-time uncertainties. We demonstrate the effectiveness of C-GMS in simulation and verify its efficacy on a real robot, paving the way for reliable autonomous interaction in complex environments.
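The abstract does not define the manifold of stable gain schedules, so the following is only an illustrative sketch of the general idea of exploring inside a constraint set instead of penalizing violations afterwards. It is not the C-GMS algorithm. It assumes a single scalar stiffness k(t), box limits k_min and k_max as a stand-in for actuator feasibility, and a rate bound of the form dk/dt <= 2*alpha*k as a stand-in for a stability certificate. The function names, the rate bound, and all parameter values are hypothetical, and the clamp-based projection is a simplification chosen for brevity.

```python
import random


def sample_certified_schedule(mean, std, k_min, k_max, alpha, dt, rng):
    """Draw a Gaussian stiffness schedule and clamp each step into the
    assumed constraint set (box limits plus a growth-rate bound)."""
    schedule = []
    prev = None
    for mu in mean:
        k = rng.gauss(mu, std)            # unconstrained Gaussian exploration
        k = min(max(k, k_min), k_max)     # assumed actuator box limits
        if prev is not None:
            # Assumed rate bound: stiffness may grow by at most
            # 2 * alpha * k * dt per step; decreasing is unrestricted.
            k = min(k, prev * (1.0 + 2.0 * alpha * dt))
        schedule.append(k)
        prev = k
    return schedule


def is_certified(schedule, k_min, k_max, alpha, dt, tol=1e-9):
    """Check the same assumed constraints on an arbitrary schedule."""
    in_box = all(k_min - tol <= k <= k_max + tol for k in schedule)
    rate_ok = all(
        nxt <= cur * (1.0 + 2.0 * alpha * dt) + tol
        for cur, nxt in zip(schedule, schedule[1:])
    )
    return in_box and rate_ok


if __name__ == "__main__":
    rng = random.Random(0)
    mean = [200.0] * 50                   # hypothetical nominal stiffness
    sched = sample_certified_schedule(mean, 80.0, 50.0, 400.0, 2.0, 0.01, rng)
    print(is_certified(sched, 50.0, 400.0, 2.0, 0.01))
```

Because every sample is clamped before it is returned, each schedule handed to a rollout already satisfies the assumed constraints, which mirrors the "by construction" claim at a toy scale. A real certificate would depend on the robot's inertia and damping and on matrix-valued gains, none of which are modeled here.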