ICRA 2026

Safe and Optimal Variable Impedance Control Via Certified Reinforcement Learning

Shreyas Kumar, Ravi Prakash

AI summary

C-GMS guarantees Lyapunov stability and actuator feasibility during reinforcement learning exploration, enabling safe and optimal variable impedance control without post-hoc safety filters.

Variable Impedance Control, Reinforcement Learning, Lyapunov Stability, Certified Control, Human-Robot Interaction, Gaussian-Manifold Sampling

Problem

Model-free reinforcement learning for variable impedance control often risks instability and unsafe exploration due to time-varying impedance gains, as existing methods lack built-in stability and actuator-limit awareness during policy search.

Approach

The authors introduce Certified Gaussian-Manifold Sampling (C-GMS), which restricts policy exploration to a mathematically defined manifold of stable gain schedules, ensuring every learned policy rollout is provably stable and physically realizable by construction.
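As a rough illustration of restricting exploration to stable gain schedules, the sketch below draws a Gaussian stiffness profile for a 1-DOF impedance law m*e'' + d*e' + k(t)*e = 0 and projects it onto a set that satisfies a well-known sufficient stability condition for time-varying stiffness: constant damping with d > alpha*m, and stiffness rate k' <= 2*alpha*k. Box limits on k stand in for actuator feasibility. This is not the C-GMS construction from the paper; the function names, the stability condition chosen, the projection step, and every numeric value are assumptions made for illustration.

```python
import numpy as np


def sample_certified_stiffness(mean, std, alpha, dt, k_min, k_max, rng):
    """Draw a Gaussian stiffness schedule, then project it onto the certified set.

    Hypothetical helper: a sample-then-project scheme, not the paper's method.
    """
    raw = mean + std * rng.standard_normal(mean.shape)
    k = np.clip(raw, k_min, k_max)       # box limits, a stand-in for actuator bounds
    growth = 1.0 + 2.0 * alpha * dt      # forward-Euler form of k' <= 2*alpha*k
    for t in range(1, len(k)):
        k[t] = min(k[t], k[t - 1] * growth)  # cap how fast stiffness may rise
    return k


def is_certified(k, alpha, dt, k_min, k_max, d, m):
    """Check the assumed sufficient conditions on a discretized schedule."""
    damping_ok = d > alpha * m
    bounds_ok = bool(np.all(k >= k_min) and np.all(k <= k_max))
    growth = 1.0 + 2.0 * alpha * dt
    rate_ok = bool(np.all(k[1:] <= k[:-1] * growth + 1e-9))
    return damping_ok and bounds_ok and rate_ok


# Illustrative values only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
mean = np.full(50, 300.0)                # nominal stiffness schedule [N/m]
schedule = sample_certified_stiffness(
    mean, std=150.0, alpha=2.0, dt=0.01, k_min=50.0, k_max=600.0, rng=rng
)
print(is_certified(schedule, alpha=2.0, dt=0.01, k_min=50.0, k_max=600.0, d=40.0, m=1.0))
```

Because every sampled schedule passes through the projection before it is rolled out, no reward penalty or after-the-fact filter is needed in this toy setting. Stiffness decreases are left unconstrained because the assumed condition only limits how quickly stiffness may rise.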

Key results

Guarantees Lyapunov stability and actuator feasibility during RL exploration without reward penalties or post-hoc filters
Provides a theoretical proof of uniformly ultimately bounded tracking error under model uncertainties
Demonstrates successful simulation and real-robot execution for a collaborative human-robot handover task
Prevents instability and collisions that occur with unconstrained policy sampling

Why it matters

Enables reliable, safe, and optimal physical interaction for robots operating in unstructured, dynamic environments.

Abstract

Reinforcement learning (RL) offers a powerful approach for robots to learn complex, collaborative skills by combining Dynamic Movement Primitives (DMPs) for motion and Variable Impedance Control (VIC) for compliant interaction. However, this model-free paradigm often risks instability and unsafe exploration due to the time-varying nature of impedance gains. This work introduces Certified Gaussian-Manifold Sampling (C-GMS), a novel trajectory-centric RL framework that learns combined DMP and VIC policies while guaranteeing Lyapunov stability and actuator feasibility by construction. Our approach reframes policy exploration as sampling from a mathematically defined manifold of stable gain schedules. This ensures every policy rollout is guaranteed to be stable and physically realizable, thereby eliminating the need for reward penalties or post-hoc validation. Furthermore, we provide a theoretical guarantee that our approach ensures bounded tracking error even in the presence of bounded model errors and deployment-time uncertainties. We demonstrate the effectiveness of C-GMS in simulation and verify its efficacy on a real robot, paving the way for reliable autonomous interaction in complex environments.

Index terms

Reinforcement Learning, Compliance and Impedance Control, Safety in HRI