CoTaP: Compliant Task Pipeline and Reinforcement Learning of Its Controller with Compliance Modulation
Zewen He, Chen Chenyuan, Dilshod Azizov, Yoshihiko Nakamura
AI summary
Problem
Current learning-based humanoid control relies on position-based methods and lacks sufficient force data, making it difficult to achieve safe and stable compliance during real-world interactions.
Approach
The authors propose a two-stage dual-agent reinforcement learning framework that distills a base policy into a compliance controller, using Log-Euclidean interpolation on the symmetric positive definite manifold to modulate stiffness matrices while guaranteeing stability.
Key results
- Developed a two-stage dual-agent RL policy distillation framework separating upper-body compliance and lower-body PD control.
- Implemented stiffness matrix modulation on the SPD manifold to guarantee positive definiteness and system stability.
- Validated the approach in simulation and physical experiments, demonstrating effective compliance modulation under external disturbances.
- Achieved adjustable task-space compliance without retraining, outperforming baseline PD control in disturbance rejection.
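The stiffness modulation on the SPD manifold listed above can be illustrated with Log-Euclidean interpolation between two stiffness matrices, K(alpha) = exp((1 - alpha) log K_a + alpha log K_b). The following is a minimal sketch and not the authors' implementation: the function names, the example matrices `K_soft` and `K_stiff`, and the blending weight `alpha` are assumptions made for illustration.

```python
import numpy as np

def spd_log(K):
    """Matrix logarithm of a symmetric positive definite matrix (via eigendecomposition)."""
    w, V = np.linalg.eigh(K)
    if np.any(w <= 0.0):
        raise ValueError("matrix is not positive definite")
    return (V * np.log(w)) @ V.T

def sym_exp(S):
    """Matrix exponential of a symmetric matrix; the result is always SPD."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_interp(K_a, K_b, alpha):
    """Blend two SPD matrices in the log domain; alpha=0 gives K_a, alpha=1 gives K_b."""
    S = (1.0 - alpha) * spd_log(K_a) + alpha * spd_log(K_b)
    return sym_exp(S)

# Hypothetical task-space stiffness matrices (values chosen only for illustration).
K_soft = np.diag([50.0, 50.0, 50.0])
K_stiff = np.array([[800.0, 100.0, 0.0],
                    [100.0, 600.0, 0.0],
                    [0.0, 0.0, 400.0]])

K_mid = log_euclidean_interp(K_soft, K_stiff, 0.5)
```

Because the blend is formed on the symmetric matrix logarithms and mapped back through the exponential, every intermediate matrix has strictly positive eigenvalues. For commuting matrices the interpolation reduces to a geometric mean of the eigenvalues, so the effective stiffness varies smoothly across orders of magnitude.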
Why it matters
Provides a stable, adaptable compliance control strategy for humanoid robots, advancing safe human-robot interaction and complex loco-manipulation tasks.
Abstract
Humanoid whole-body locomotion control is a critical approach for humanoid robots to leverage their inherent advantages. Learning-based control methods derived from retargeted human motion data provide an effective means of addressing this issue. However, because most current human datasets lack measured force data, and learning-based robot control is largely position-based, achieving appropriate compliance during interaction with real environments remains challenging. This paper presents Compliant Task Pipeline (CoTaP): a pipeline that leverages compliance information in the learning-based structure of humanoid robots. A two-stage dual-agent reinforcement learning framework combined with model-based compliance control for humanoid robots is proposed. In the training process, first a base policy with a position-based controller is trained; then in the distillation, the upper-body policy is combined with model-based compliance control, and the lower-body agent is guided by the base policy. In the upper-body control, adjustable task-space compliance can be specified and integrated with other controllers through compliance modulation on the symmetric positive definite (SPD) manifold, ensuring system stability. We validated the feasibility of the proposed strategy in simulation and experiment, primarily comparing the responses to external disturbances under different compliance settings.
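To make "adjustable task-space compliance" concrete, the sketch below shows a generic task-space spring-damper law mapped to joint torques through the Jacobian transpose, tau = J^T (K (x_des - x) - D x_dot). This is a textbook form used here as an assumption; the abstract does not state the exact control law, and the function name, the 2-DoF Jacobian, and the gain values are hypothetical.

```python
import numpy as np

def compliance_torque(J, K, D, x_des, x, x_dot):
    """Joint torques realizing a task-space spring-damper (assumed generic form)."""
    # Virtual task-space force from the stiffness K and damping D.
    force = K @ (x_des - x) - D @ x_dot
    # Map the task-space force to joint torques with the Jacobian transpose.
    return J.T @ force

# Hypothetical 2-DoF example; all values are for illustration only.
J = np.array([[1.0, 0.5],
              [0.0, 1.0]])
K = np.diag([200.0, 100.0])   # task-space stiffness (SPD)
D = np.diag([20.0, 10.0])     # task-space damping
tau = compliance_torque(J, K, D,
                        x_des=np.array([0.1, 0.0]),
                        x=np.zeros(2),
                        x_dot=np.zeros(2))
```

In such a scheme, lowering the entries of `K` makes the end effector yield more under an external push, which is the kind of behavior the disturbance comparisons under different compliance settings would expose.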