RPG: Robust Policy Gating for Smooth Multi-Skill Transitions in Humanoid Fighting
Xuelong Li, Junbo Tan, Dong Wang
AI summary
Problem
Existing imitation learning approaches for humanoid fighting suffer from instability and jerky motions when switching between skills due to mismatched state distributions and out-of-domain disturbances during transitions.
Approach
The framework trains separate expert policies for each fighting skill and applies policy-transition and temporal randomization during training to force robustness against abrupt switches. A lightweight gating network then blends these experts with smoothness regularization to produce fluid, stable multi-skill control.
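As a rough illustration of the gating idea described above, the sketch below blends the outputs of several expert policies with softmax weights and computes a simple smoothness penalty. It is not the authors' implementation: the function names, the squared-difference form of the penalty, and the coefficients `w_coef` and `a_coef` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_actions(gate_logits, expert_actions):
    """Convex combination of expert actions, weighted by softmax gate outputs.

    gate_logits: one score per expert (in practice, the gating network's output).
    expert_actions: one action vector per expert, all of the same length.
    """
    weights = softmax(gate_logits)
    dim = len(expert_actions[0])
    blended = [
        sum(w * action[j] for w, action in zip(weights, expert_actions))
        for j in range(dim)
    ]
    return blended, weights


def smoothness_penalty(prev_weights, weights, prev_action, action,
                       w_coef=1.0, a_coef=1.0):
    """Hypothetical regularizer: penalize step-to-step change in the gate
    weights and in the blended action (assumed form, not from the paper)."""
    dw = sum((w - p) ** 2 for w, p in zip(weights, prev_weights))
    da = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    return w_coef * dw + a_coef * da


# Two hypothetical experts with equal gate scores: each gets weight 0.5.
blended, weights = blend_actions([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Because the weights are a softmax output, the blended action always lies in the convex hull of the expert actions, which is one plausible reason a gate of this kind avoids the discontinuity of a hard switch.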
Key results
- Policy-transition and temporal randomization improve robustness to abrupt skill switches
- Lightweight gating network enables smooth, real-time fusion of multiple expert policies
- Integrated locomotion and combat pipeline supports prolonged, game-like humanoid fighting
- Successful sim-to-real transfer on the Unitree G1 validates robust real-world execution
Why it matters
Advances practical whole-body control for humanoids, enabling reliable deployment of complex, dynamic multi-skill behaviors in interactive and real-world applications.
Abstract
Humanoid robots have demonstrated impressive motor skills across a wide range of tasks, yet whole-body control for humanlike, long-duration, dynamic fighting remains particularly challenging because of its stringent requirements on agility and stability. While imitation learning enables robots to execute humanlike fighting skills, existing approaches often rely on switching among multiple single-skill policies or on a general policy that imitates input reference motions. These strategies suffer from instability when transitioning between skills, because the mismatch between initial and terminal states across skills or reference motions introduces out-of-domain disturbances, resulting in jerky or unstable behaviors. In this work, we propose RPG, a hybrid expert policy framework for smooth and stable multi-skill transitions in humanoids. Our approach incorporates motion transition randomization and temporal randomization to train a unified policy that generates agile fighting actions while remaining stable and smooth during skill transitions. Furthermore, we design a control pipeline that integrates walking/running locomotion with fighting skills, enabling humanlike combat of arbitrary duration that can be seamlessly interrupted or switched to another action policy at any time. Extensive experiments in simulation demonstrate the effectiveness of the proposed framework, and real-world deployment on the Unitree G1 humanoid robot further validates its robustness and applicability.
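To make the two randomizations named in the abstract more concrete, the following sketch samples a training-episode schedule in which the next skill, the switch time, and the entry frame into the next reference clip are all random. The abstract does not specify the actual sampling scheme, so everything here is an assumption for illustration: the function name, the skill names, the clip lengths, and the uniform sampling.

```python
import random


def sample_skill_schedule(skills, episode_len, min_seg, max_seg, clip_len, rng):
    """Return a list of (start_step, skill, start_phase) segments.

    skills: list of skill names (hypothetical).
    clip_len: dict mapping each skill to its reference clip length in frames.
    min_seg, max_seg: bounds on how long a skill runs before a forced switch.
    """
    schedule = []
    t = 0
    prev = None
    while t < episode_len:
        # Transition randomization (assumed form): switch to any other skill,
        # so the policy sees many different predecessor states.
        choices = [s for s in skills if s != prev] or list(skills)
        skill = rng.choice(choices)
        # Temporal randomization (assumed form): enter the clip at a random
        # frame and hold it for a random number of steps.
        start_phase = rng.randrange(clip_len[skill])
        schedule.append((t, skill, start_phase))
        t += rng.randint(min_seg, max_seg)
        prev = skill
    return schedule


# Hypothetical skills and clip lengths, with a fixed seed for repeatability.
clip_len = {"jab": 30, "kick": 45, "walk": 60}
schedule = sample_skill_schedule(
    list(clip_len), episode_len=200, min_seg=20, max_seg=60,
    clip_len=clip_len, rng=random.Random(0),
)
```

Each tuple tells a training loop when to swap the reference motion and at which frame to enter it, so the policy is regularly exposed to switches that do not line up with clip boundaries.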