ICRA 2026

Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

Zifan Xu, Myoungkyu Seo, Dongmyeong Lee, Hao Fu, Jiaheng Hu, Jiaxun Cui, Yuqian Jiang, Zhihan Wang, Anastasiia Brund, Joydeep Biswas, Peter Stone

AI summary

A four-stage reinforcement learning pipeline enables humanoid robots to learn robust, continual ball-kicking skills that generalize across diverse configurations and deploy successfully in the real world despite noisy perception.

Humanoid robotics, reinforcement learning, visuomotor control, sim-to-real transfer, ball-kicking, noisy perception

Problem

Humanoid robots struggle to learn agile, whole-body striking skills that require rapid leg swings, single-foot balance, and precise timing under noisy sensory input and external perturbations.

Approach

The authors propose a four-stage teacher-student reinforcement learning framework that trains a privileged teacher policy for ball chasing and kicking, distills it to a student policy under realistic noisy perception, and refines it using online constrained reinforcement learning.
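
As a rough illustration of the distillation step only, the sketch below fits a student that sees noisy ball positions to imitate a teacher that sees the true ones. Everything in it is an assumption made for illustration: the linear teacher, the Gaussian noise, the per-axis linear student, and the names `teacher_action`, `noisy_observation`, and `distill` are hypothetical and are not taken from the paper, whose policies are trained with RL in simulation.

```python
import random


def teacher_action(ball_xy):
    """Hypothetical privileged teacher: act on the true ball position."""
    x, y = ball_xy
    return (0.5 * x, 0.5 * y)


def noisy_observation(ball_xy, rng, sigma=0.05):
    """Student input: true position plus Gaussian noise (assumed noise model)."""
    return tuple(v + rng.gauss(0.0, sigma) for v in ball_xy)


def distill(steps=5000, lr=0.05, seed=0):
    """Fit a per-axis linear student a = w * obs by SGD on squared imitation error."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        ball = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        target = teacher_action(ball)        # label computed from ground truth
        obs = noisy_observation(ball, rng)   # the student never sees ground truth
        for i in range(2):
            err = w[i] * obs[i] - target[i]
            w[i] -= lr * err * obs[i]        # gradient step on 0.5 * err**2
    return w


if __name__ == "__main__":
    print(distill())  # student gains should end up close to the teacher's 0.5
```

The point of the toy is the data flow: labels come from privileged state, inputs come from the degraded sensor, so the student learns a mapping that already accounts for the noise it will face at deployment.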

Key results

66.7% real-world goal-scoring success rate across five configurations on a Booster T1 robot
Realistic imperfect perception model with velocity-dependent noise, delays, and occlusion
Online constrained RL adaptation that eliminates jittery motions and unsafe sharp turns
Four-stage teacher-student training pipeline establishing a new benchmark for humanoid striking
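
The imperfect perception model listed above (velocity-dependent noise, delays, occlusion) could be simulated along the following lines. This is a minimal sketch under stated assumptions, not the authors' model: the class name `NoisyBallSensor`, the linear growth of noise with ball speed, the fixed step delay, and the hold-last-value behavior during occlusion are all hypothetical choices, and every parameter value is a placeholder.

```python
import random
from collections import deque


class NoisyBallSensor:
    """Hypothetical ball sensor with speed-dependent noise, delay, and occlusion."""

    def __init__(self, delay_steps=2, base_sigma=0.01, vel_gain=0.02,
                 occlusion_prob=0.1, seed=0):
        self.rng = random.Random(seed)
        self.base_sigma = base_sigma
        self.vel_gain = vel_gain
        self.occlusion_prob = occlusion_prob
        # Keeps the last delay_steps + 1 measurements; index 0 is the oldest.
        self.buffer = deque(maxlen=delay_steps + 1)
        self.last_output = None

    def observe(self, ball_xy, ball_speed):
        # Assumed model: noise standard deviation grows linearly with ball speed.
        sigma = self.base_sigma + self.vel_gain * ball_speed
        noisy = tuple(v + self.rng.gauss(0.0, sigma) for v in ball_xy)
        self.buffer.append(noisy)
        delayed = self.buffer[0]  # oldest buffered measurement
        # Assumed occlusion handling: repeat the previous output.
        if self.last_output is not None and self.rng.random() < self.occlusion_prob:
            return self.last_output
        self.last_output = delayed
        return delayed


if __name__ == "__main__":
    sensor = NoisyBallSensor()
    for t in range(5):
        print(sensor.observe((0.1 * t, 0.0), ball_speed=1.0))
```

Until the buffer fills, the sensor returns the earliest measurement it has, which is one simple way to handle the start of an episode.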

Why it matters

This work provides a practical framework for deploying agile, perception-robust whole-body control policies on real humanoid robots, advancing the feasibility of humanoid soccer and dynamic manipulation tasks.

Abstract

Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based training pipeline that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The pipeline extends a typical teacher-student training framework (in which a “teacher” policy is trained with ground-truth state information and the “student” learns to mimic it with noisy, imperfect sensing) by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements, including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement, are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a training pipeline for robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.

This work has taken place jointly in the Learning Agents Research Group (LARG) and Autonomous Mobile Robotics Laboratory (AMRL) at UT Austin. LARG research is supported in part by NSF (FAIN-2019844, NRT-2125858), ONR (N00014-24-1-2550), ARO (W911NF-17-2-0181, W911NF-23-2-0004, W911NF-25-1-0065), DARPA (Cooperative Agreement HR00112520004 on Ad Hoc Teamwork), Lockheed Martin, and UT Austin’s Good Systems grand challenge. AMRL research is supported in part by NSF (CAREER-2046955, OIA-2219236, DGE-2125858, CCF-2319471), ARO (W911NF-23-2-0004), Amazon, and JP Morgan. Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors. Peter Stone serves as the Chief Scientist of Sony AI and receives financial compensation for that role. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.

Index terms

Humanoid and Bipedal Locomotion, Sensorimotor Learning, Reinforcement Learning