ICRA 2026

PACE: Physics Augmentation for Coordinated End-to-end Reinforcement Learning toward Versatile Humanoid Table Tennis

Muqun Hu, Wenxi Chen, Wenjing Li, Falak Mandali, Zijian He, Renhong Zhang, Praveen Krisna, Katherine Christian, Leo Benaharon, Dizhi Ma, Karthik Ramani, Yan Gu

AI summary

A physics-augmented end-to-end RL framework enables a humanoid robot to play versatile table tennis with high success rates and coordinated whole-body motion.

Humanoid robotics, End-to-end reinforcement learning, Table tennis, Physics-augmented rewards, Whole-body control, Sim-to-real transfer

Problem

End-to-end reinforcement learning struggles to learn agile, coordinated whole-body control for fast-moving tasks like table tennis due to sparse rewards and high-dimensional action spaces.

Approach

The method combines a lightweight learned ball-trajectory predictor for proactive decision-making with dense, physics-based rewards to guide efficient exploration and training.
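To make the idea of a physics-based ball predictor concrete, here is a minimal sketch in Python. It is not the paper's model: it assumes drag-free, spin-free ballistic flight with at most one bounce on an infinite horizontal plane, a ball that starts at or above that plane, and an assumed restitution coefficient. The function name `predict_ball` and all constants except gravity and the standard 0.76 m table height are hypothetical.

```python
import math

G = 9.81             # gravitational acceleration, m/s^2
TABLE_HEIGHT = 0.76  # standard table surface height, m
RESTITUTION = 0.9    # assumed vertical restitution (hypothetical value)


def predict_ball(pos, vel, t, table_z=TABLE_HEIGHT, e=RESTITUTION):
    """Return (position, velocity) of the ball t seconds ahead.

    Simplifying assumptions: no drag, no spin, at most one bounce,
    and the table is treated as an infinite plane at height table_z.
    """
    x, y, z = pos
    vx, vy, vz = vel

    # Solve z + vz*tb - 0.5*G*tb^2 = table_z for the first bounce time tb.
    disc = vz * vz + 2.0 * G * (z - table_z)
    t_bounce = (vz + math.sqrt(disc)) / G if disc >= 0.0 else math.inf

    if t <= t_bounce:
        # Pure ballistic flight, no contact yet.
        new_pos = (x + vx * t, y + vy * t, z + vz * t - 0.5 * G * t * t)
        return new_pos, (vx, vy, vz - G * t)

    # Reflect the vertical velocity at the bounce, scaled by restitution.
    vz_after = -e * (vz - G * t_bounce)
    dt = t - t_bounce
    new_pos = (
        x + vx * t,
        y + vy * t,
        table_z + vz_after * dt - 0.5 * G * dt * dt,
    )
    return new_pos, (vx, vy, vz_after - G * dt)
```

A predictor of this kind only needs the current ball position and velocity, which is why it can supply future states cheaply at every simulation step; a learned predictor would instead have to infer velocity from a short history of positions.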

Key results

Achieves ≥96% hit rate and ≥92% success rate across varied serves in simulation
Ablation studies confirm the predictor and physics-guided rewards are essential for learning
Successfully deployed zero-shot on a physical Booster T1 humanoid (23 revolute joints) with coordinated footwork and fast returns
Open-sourced RL training code for reproducible research

Why it matters

Provides a scalable, practical framework for training humanoids in dynamic, real-world interactive tasks beyond static or free-space manipulation.

Abstract

Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing, capabilities that remain difficult for end-to-end control policies. We propose a reinforcement learning (RL) framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy’s observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate ≥ 96% and success rate ≥ 92%) in simulation. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward–backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT. We have open-sourced our RL training code at: https://github.com/purdue-tracelab/TTRL-ICRA2026.
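As an illustration of why a predicted future ball state makes the reward dense, the sketch below contrasts a contact-only reward with a shaped reward computed from a predicted interception point. This is an assumption-laden example, not the paper's reward: the function names, the Gaussian shaping, and the values of `sigma` and `radius` are all hypothetical.

```python
import math


def sparse_hit_reward(paddle_pos, ball_pos, radius=0.08):
    """Contact-only signal: 1.0 when the paddle is within `radius` metres
    of the ball, otherwise 0.0 (hypothetical threshold)."""
    return 1.0 if math.dist(paddle_pos, ball_pos) <= radius else 0.0


def dense_tracking_reward(paddle_pos, predicted_ball_pos, sigma=0.2):
    """Shaped signal: a Gaussian kernel on the distance between the paddle
    and the predicted interception point, so every step gets a gradient
    toward the future ball position (hypothetical shaping and scale)."""
    d2 = sum((p - b) ** 2 for p, b in zip(paddle_pos, predicted_ball_pos))
    return math.exp(-d2 / (sigma * sigma))


# Paddle 0.5 m away from where the ball is predicted to arrive:
paddle = (0.5, 0.0, 1.0)
target = (0.0, 0.0, 1.0)
print(sparse_hit_reward(paddle, target))      # no signal until contact
print(dense_tracking_reward(paddle, target))  # small but nonzero signal
```

The sparse variant gives the policy no information until the paddle happens to reach the ball, whereas the shaped variant increases smoothly as the paddle approaches the predicted interception point, which is the general property a dense reward is meant to provide during exploration.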

Index terms

Humanoid Robot Systems, Whole-Body Motion Planning and Control, Reinforcement Learning