Learning Collision-Free Object Goal Pushing for Quadruped Robots with Safe Corridors
Gabriel Lai, Yi Wong, Chung Yui Yeung, Shaohang Xu, Zhi Chen, Chin Pang Ho
AI summary
Problem
Existing reinforcement learning methods for quadrupedal pushing largely ignore obstacle avoidance in cluttered spaces, while avoidance approaches that process high-dimensional sensor data typically require complex, resource-intensive training pipelines.
Approach
The authors integrate a low-dimensional safe corridor into the RL observation and constraint design, allowing the policy to learn collision-free pushing efficiently without processing high-dimensional LiDAR or camera data.
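To make the idea of a low-dimensional corridor observation concrete, here is a minimal sketch. It assumes the safe corridor is a single convex 2D polygon and that the policy observes signed distances to each corridor face; the summary does not specify the actual encoding, so the function names (`polygon_to_halfplanes`, `corridor_margins`, `corridor_observation`) and the feature layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def polygon_to_halfplanes(vertices):
    """Convert a convex polygon (counter-clockwise vertices, shape (N, 2))
    into half-planes A @ p <= b with unit outward normals.
    Assumption: the corridor is one convex 2D polygon."""
    v = np.asarray(vertices, dtype=float)
    edges = np.roll(v, -1, axis=0) - v  # edge i runs from v[i] to v[i+1]
    # For counter-clockwise vertices, rotating an edge by -90 degrees
    # gives the outward normal: (x, y) -> (y, -x).
    normals = np.stack([edges[:, 1], -edges[:, 0]], axis=1)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    offsets = np.sum(normals * v, axis=1)
    return normals, offsets

def corridor_margins(A, b, point):
    """Signed distance from `point` to each corridor face.
    Positive means inside that face, negative means outside."""
    return b - A @ np.asarray(point, dtype=float)

def corridor_observation(A, b, robot_xy, object_xy):
    """Hypothetical observation: face margins for the robot and the object,
    concatenated into one short vector (2 * N values for N faces)."""
    return np.concatenate([
        corridor_margins(A, b, robot_xy),
        corridor_margins(A, b, object_xy),
    ])

# Usage: a 2 m x 2 m square corridor with the robot and object inside it.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
A, b = polygon_to_halfplanes(square)
obs = corridor_observation(A, b, robot_xy=(0.5, 1.0), object_xy=(1.0, 1.0))
```

For a four-sided corridor this gives eight numbers per step, compared with the hundreds or thousands of values in a raw LiDAR scan or camera image, which is the kind of size reduction the approach relies on.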
Key results
- Novel constrained RL framework for collision-free object pushing in cluttered environments
- Low-dimensional safe corridor representation replaces high-dimensional sensor inputs for efficient training
- Policy trains in approximately one hour on a consumer GPU and transfers directly to a real Unitree Go2 robot
- Demonstrates robust sim-to-real performance across varied cluttered scenarios
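The constrained RL framework named in the first result can be illustrated with a generic Lagrangian relaxation, one common way to handle constraints in RL. The summary does not say which constrained algorithm the authors use, so this sketch is an assumption throughout: the cost definition, the cost limit, and the function names are hypothetical.

```python
def corridor_violation_cost(margins):
    """Hypothetical per-step cost: how far the worst face margin is outside
    the corridor (0.0 when every margin is non-negative)."""
    return max(0.0, -min(margins))

def penalized_reward(reward, cost, lam):
    """Lagrangian-relaxed reward: the task reward minus the weighted cost."""
    return reward - lam * cost

def update_multiplier(lam, mean_cost, cost_limit, lr=0.05):
    """Dual ascent step on the multiplier. It grows when the average cost
    exceeds the limit, shrinks otherwise, and is clipped at zero."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))

# Usage: one step outside the corridor by 0.2 m, then a multiplier update.
cost = corridor_violation_cost([1.0, -0.2, 0.5, 0.3])
shaped = penalized_reward(reward=1.0, cost=cost, lam=2.0)
lam = update_multiplier(lam=2.0, mean_cost=cost, cost_limit=0.0, lr=0.5)
```

The multiplier adapts during training, so the penalty weight does not have to be hand-tuned: persistent violations make leaving the corridor steadily more expensive for the policy.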
Why it matters
Provides a computationally efficient pathway for legged robots to safely perform non-prehensile manipulation in real-world cluttered spaces.
Abstract
While recent advancements in reinforcement learning have enabled quadrupedal robots to perform non-prehensile manipulation tasks like pushing, existing methods have largely overlooked the critical challenge of obstacle avoidance. In this paper, we address this significant limitation by introducing a novel reinforcement learning (RL) framework that controls a quadrupedal robot to push large objects in cluttered, real-world environments. In particular, obstacle avoidance is integrated as a primary objective directly into the policy training process. To achieve this, we propose to represent the traversable space with a low-dimensional safe corridor, a method that is both computationally efficient and highly effective. This approach avoids the need for complex and resource-intensive training pipelines typically required for processing high-dimensional sensor data. We validate our policy through extensive experiments in both simulation and the real world. The implementation code will be released to benefit the research community.