What Matters in Learning a Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study
Jiayu Chen, Chao Yu, Yuqing Xie, Feng Gao, Yinuo Chen, Shu'ang Yu, Wenhao Tang, Shilong Ji, Mo Mu, Yi Wu, Huazhong Yang, Yu Wang
AI summary
Problem
RL policies for quadrotors often fail during real-world deployment due to the sim-to-real gap, with no unified understanding of which training factors most critically impact zero-shot transfer success.
Approach
Systematically evaluates input design, reward shaping, system identification, domain randomization, and batch size to develop SimpleFlight, a PPO-based framework that integrates these optimized components for direct sim-to-real deployment.
Key results
- Over 50% reduction in trajectory tracking error versus SOTA RL baselines
- Successful zero-shot deployment on smooth and infeasible real-world trajectories
- Identification of system identification and selective domain randomization as critical factors
- Demonstration that large training batch sizes improve real-world generalization
Why it matters
Provides a reproducible, open-source framework and actionable guidelines for researchers and engineers developing robust RL controllers for aerial robotics.
Abstract
Precise and agile flight maneuvers are essential for quadrotor applications, yet traditional control methods are limited by their reliance on flat trajectories or computationally intensive optimization. Reinforcement learning (RL)-based policies offer a promising alternative by directly mapping observations to actions, reducing dependency on system knowledge and actuation constraints. However, the sim-to-real gap remains a significant challenge, often causing instability in real-world deployments. In this work, we identify five key factors for learning robust RL-based control policies capable of zero-shot real-world deployment: (1) integrating velocity and rotation matrix into actor inputs, (2) incorporating time vector into critic inputs, (3) regularizing action differences for smoothness, (4) applying system identification with selective randomization, and (5) using large batch sizes during training. Based on these insights, we develop SimpleFlight, a PPO-based framework that integrates these techniques. Extensive experiments on the Crazyflie quadrotor demonstrate that SimpleFlight reduces trajectory tracking error by over 50% compared to state-of-the-art RL baselines. It excels in both smooth polynomial and challenging infeasible zigzag trajectories, particularly on small thrust-to-weight quadrotors, where baseline methods often fail. To enhance reproducibility and further research, we integrate SimpleFlight into the GPU-based Omnidrones simulator and provide open-source code and model checkpoints.
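As an illustration of factor (3), the sketch below shows one common way to regularize action differences: subtract a weighted norm of the change between consecutive actions from a position-tracking reward. It is a minimal example under stated assumptions, not the SimpleFlight reward itself. The function names, the Euclidean norm, and the weight `smooth_coef` are hypothetical choices made here for illustration.

```python
import math


def smoothness_penalty(action, prev_action):
    """Euclidean norm of the change between two consecutive actions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(action, prev_action)))


def tracking_reward(pos, target, action, prev_action, smooth_coef=0.1):
    """Negative position error minus a weighted action-difference penalty.

    smooth_coef is a hypothetical weight chosen for this sketch,
    not a value taken from the paper.
    """
    pos_error = math.dist(pos, target)  # straight-line distance to the reference point
    return -pos_error - smooth_coef * smoothness_penalty(action, prev_action)


# Same position error, different action histories (four normalized motor commands).
steady = tracking_reward((0, 0, 1), (0, 0, 1), (0.5,) * 4, (0.5,) * 4)
jerky = tracking_reward((0, 0, 1), (0, 0, 1), (0.9, 0.1, 0.9, 0.1), (0.1, 0.9, 0.1, 0.9))
print(steady > jerky)  # the oscillating command sequence is rewarded less
```

With identical tracking error, the policy that flips its commands between steps receives a lower reward, which is the pressure toward smooth outputs that the abstract describes.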