ICRA 2026

What Matters in Learning a Zero-Shot Sim-To-Real RL Policy for Quadrotor Control? A Comprehensive Study

Jiayu Chen, Chao Yu, Yuqing Xie, Feng Gao, Yinuo Chen, Shu'ang Yu, Wenhao Tang, Shilong Ji, Mo Mu, Yi Wu, Huazhong Yang, Yu Wang

AI summary

Identifies five critical training factors that enable robust, zero-shot sim-to-real reinforcement learning policies for quadrotor control, reducing tracking error by over 50% compared to baselines.

Reinforcement learning, sim-to-real transfer, quadrotor control, zero-shot deployment, system identification, domain randomization

Problem

RL policies for quadrotors often fail during real-world deployment due to the sim-to-real gap, with no unified understanding of which training factors most critically impact zero-shot transfer success.

Approach

Systematically evaluates input design, reward shaping, system identification, domain randomization, and batch size to develop SimpleFlight, a PPO-based framework that integrates these optimized components for direct sim-to-real deployment.
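As a rough illustration of the input-design factor, the sketch below assembles separate actor and critic inputs in the way the abstract describes: velocity and a rotation matrix go to the actor, and the critic additionally receives a time vector. The function names, the inclusion of a relative-position term, the ordering of the entries, and the one-element normalized time encoding are all assumptions made for this example, not details taken from the paper or from SimpleFlight's code.

```python
def actor_observation(rel_position, velocity, rotation_matrix):
    """Build a flat actor input (hypothetical layout).

    rel_position: 3 values, offset to the reference point (assumed term)
    velocity: 3 values, linear velocity
    rotation_matrix: 3x3 nested list describing attitude
    """
    # Flatten the 3x3 rotation matrix row by row into 9 values.
    flat_rotation = [entry for row in rotation_matrix for entry in row]
    return list(rel_position) + list(velocity) + flat_rotation


def critic_observation(actor_obs, time_vector):
    """Append a time vector to the actor input for the critic.

    The time encoding is an assumption here: a list holding the
    normalized episode time t / T.
    """
    return list(actor_obs) + list(time_vector)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
actor_in = actor_observation([0.1, 0.0, -0.2], [0.0, 0.5, 0.0], identity)
critic_in = critic_observation(actor_in, [0.25])
```

In this layout the actor sees 15 values and the critic sees the same 15 plus the time entry, so only the critic is given information about progress through the episode.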

Key results

Over 50% reduction in trajectory tracking error versus SOTA RL baselines
Successful zero-shot deployment on smooth and infeasible real-world trajectories
Identification of system identification and selective domain randomization as critical factors
Demonstration that large training batch sizes improve real-world generalization

Why it matters

Provides a reproducible, open-source framework and actionable guidelines for researchers and engineers developing robust RL controllers for aerial robotics.

Abstract

Precise and agile flight maneuvers are essential for quadrotor applications, yet traditional control methods are limited by their reliance on flat trajectories or computationally intensive optimization. Reinforcement learning (RL)-based policies offer a promising alternative by directly mapping observations to actions, reducing dependency on system knowledge and actuation constraints. However, the sim-to-real gap remains a significant challenge, often causing instability in real-world deployments. In this work, we identify five key factors for learning robust RL-based control policies capable of zero-shot real-world deployment: (1) integrating velocity and rotation matrix into actor inputs, (2) incorporating time vector into critic inputs, (3) regularizing action differences for smoothness, (4) applying system identification with selective randomization, and (5) using large batch sizes during training. Based on these insights, we develop SimpleFlight, a PPO-based framework that integrates these techniques. Extensive experiments on the Crazyflie quadrotor demonstrate that SimpleFlight reduces trajectory tracking error by over 50% compared to state-of-the-art RL baselines. It excels in both smooth polynomial and challenging infeasible zigzag trajectories, particularly on small thrust-to-weight quadrotors, where baseline methods often fail. To enhance reproducibility and further research, we integrate SimpleFlight into the GPU-based Omnidrones simulator and provide open-source code and model checkpoints.
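Factor (3), regularizing action differences, can be pictured as a penalty on how much the commanded action changes between consecutive control steps. The following is a minimal sketch under stated assumptions: the Euclidean norm, the `weight` coefficient, and the simple negative-tracking-error reward term are illustrative choices, not the reward actually used in the paper.

```python
import math


def smoothness_penalty(action, prev_action, weight=0.1):
    """Return a non-positive reward term that grows in magnitude
    with the change between consecutive actions.

    `weight` is a hypothetical coefficient chosen for illustration.
    """
    diff_norm = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(action, prev_action))
    )
    return -weight * diff_norm


def shaped_reward(tracking_error, action, prev_action, weight=0.1):
    """Combine an illustrative tracking term with the smoothness penalty."""
    return -tracking_error + smoothness_penalty(action, prev_action, weight)


# Holding the action constant incurs no penalty; an abrupt change does.
steady = shaped_reward(0.2, [0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5])
jumpy = shaped_reward(0.2, [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], weight=0.5)
```

With the same tracking error, the policy that changes its action abruptly receives the lower reward, which is the pressure toward smooth commands that the abstract refers to.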

Index terms

Reinforcement Learning; Machine Learning for Robot Control; Aerial Systems: Applications