Quadrotor Navigation Using Reinforcement Learning with Privileged Information
Jonathan Lee, Abhishek Rathod, Kshitij Goel, John Stecklein, Wennie Tabib
AI summary
Problem
Prior learning-based navigation methods struggle to find paths in environments with large obstacles, sharp corners, and dead ends, often getting stuck or colliding due to fixed headings or lack of global path guidance.
Approach
The authors train an end-to-end reinforcement learning policy in differentiable simulation. Privileged time-of-arrival (ToA) maps provide shortest-path guidance during training, and a novel yaw alignment loss lets the quadrotor reorient around large obstacles. The map is not required at test time.
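To make the idea of a time-of-arrival map concrete, here is a minimal sketch that assigns every free cell of a 2D occupancy grid the shortest travel time to the goal. It is an illustration under stated assumptions, not the authors' implementation: the source does not say how the ToA maps are computed, so Dijkstra's algorithm on a 4-connected grid is swapped in, and the function name, the grid representation, and the `speed` and `cell_size` parameters are all hypothetical.

```python
import heapq
import math


def time_of_arrival_map(occupancy, goal, speed=1.0, cell_size=1.0):
    """Return a grid of shortest travel times to `goal` (hypothetical helper).

    occupancy: list of rows, truthy entries are obstacles (assumed format).
    goal: (row, col) of the goal cell, assumed to be free.
    Cells that cannot reach the goal keep a value of math.inf.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    toa = [[math.inf] * cols for _ in range(rows)]
    goal_r, goal_c = goal
    toa[goal_r][goal_c] = 0.0
    heap = [(0.0, goal_r, goal_c)]
    step_time = cell_size / speed  # time to cross one cell edge

    while heap:
        t, r, c = heapq.heappop(heap)
        if t > toa[r][c]:
            continue  # stale queue entry, a shorter time was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not occupancy[nr][nc]:
                nt = t + step_time
                if nt < toa[nr][nc]:
                    toa[nr][nc] = nt
                    heapq.heappush(heap, (nt, nr, nc))
    return toa


# A wall (the 1s) separates the start at (0, 0) from the goal at (2, 0).
grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
toa = time_of_arrival_map(grid, goal=(2, 0))
print(toa[0][0])  # travel time along the detour around the wall
```

In this toy grid the start sits only two cells from the goal in a straight line, yet the ToA value reflects the six-step detour around the wall. That gap between straight-line distance and path distance is the kind of global guidance a goal-direction signal alone cannot supply.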
Key results
- 86% simulation success rate, outperforming baselines by 34%
- Zero-collision hardware deployment across 20 outdoor flights covering 589 m at speeds up to 4 m/s
- Novel yaw alignment loss enables effective navigation through sharp corners and twisting passageways
- Body rate attitude control and domain randomization successfully bridge the sim-to-real gap
Why it matters
Enables reliable, high-speed autonomous flight for lightweight quadrotors in complex, cluttered environments where traditional and prior learning-based methods fail.
Abstract
This paper presents a reinforcement learning-based quadrotor navigation method that leverages efficient differentiable simulation, novel loss functions, and privileged information to navigate around large obstacles. Prior learning-based methods perform well in scenes that exhibit narrow obstacles, but struggle when the goal location is blocked by large walls or terrain. In contrast, the proposed method utilizes time-of-arrival (ToA) maps as privileged information and a yaw alignment loss to guide the robot around large obstacles. The policy is evaluated in photo-realistic simulation environments containing large obstacles, sharp corners, and dead ends. Our approach achieves an 86% success rate and outperforms baseline strategies by 34%. We deploy the policy onboard a custom quadrotor in outdoor cluttered environments during both day and night. The policy is validated across 20 flights, covering 589 m without collisions at speeds up to 4 m/s.
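The abstract names a yaw alignment loss without defining it. As a rough illustration only, one plausible form penalizes the angle between the vehicle's heading and a guidance direction, for example the direction in which the ToA value decreases fastest. The formula `1 - cos(error)`, the function name, and the 2D guidance vector below are assumptions made for this sketch, not the loss used in the paper.

```python
import math


def yaw_alignment_loss(yaw, guidance_dir):
    """Hypothetical loss: 0 when the heading matches the guidance direction,
    rising to 2 when the heading points the opposite way.

    yaw: heading angle in radians in the world frame.
    guidance_dir: (x, y) vector to align with, e.g. the direction of steepest
        ToA decrease (assumed). It does not need to be normalized.
    """
    gx, gy = guidance_dir
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return 0.0  # no guidance available, so apply no penalty
    heading_x, heading_y = math.cos(yaw), math.sin(yaw)
    cos_error = (heading_x * gx + heading_y * gy) / norm
    return 1.0 - cos_error


aligned = yaw_alignment_loss(0.0, (1.0, 0.0))
sideways = yaw_alignment_loss(math.pi / 2, (1.0, 0.0))
reversed_heading = yaw_alignment_loss(math.pi, (1.0, 0.0))
```

A cosine-based penalty is smooth and avoids angle wrap-around at ±π, which suits gradient-based training in a differentiable simulator. Whether the paper makes the same choice is not stated in the abstract.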