ICRA 2026

Quadrotor Navigation Using Reinforcement Learning with Privileged Information

Jonathan Lee, Abhishek Rathod, Kshitij Goel, John Stecklein, Wennie Tabib

AI summary

The proposed method achieves an 86% success rate in simulation when navigating around large obstacles, outperforming baselines by 34%, by training with privileged time-of-arrival maps and a yaw alignment loss.
Reinforcement Learning, Quadrotor Navigation, Privileged Information, Time-of-Arrival Maps, Differentiable Simulation, Sim-to-Real Transfer

Problem

Prior learning-based navigation methods struggle to find paths in environments with large obstacles, sharp corners, and dead ends, often getting stuck or colliding due to fixed headings or lack of global path guidance.

Approach

The authors train an end-to-end reinforcement learning policy using differentiable simulation, privileged time-of-arrival maps for shortest-path guidance, and a novel yaw alignment loss to enable reorientation around large obstacles without requiring the map at test time.
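To make the idea of a time-of-arrival map concrete, here is a minimal sketch that computes one on a 2D occupancy grid with Dijkstra's algorithm. It is an illustration only: the function name `toa_map`, the 4-connected grid, and the constant-speed assumption are all hypothetical choices, not the authors' implementation, which the summary does not describe.

```python
import heapq
import math


def toa_map(occupancy, goal, cell_size=1.0, speed=1.0):
    """Return the time needed to reach `goal` from every free cell.

    Assumptions (not from the paper): 4-connected grid, constant speed,
    occupancy[r][c] is truthy for obstacles. Unreachable and occupied
    cells keep a time of infinity.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    toa = [[math.inf] * cols for _ in range(rows)]
    goal_r, goal_c = goal
    if occupancy[goal_r][goal_c]:
        return toa  # goal is inside an obstacle: nothing is reachable

    step = cell_size / speed  # travel time between adjacent cells
    toa[goal_r][goal_c] = 0.0
    heap = [(0.0, goal_r, goal_c)]
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > toa[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not occupancy[nr][nc]:
                nt = t + step
                if nt < toa[nr][nc]:
                    toa[nr][nc] = nt
                    heapq.heappush(heap, (nt, nr, nc))
    return toa


# A wall separates the pocket at (2, 2) from the goal at (2, 4).
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
toa = toa_map(grid, goal=(2, 4))
```

In this grid the cell (2, 2) is only two cells from the goal in a straight line, but its arrival time reflects the detour around the wall. That gap between straight-line distance and shortest-path time is the kind of global guidance a policy could learn from during training, while still running without the map at test time.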

Key results

  • 86% simulation success rate, outperforming baselines by 34%
  • Zero-collision hardware deployment across 20 outdoor flights covering 589 m at speeds up to 4 m/s
  • Novel yaw alignment loss enables effective navigation through sharp corners and twisting passageways
  • Body rate attitude control and domain randomization successfully bridge the sim-to-real gap
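The summary names a yaw alignment loss but does not give its formula. As a rough sketch of what such a term could look like, the example below penalizes the angle between the vehicle's heading and its horizontal direction of travel using 1 - cos(error). The function name `yaw_alignment_loss`, the choice of velocity as the alignment target, and the hover threshold `eps` are all assumptions made for illustration.

```python
import math


def yaw_alignment_loss(yaw, vx, vy, eps=1e-6):
    """Hypothetical yaw alignment penalty in [0, 2].

    Assumption (not from the paper): the heading should point along the
    horizontal velocity (vx, vy). Returns 0 when aligned, 2 when the
    vehicle faces directly backwards, and 0 when nearly hovering, since
    the direction of travel is then undefined.
    """
    speed = math.hypot(vx, vy)
    if speed < eps:
        return 0.0
    # Cosine of the angle between the heading vector and the velocity.
    cos_err = (math.cos(yaw) * vx + math.sin(yaw) * vy) / speed
    return 1.0 - cos_err


aligned = yaw_alignment_loss(0.0, 1.0, 0.0)
sideways = yaw_alignment_loss(math.pi / 2, 1.0, 0.0)
backwards = yaw_alignment_loss(math.pi, 1.0, 0.0)
```

The cosine form is smooth and has no wrap-around discontinuity at ±π, which matters when gradients flow through a differentiable simulator. In such a setup the same expression would be written in an autodiff framework rather than with the `math` module.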

Why it matters

Enables reliable, high-speed autonomous flight for lightweight quadrotors in complex, cluttered environments where traditional and prior learning-based methods fail.

Abstract

This paper presents a reinforcement learning-based quadrotor navigation method that leverages efficient differentiable simulation, novel loss functions, and privileged information to navigate around large obstacles. Prior learning-based methods perform well in scenes that exhibit narrow obstacles, but struggle when the goal location is blocked by large walls or terrain. In contrast, the proposed method utilizes time-of-arrival (ToA) maps as privileged information and a yaw alignment loss to guide the robot around large obstacles. The policy is evaluated in photo-realistic simulation environments containing large obstacles, sharp corners, and dead-ends. Our approach achieves an 86% success rate and outperforms baseline strategies by 34%. We deploy the policy onboard a custom quadrotor in outdoor cluttered environments both during the day and night. The policy is validated across 20 flights, covering 589 m without collisions at speeds up to 4 m/s.

Index terms

Field Robots; Aerial Systems: Applications
