ICRA 2026

Vision-Guided Outdoor Flight and Obstacle Evasion Via Reinforcement Learning

Shiladitya Dutta, Aayush Gupta, Varun Saran, Avideh Zakhor

AI summary

A reinforcement learning policy using stereo vision and VIO enables zero-shot, collision-free navigation of commercial drones in unknown outdoor environments without drone-specific tuning.

Vision-based navigation, Reinforcement learning, Sim-to-real transfer, Autonomous drones, Obstacle evasion, Stereo vision

Problem

Autonomous drone navigation in GNSS-denied, obstacle-rich environments remains challenging because traditional methods depend on either continuous pilot control or complex, platform-specific tuning.

Approach

The authors train an end-to-end sensorimotor policy that maps stereo depth and visual-inertial odometry to velocity commands using reinforcement and privileged learning, bridging the sim-to-real gap with domain randomization and reward shaping.
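To make the domain randomization idea concrete, here is a minimal sketch of per-episode parameter sampling. It is not the authors' implementation: the parameter names, the ranges, and the helpers `sample_episode_params` and `add_depth_noise` are all hypothetical, chosen only to mirror the disturbances the summary mentions (sensor noise, crosswinds, dynamics mismatch).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Simulation settings drawn fresh for each episode (all names hypothetical)."""
    depth_noise_std_m: float  # std. dev. of Gaussian noise added to stereo depth
    wind_mps: tuple           # constant crosswind vector (x, y) in m/s
    thrust_scale: float       # multiplier standing in for dynamics mismatch


# Illustrative ranges only; the ranges used in the paper are not stated here.
RANGES = {
    "depth_noise_std_m": (0.0, 0.3),
    "wind_mps": (-3.0, 3.0),
    "thrust_scale": (0.8, 1.2),
}


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw each parameter independently and uniformly from its range."""
    wind_lo, wind_hi = RANGES["wind_mps"]
    return EpisodeParams(
        depth_noise_std_m=rng.uniform(*RANGES["depth_noise_std_m"]),
        wind_mps=(rng.uniform(wind_lo, wind_hi), rng.uniform(wind_lo, wind_hi)),
        thrust_scale=rng.uniform(*RANGES["thrust_scale"]),
    )


def add_depth_noise(depth_m, std_m, rng):
    """Add zero-mean Gaussian noise to a flat list of depths, clipped at zero."""
    return [max(0.0, d + rng.gauss(0.0, std_m)) for d in depth_m]
```

Resampling these values at the start of every simulated episode means the policy never trains twice on exactly the same sensor and dynamics model, which is the general mechanism by which domain randomization encourages robustness to the real platform.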

Key results

Successful zero-shot transfer to unseen outdoor environments and a commercial drone platform
Robust 650 m field navigation overcoming sensor noise, crosswinds, and dynamics mismatch
Direct velocity command output compatible with off-the-shelf drone APIs without platform-specific tuning
Effective sim-to-real bridging via domain randomization and trajectory-aware reward shaping

Why it matters

Enables reliable, autonomous obstacle avoidance for commercial drones in GNSS-denied or complex real-world settings using standard hardware.

Abstract

Although quadcopters boast impressive traversal capabilities enabled by their omnidirectional maneuverability, the need for continuous pilot control in complex environments impedes their application in GNSS- and telemetry-denied scenarios. To this end, we propose a novel sensorimotor policy that uses stereo-vision depth and visual-inertial odometry (VIO) to autonomously navigate through obstacles in an unknown environment to reach a goal point. The policy is comprised of a pre-trained autoencoder as the perception head followed by a planning and control LSTM network which outputs velocity commands that can be followed by an off-the-shelf commercial drone. We leverage reinforcement and privileged learning paradigms to train the policy in simulation through a two-stage process: 1) initial training with optimal trajectories generated by a global motion planner acting as a supervisory backbone, 2) further fine-tuning in a curriculum environment. To bridge the sim-to-real gap, we employ domain randomization and reward shaping to create a policy that is robust to both noise and domain shift. In outdoor experiments, our approach achieves successful zero-shot transfer to both obstacle environments and a drone platform that were never encountered during training.
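As an illustration of the interface around such a policy, the sketch below shows one plausible way to express a goal point relative to a VIO pose estimate and to bound a velocity command before it is sent to a drone. The abstract does not say how the goal is encoded or whether commands are limited, so both `goal_in_body_frame` and `clamp_velocity` are assumptions introduced here, as is the planar frame convention (x forward, y left, yaw counterclockwise).

```python
import math


def goal_in_body_frame(pos_xy, yaw_rad, goal_xy):
    """Rotate the world-frame offset to the goal into the drone's heading frame."""
    dx = goal_xy[0] - pos_xy[0]
    dy = goal_xy[1] - pos_xy[1]
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # Apply the inverse (transpose) of the planar rotation by yaw.
    return (c * dx + s * dy, -s * dx + c * dy)


def clamp_velocity(v_xyz, max_speed_mps):
    """Scale a velocity command so its norm never exceeds max_speed_mps."""
    speed = math.sqrt(sum(v * v for v in v_xyz))
    if speed <= max_speed_mps or speed == 0.0:
        return tuple(v_xyz)
    scale = max_speed_mps / speed
    return tuple(v * scale for v in v_xyz)


# Hypothetical usage: the drone sits at the origin facing +y, goal 5 m along +y.
rel_goal = goal_in_body_frame((0.0, 0.0), math.pi / 2, (0.0, 5.0))
command = clamp_velocity((3.0, 4.0, 0.0), max_speed_mps=2.5)
```

Scaling the whole vector, rather than clipping each axis separately, preserves the commanded direction of travel, which matters when the command encodes an evasive maneuver.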

Index terms

Aerial Systems: Perception and Autonomy; Reinforcement Learning; Vision-Based Navigation