ICRA 2026

SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones Via Reinforcement Learning

Zichen Yan, Rui Huang, Lei He, Shao Guo, Lin Zhao

AI summary

SIGN enables safe, map-less image-goal navigation for drones using only visual inputs and a continuous-control reinforcement learning policy, with direct sim-to-real transfer.

Image-goal navigation, autonomous drones, reinforcement learning, sim-to-real transfer, collision avoidance, continuous control

Problem

Existing image-goal navigation methods focus on ground robots or use discrete actions, failing to address the high-frequency control, localization drift, and safety requirements needed for autonomous drone flight in unknown environments.

Approach

SIGN trains an end-to-end continuous velocity policy using self-supervised auxiliary tasks for faster learning, while integrating a decoupled depth-based safety module that predicts collision risk and corrects unsafe actions in real time.
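To make the idea of a decoupled safety layer concrete, here is a minimal sketch of how a depth-based check might sit between a learned policy and the flight controller. It is not the authors' implementation: the time-to-collision heuristic, the central-window crop, the function names `collision_risk` and `correct_action`, and the `horizon_s` and `threshold` parameters are all assumptions made for illustration. The paper's module predicts risk with its own model; this sketch substitutes a simple geometric rule.

```python
from typing import List, Tuple

Velocity = Tuple[float, float, float, float]  # (vx, vy, vz, yaw_rate), hypothetical layout


def collision_risk(depth_m: List[List[float]], speed_mps: float,
                   horizon_s: float = 1.0) -> float:
    """Heuristic risk in [0, 1] from a depth image in meters (illustrative only).

    Risk is horizon_s divided by the time-to-collision with the nearest
    pixel in the central window of the image, clipped to [0, 1].
    """
    if speed_mps <= 0.0:
        return 0.0  # not moving forward, so no forward collision risk
    rows, cols = len(depth_m), len(depth_m[0])
    # Central window: skip the outer quarter of rows and columns on each side.
    r0, r1 = rows // 4, rows - rows // 4
    c0, c1 = cols // 4, cols - cols // 4
    nearest = min(depth_m[r][c] for r in range(r0, r1) for c in range(c0, c1))
    if nearest <= 0.0:
        return 1.0  # invalid or zero depth is treated as imminent contact
    time_to_collision = nearest / speed_mps
    return max(0.0, min(1.0, horizon_s / time_to_collision))


def correct_action(action: Velocity, risk: float,
                   threshold: float = 0.5) -> Velocity:
    """Scale down forward velocity once risk exceeds the threshold.

    The scale falls linearly from 1 at risk == threshold to 0 at risk == 1,
    so the correction is continuous and leaves safe actions untouched.
    """
    if risk <= threshold:
        return action
    scale = (1.0 - risk) / (1.0 - threshold)
    vx, vy, vz, yaw_rate = action
    return (vx * scale, vy, vz, yaw_rate)


# Usage: an open scene leaves the policy action unchanged,
# while a close obstacle in the image center stops forward motion.
open_scene = [[4.0] * 4 for _ in range(4)]
blocked_scene = [row[:] for row in open_scene]
blocked_scene[1][1] = 0.5

policy_action = (1.0, 0.0, 0.0, 0.2)
safe_open = correct_action(policy_action, collision_risk(open_scene, 1.0))
safe_blocked = correct_action(policy_action, collision_risk(blocked_scene, 1.0))
```

Because the check only reads depth and the commanded velocity, it can be swapped or tuned without retraining the policy, which is the practical appeal of keeping the safety module decoupled.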

Key results

State-of-the-art Gibson benchmark performance under continuous control
Zero-shot sim-to-real transfer to physical drones without fine-tuning
Real-time obstacle avoidance via depth-based collision prediction and action correction
Map-less navigation using only RGB and IMU inputs

Why it matters

It provides a practical, low-cost navigation framework for drones operating in GPS-denied or unmapped settings like disaster response and industrial inspection.

Abstract

Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior works primarily study ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging due to their need for high-frequency feedback control and global localization for stable flight. In this paper, we propose a novel sim-to-real framework that leverages reinforcement learning (RL) to achieve ImageNav for drones. To enhance visual representation ability, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which results in more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to safely navigate in cluttered environments. Unlike most existing drone navigation methods that focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors, including autonomous exploration, obstacle avoidance, and image-goal seeking, without requiring explicit global mapping. Code and model checkpoints are available at https://github.com/Zichen-Yan/SIGN.

Index terms

Vision-Based Navigation; Reinforcement Learning; Aerial Systems: Perception and Autonomy