ICRA 2026

RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation

Seungchan Kim, Omar Alama, Dmytro Kurdydyk, John Keller, Nikhil Varma Keetha, Wenshan Wang, Yonatan Bisk, Sebastian Scherer


AI summary

RAVEN enables aerial robots to reliably locate distant or unseen targets in large outdoor environments by combining persistent 3D semantic memory with adaptive behavior switching.
Aerial navigation, Semantic mapping, Open-set recognition, Behavior trees, Vision-language models, Outdoor robotics

Problem

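Outdoor semantic navigation for aerial robots is hampered by large spatial scales, sparse object distributions, and unstructured layouts. Existing outdoor methods either react only to current observations, which produces short-sighted behavior during long-range search, or precompute scene graphs offline, which limits adaptability during online deployment.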

Approach

The framework maintains a task-agnostic 3D semantic voxel-ray map as persistent memory and uses a behavior tree to switch dynamically between short-range voxel search, long-range ray search, cue generation guided by a large vision-language model (LVLM), and frontier exploration.
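The behavior switching described above can be pictured as a behavior-tree fallback (selector) node that tries each behavior in priority order and stops at the first one that succeeds. This is a minimal sketch under stated assumptions, not RAVEN's implementation: the priority order, the `Blackboard` fields, and the `leaf`/`fallback` helpers are all hypothetical, and the paper's actual tree structure is not described in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SUCCESS, FAILURE = "SUCCESS", "FAILURE"


@dataclass
class Blackboard:
    """Hypothetical shared state read by the tree; all field names are illustrative."""
    voxel_hit: bool = False          # target matched in nearby voxel memory
    ray_hit: bool = False            # target matched along a long-range ray
    lvlm_cues: List[str] = field(default_factory=list)  # auxiliary cue labels from an LVLM
    frontiers_left: int = 0          # unexplored frontiers still available
    log: List[str] = field(default_factory=list)        # behaviors that ran this tick


Behavior = Callable[[Blackboard], str]


def leaf(name: str, ready: Callable[[Blackboard], bool]) -> Behavior:
    """Action leaf: runs (here it only logs its name) if its precondition holds."""
    def tick(bb: Blackboard) -> str:
        if not ready(bb):
            return FAILURE
        bb.log.append(name)  # stand-in for issuing a navigation goal
        return SUCCESS
    return tick


def fallback(*children: Behavior) -> Behavior:
    """Fallback (selector) node: tick children in order, stop at the first SUCCESS."""
    def tick(bb: Blackboard) -> str:
        for child in children:
            if child(bb) == SUCCESS:
                return SUCCESS
        return FAILURE
    return tick


# Assumed priority order: exploit memory first, ask for cues next, explore last.
root = fallback(
    leaf("voxel_search", lambda bb: bb.voxel_hit),
    leaf("ray_search", lambda bb: bb.ray_hit),
    leaf("lvlm_cue_search", lambda bb: bool(bb.lvlm_cues)),
    leaf("frontier_exploration", lambda bb: bb.frontiers_left > 0),
)
```

For example, ticking `root` with `Blackboard(ray_hit=True)` returns `SUCCESS` and logs only `ray_search`: the voxel leaf fails its precondition, so the fallback moves on to the next child. Re-ticking the tree as the blackboard changes is what lets such a design switch behaviors online.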

Key results

  • Outperforms baselines by 85.25% in simulation, evaluated on 100 semantic tasks across 10 photorealistic outdoor environments
  • Enables multi-class search and on-the-fly task switching
  • Demonstrates successful real-world deployment on an aerial robot
  • Supports long-range reasoning without offline graph construction

Why it matters

Provides a scalable, resilient navigation framework for autonomous aerial search and inspection in complex, unstructured outdoor environments.

Affiliations

1 Authors are with Carnegie Mellon University, Pittsburgh, PA, USA. {seungch2, oalama, jkeller2, nkeetha, wenshanw, ybisk, basti}@andrew.cmu.edu
2 Author is with Davidson College, Davidson, NC, USA. dmkurdydyk@davidson.edu

Abstract

Aerial outdoor semantic navigation requires robots to explore large, unstructured environments to locate target objects. Recent advances in semantic navigation have demonstrated open-set object-goal navigation in indoor settings, but these methods remain limited by constrained spatial ranges and structured layouts, making them unsuitable for long-range outdoor search. While outdoor semantic navigation approaches exist, they either rely on reactive policies based on current observations, which tend to produce short-sighted behaviors, or precompute scene graphs offline for navigation, limiting adaptability to online deployment. We present RAVEN, a 3D memory-based, behavior tree framework for aerial semantic navigation in unstructured outdoor environments. It (1) uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors, (2) combines short-range voxel search and long-range ray search to scale to large environments, and (3) leverages a large vision-language model to suggest auxiliary cues, mitigating sparsity of outdoor targets. These components are coordinated by a behavior tree, which adaptively switches behaviors for robust operation. We evaluate RAVEN in 10 photorealistic outdoor simulation environments over 100 semantic tasks, encompassing single-object search, multi-class and multi-instance navigation, and sequential task changes. Results show that RAVEN outperforms baselines by 85.25% in simulation, and its deployment on an aerial robot in outdoor field tests demonstrates real-world applicability. Website: https://raven-semantic.github.io/
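The split between a semantic voxel memory for nearby observations and semantic rays for distant ones can be illustrated with a toy open-set query: a text feature is compared by cosine similarity against stored voxel features first, then against ray features. This is a minimal sketch under assumptions not taken from the paper: the `VoxelRayMap` class, the 0.8 threshold, the overwrite-instead-of-fuse update, and the 3-D toy vectors (standing in for vision-language embeddings) are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class VoxelRayMap:
    """Hypothetical toy memory: semantic voxels for near range, semantic rays for far range.

    Features are toy vectors standing in for vision-language embeddings.
    """
    voxel_size: float = 1.0
    voxels: Dict[Tuple[int, ...], Vec] = field(default_factory=dict)
    rays: List[Tuple[Vec, Vec, Vec]] = field(default_factory=list)  # (origin, unit dir, feature)

    def add_voxel(self, point: Sequence[float], feature: Sequence[float]) -> None:
        # Quantize a 3-D point to its voxel index; later features overwrite earlier
        # ones here, whereas a real map would typically fuse them (e.g. by averaging).
        key = tuple(math.floor(c / self.voxel_size) for c in point)
        self.voxels[key] = tuple(feature)

    def add_ray(self, origin: Sequence[float], direction: Sequence[float],
                feature: Sequence[float]) -> None:
        # Store a normalized viewing direction for something seen beyond reliable depth.
        norm = math.sqrt(sum(d * d for d in direction))
        self.rays.append((tuple(origin), tuple(d / norm for d in direction), tuple(feature)))

    def query(self, text_feature: Sequence[float],
              threshold: float = 0.8) -> Optional[Tuple[str, object]]:
        """Prefer a short-range voxel goal; fall back to a long-range ray direction."""
        if self.voxels:
            key, feat = max(self.voxels.items(), key=lambda kv: cosine(kv[1], text_feature))
            if cosine(feat, text_feature) >= threshold:
                center = tuple((k + 0.5) * self.voxel_size for k in key)
                return ("voxel", center)
        if self.rays:
            origin, direction, feat = max(self.rays, key=lambda r: cosine(r[2], text_feature))
            if cosine(feat, text_feature) >= threshold:
                return ("ray", (origin, direction))
        return None  # nothing matches: a planner would move on to cues or exploration
```

For instance, with `voxel_size=2.0`, a voxel feature `(1, 0, 0)` stored at point `(3.0, 1.0, 0.5)`, and a ray with origin `(0, 0, 10)`, direction `(0, 3, 4)` and feature `(0, 1, 0)`, querying `(0, 1, 0)` skips the dissimilar voxel and returns `("ray", ((0.0, 0.0, 10.0), (0.0, 0.6, 0.8)))`. Checking voxels before rays mirrors the short-range/long-range split named in the abstract; the ordering and threshold are assumptions of this sketch.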

Index terms

Aerial Systems: Perception and Autonomy; Semantic Scene Understanding; Vision-Based Navigation
