RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
Seungchan Kim, Omar Alama, Dmytro Kurdydyk, John Keller, Nikhil Varma Keetha, Wenshan Wang, Yonatan Bisk, Sebastian Scherer
AI summary
Problem
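Outdoor semantic navigation for aerial robots must cope with large spatial scales, sparse object distributions, and unstructured layouts. Existing methods either rely on reactive policies that produce short-sighted behavior or precompute scene graphs offline, which limits their adaptability during long-range online search.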
Approach
The framework maintains a task-agnostic 3D voxel-ray map as persistent memory. A behavior tree dynamically switches between short-range voxel search, long-range ray search, cue generation guided by a large vision-language model (LVLM), and frontier exploration.
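The voxel-ray memory can be pictured with a toy data structure. What follows is a minimal sketch under stated assumptions, not RAVEN's implementation. It assumes the following:
- Each observation arrives as a 3D point paired with a semantic feature vector. In practice such features would come from an open-set vision-language encoder; here they are hand-written toy vectors.
- Points within a hypothetical `max_voxel_range` are fused into voxels.
- Farther observations are kept only as rays (origin, unit direction, feature).
- Targets are scored by cosine similarity to a query feature.

The class `VoxelRayMap`, its methods, and its parameters are all hypothetical.

```python
import math
from dataclasses import dataclass, field

# Hypothetical sketch: not RAVEN's actual data structure or API.


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class VoxelRayMap:
    voxel_size: float = 1.0        # assumed voxel edge length (metres)
    max_voxel_range: float = 50.0  # assumed cutoff: farther observations become rays
    # voxel index (i, j, k) -> (summed feature, observation count)
    voxels: dict = field(default_factory=dict)
    # list of (origin, unit direction, feature) for far-away observations
    rays: list = field(default_factory=list)

    def insert(self, origin, point, feature):
        """Fuse one observed 3D point and its semantic feature into the map."""
        delta = [p - o for p, o in zip(point, origin)]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist <= self.max_voxel_range:
            # Near observation: accumulate the feature in the voxel containing the point.
            key = tuple(math.floor(p / self.voxel_size) for p in point)
            total, count = self.voxels.get(key, ((0.0,) * len(feature), 0))
            self.voxels[key] = (tuple(t + f for t, f in zip(total, feature)), count + 1)
        else:
            # Far observation: keep only the viewing direction and the feature.
            direction = tuple(d / dist for d in delta)
            self.rays.append((tuple(origin), direction, tuple(feature)))

    def best_voxel(self, query):
        """(voxel index, score) of the voxel whose mean feature best matches query, or None."""
        best = None
        for key, (total, count) in self.voxels.items():
            mean = tuple(t / count for t in total)
            score = cosine(mean, query)
            if best is None or score > best[1]:
                best = (key, score)
        return best

    def best_ray(self, query):
        """(origin, direction, score) of the best-matching ray, or None if there are no rays."""
        best = None
        for origin, direction, feature in self.rays:
            score = cosine(feature, query)
            if best is None or score > best[2]:
                best = (origin, direction, score)
        return best


m = VoxelRayMap()
m.insert((0.0, 0.0, 0.0), (2.5, 0.2, 1.0), (1.0, 0.0))      # near: fused into voxel (2, 0, 1)
m.insert((0.0, 0.0, 0.0), (300.0, 400.0, 0.0), (0.6, 0.8))  # 500 m away: stored as a ray
print(m.best_voxel((1.0, 0.0)))  # ((2, 0, 1), 1.0)
print(m.best_ray((0.6, 0.8)))
```

In this sketch, splitting the memory keeps near-field matches localized to a voxel, which suits a short-range search. Distant matches are too far to localize, but the map still retains a heading toward them, which is the kind of information a long-range ray search could follow.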
Key results
- Outperforms baselines by 85.25% across 100 simulation tasks in 10 photorealistic outdoor environments
- Enables multi-class search and on-the-fly task switching
- Demonstrates successful real-world deployment on an aerial robot
- Supports long-range reasoning without offline graph construction
Why it matters
Provides a scalable, resilient navigation framework for autonomous aerial search and inspection in complex, unstructured outdoor environments.
¹ Authors are with Carnegie Mellon University, Pittsburgh, PA, USA. {seungch2, oalama, jkeller2, nkeetha, wenshanw, ybisk, basti}@andrew.cmu.edu
² Author is with Davidson College, Davidson, NC, USA. dmkurdydyk@davidson.edu
Abstract
Aerial outdoor semantic navigation requires robots to explore large, unstructured environments to locate target objects. Recent advances in semantic navigation have demonstrated open-set object-goal navigation in indoor settings. However, these methods remain limited by constrained spatial ranges and structured layouts, making them unsuitable for long-range outdoor search. Outdoor semantic navigation approaches exist, but they fall into two groups. Some rely on reactive policies based on current observations, which tend to produce short-sighted behaviors. Others precompute scene graphs offline for navigation, which limits their adaptability to online deployment. We present RAVEN, a 3D memory-based, behavior tree framework for aerial semantic navigation in unstructured outdoor environments. RAVEN (1) uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors; (2) combines short-range voxel search and long-range ray search to scale to large environments; and (3) leverages a large vision-language model to suggest auxiliary cues, mitigating the sparsity of outdoor targets. These components are coordinated by a behavior tree, which adaptively switches behaviors for robust operation. We evaluate RAVEN in 10 photorealistic outdoor simulation environments over 100 semantic tasks, encompassing single-object search, multi-class and multi-instance navigation, and sequential task changes. Results show that RAVEN outperforms baselines by 85.25% in simulation. We also demonstrate its real-world applicability through deployment on an aerial robot in outdoor field tests. Website: https://raven-semantic.github.io/
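The abstract states that a behavior tree coordinates these components and adaptively switches behaviors, but it does not spell out the tree itself. One plausible arrangement is a priority-ordered fallback that tries behaviors in this order:
1. A confident short-range voxel match.
2. A long-range ray match.
3. LVLM-suggested cues.
4. Frontier exploration.

The switching logic for that arrangement could look like the following minimal sketch. Sequence and fallback nodes are standard behavior-tree constructs. The ordering, the blackboard keys (`voxel_score`, `ray_score`, `lvlm_cues`), `MATCH_THRESHOLD`, and `select_behavior` are hypothetical and are not taken from RAVEN.

```python
from enum import Enum
from typing import Callable

# Hypothetical sketch of behavior switching with a behavior tree; the node types
# are standard behavior-tree constructs, but the ordering, conditions, and
# threshold below are illustrative assumptions, not RAVEN's actual tree.


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Condition:
    """Leaf that succeeds when its predicate holds on the blackboard."""

    def __init__(self, pred: Callable[[dict], bool]):
        self.pred = pred

    def tick(self, bb: dict) -> Status:
        return Status.SUCCESS if self.pred(bb) else Status.FAILURE


class Action:
    """Leaf that records itself as the active behavior and keeps running."""

    def __init__(self, name: str):
        self.name = name

    def tick(self, bb: dict) -> Status:
        bb["active"] = self.name
        return Status.RUNNING


class Sequence:
    """Ticks children in order; stops at the first child that does not succeed."""

    def __init__(self, *children):
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            status = child.tick(bb)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; returns the first result that is not FAILURE."""

    def __init__(self, *children):
        self.children = children

    def tick(self, bb: dict) -> Status:
        for child in self.children:
            status = child.tick(bb)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


MATCH_THRESHOLD = 0.25  # hypothetical similarity threshold

tree = Fallback(
    Sequence(Condition(lambda bb: bb["voxel_score"] >= MATCH_THRESHOLD), Action("voxel_search")),
    Sequence(Condition(lambda bb: bb["ray_score"] >= MATCH_THRESHOLD), Action("ray_search")),
    Sequence(Condition(lambda bb: len(bb["lvlm_cues"]) > 0), Action("cue_search")),
    Action("frontier_exploration"),
)


def select_behavior(voxel_score: float, ray_score: float, lvlm_cues: list) -> str:
    """Tick the tree once on a fresh blackboard and return the chosen behavior."""
    bb = {"voxel_score": voxel_score, "ray_score": ray_score, "lvlm_cues": lvlm_cues}
    tree.tick(bb)
    return bb["active"]


print(select_behavior(0.40, 0.10, []))        # voxel_search
print(select_behavior(0.10, 0.30, []))        # ray_search
print(select_behavior(0.10, 0.10, ["road"]))  # cue_search
print(select_behavior(0.10, 0.10, []))        # frontier_exploration
```

The tree is re-ticked from the root on every update. A change in its inputs, such as a new target after a task switch lowering `voxel_score`, therefore moves control to a different branch on the next tick without any explicit state-machine transitions.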