ICRA 2026

OVerSeeC: Open-Vocabulary Costmap Generation from Satellite Images and Natural Language

Rwik Rana, Jesse Quattrociocchi, Dongmyeong Lee, Christian Ellis, Amanda Adkins, Adam Uccello, Garrett Warnell, Joydeep Biswas


AI summary


A modular, zero-shot framework successfully generates preference-aligned costmaps from satellite imagery and natural language without retraining, handling novel terrain and complex traversal rules.

open-vocabulary segmentation, costmap generation, natural language navigation, zero-shot planning, autonomous vehicles

Problem

Traditional costmap generation relies on fixed ontologies and static cost mappings, making it impossible to adapt to mission-specific preferences, unseen terrain classes, or compositional rules expressed in natural language.

Approach

OVERSEEC decomposes the task into three zero-shot stages: an LLM extracts entities and ranked preferences from a prompt, an open-vocabulary segmentation pipeline locates these entities in high-resolution satellite imagery, and an LLM then synthesizes executable code that composes the resulting masks into the final costmap.
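The three stages can be pictured as a plain function pipeline. The sketch below is a minimal illustration under stated assumptions, not the OVERSEEC implementation: `interpret`, `locate`, and `synthesize` are hypothetical callables standing in for the LLM parser, the open-vocabulary segmenter, and the LLM code generator, the toy stubs replace those models so the example runs, and the "cost equals rank" rule is an assumption chosen for readability.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

import numpy as np


@dataclass
class Preference:
    """One entity extracted from the prompt, with its rank (1 = most preferred)."""
    entity: str
    rank: int


# Hypothetical stage signatures; the real system uses LLMs and a segmentation model.
Interpret = Callable[[str], List[Preference]]
Locate = Callable[[np.ndarray, str], np.ndarray]  # returns a boolean mask per entity
Synthesize = Callable[[List[Preference], Dict[str, np.ndarray]], np.ndarray]


def run_pipeline(prompt: str, image: np.ndarray, interpret: Interpret,
                 locate: Locate, synthesize: Synthesize) -> np.ndarray:
    prefs = interpret(prompt)                                    # (i) Interpret
    masks = {p.entity: locate(image, p.entity) for p in prefs}  # (ii) Locate
    return synthesize(prefs, masks)                              # (iii) Synthesize


# --- Toy stubs so the sketch runs end to end (assumptions, not the paper's models) ---
def toy_interpret(prompt: str) -> List[Preference]:
    # Pretend the LLM parsed "prefer roads over grass" into a ranking.
    return [Preference("road", 1), Preference("grass", 2)]


def toy_locate(image: np.ndarray, entity: str) -> np.ndarray:
    # Pretend segmentation: channel 0 marks roads, channel 1 marks grass.
    channel = {"road": 0, "grass": 1}[entity]
    return image[..., channel] > 0


def toy_synthesize(prefs: List[Preference], masks: Dict[str, np.ndarray]) -> np.ndarray:
    # Assumed rule: cost equals rank; unlabeled pixels get the worst cost.
    shape = next(iter(masks.values())).shape
    cost = np.full(shape, float(len(prefs) + 1))
    # Paint lower-priority entities first so higher-priority ones win on overlaps.
    for p in sorted(prefs, key=lambda p: p.rank, reverse=True):
        cost[masks[p.entity]] = float(p.rank)
    return cost


image = np.zeros((2, 3, 2))
image[0, :, 0] = 1.0   # row 0 is road
image[1, :2, 1] = 1.0  # first two cells of row 1 are grass
costmap = run_pipeline("Prefer roads over grass.", image,
                       toy_interpret, toy_locate, toy_synthesize)
```

Keeping each stage behind its own callable mirrors the modular decomposition: any one model can be swapped without touching the other two.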

Key results

Zero-shot open-vocabulary segmentation pipeline preserving native satellite image resolution
LLM-synthesized executable costmap functions that accurately capture compositional and spatial preferences
Interactive GUI enabling rapid, annotation-free costmap iteration via natural language
Ranked Regret Path Integral (RRPI) metric for quantifying path-preference alignment
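To make "executable costmap functions" concrete, here is the kind of function a synthesis step might emit for a prompt such as "Stay on roads where possible, never enter water or buildings, and avoid grass within 2 cells of a building." This is an illustrative sketch, not actual OVERSEEC output: the function names, mask keys, cost values, and the square-neighborhood dilation used to express "within 2 cells" are all assumptions.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Boolean dilation with a square (Chebyshev) neighborhood, numpy only."""
    h, w = mask.shape
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros(mask.shape, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            out |= padded[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
    return out


def costmap_fn(masks: dict) -> np.ndarray:
    """Hypothetical synthesized rule for: 'Stay on roads where possible, never enter
    water or buildings, and avoid grass within 2 cells of a building.'"""
    cost = np.full(masks["road"].shape, 5.0)       # assumed default for unlabeled terrain
    cost[masks["grass"]] = 3.0                     # grass is acceptable in general
    near_building = dilate(masks["building"], 2)
    cost[masks["grass"] & near_building] = 20.0    # compositional + spatial rule
    cost[masks["road"]] = 1.0                      # roads are cheapest
    cost[masks["building"]] = np.inf               # hard constraint: no buildings
    cost[masks["water"]] = np.inf                  # hard constraint: no water (applied last)
    return cost


masks = {k: np.zeros((5, 7), dtype=bool) for k in ("road", "grass", "building", "water")}
masks["building"][0, 0] = True
masks["grass"][2, :] = True
masks["road"][4, :] = True
masks["water"][4, 6] = True   # water overlapping the road still wins
cost = costmap_fn(masks)
```

Ordering the assignments from soft preferences to hard constraints is one simple way to make higher-priority rules override lower-priority ones.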

Why it matters

Enables scalable, mission-adaptive global planning for autonomous vehicles by allowing operators to dynamically specify traversal rules without model retraining or manual annotation.

Abstract

Aerial imagery provides essential global context for autonomous navigation, enabling route planning at scales inaccessible to onboard sensing. We address the problem of generating global costmaps for long-range planning directly from satellite imagery when entities and mission-specific traversal rules are expressed in natural language at test time. This setting is challenging since mission requirements vary, terrain entities may be unknown at deployment, and user prompts often encode compositional traversal logic. Existing approaches relying on fixed ontologies and static cost mappings cannot accommodate such flexibility. While foundation models excel at language interpretation and open-vocabulary perception, no single model can simultaneously parse nuanced mission directives, locate arbitrary entities in large-scale imagery, and synthesize them into an executable cost function for planners. We therefore propose OVERSEEC, a zero-shot modular framework that decomposes the problem into Interpret–Locate–Synthesize: (i) an LLM extracts entities and ranked preferences, (ii) an open-vocabulary segmentation pipeline identifies these entities from high-resolution imagery, and (iii) the LLM uses the user’s natural language preferences and masks to synthesize executable costmap code. Empirically, OVERSEEC handles novel entities, respects ranked and compositional preferences, and produces routes consistent with human-drawn trajectories across diverse regions, demonstrating robustness to distribution shifts. This shows that modular composition of foundation models enables open-vocabulary, preference-aligned costmap generation for scalable, mission-adaptive global planning. Website: https://amrl.cs.utexas.edu/overseec/
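The generated costmap is ultimately consumed by a global planner to produce routes. The planner is not specified here, so the following sketch assumes a plain Dijkstra search on an 8-connected grid, where entering a cell costs its costmap value times the step length and infinite-cost cells are treated as obstacles. The `plan` function is hypothetical and only shows how a costmap turns into a path.

```python
import heapq
import math

import numpy as np


def plan(cost: np.ndarray, start: tuple, goal: tuple):
    """Assumed planner: Dijkstra on an 8-connected grid.
    Entering a cell costs cost[cell] * step length; infinite cells are obstacles.
    Returns a list of (row, col) from start to goal, or None if unreachable."""
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    parent = {}
    dist[start] = 0.0
    frontier = [(0.0, start)]
    moves = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    while frontier:
        d, (y, x) = heapq.heappop(frontier)
        if (y, x) == goal:
            break
        if d > dist[y, x]:
            continue  # stale queue entry
        for dy, dx in moves:
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not np.isfinite(cost[ny, nx]):
                continue
            nd = d + cost[ny, nx] * math.hypot(dy, dx)
            if nd < dist[ny, nx]:
                dist[ny, nx] = nd
                parent[(ny, nx)] = (y, x)
                heapq.heappush(frontier, (nd, (ny, nx)))
    if not np.isfinite(dist[goal]):
        return None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]


grid = np.ones((3, 3))
grid[1, 0] = grid[1, 1] = np.inf   # a wall with a gap at (1, 2)
route = plan(grid, (0, 0), (2, 0))
```

Because hard constraints are encoded as infinite cost, the planner never routes through them, while finite costs let ranked preferences trade off against path length.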

Index terms

Motion and Path Planning; Semantic Scene Understanding; Field Robots