FALCO: Foundation Model Guided Active Learning for Cost-Effective Off-Road Freespace Detection
Shuai Wang, Chenxin Li, Yintong Chen, Yaobo Jia, Hongze Li, Chen Min, Jilin Mei, Huijing Zhao
AI summary
Problem
Annotating unstructured off-road environments is prohibitively expensive, and traditional active learning strategies fail to capture rare, safety-critical cases due to high environmental complexity and semantic ambiguity.
Approach
The method scores sample criticality by combining vision foundation model prediction deviation, model uncertainty, and vision-language model semantic evaluation, then applies a semantic grid-based sampling strategy to balance scene coverage with challenging case prioritization.
Key results
- Integrates vision foundation model deviation, uncertainty, and vision-language semantic scoring for reliable sample criticality assessment
- Introduces semantic vector generation and grid-based sampling to ensure broad scene coverage
- Achieves significant gains in low-percentile IoU on rare and difficult off-road scenarios compared to state-of-the-art baselines
- Maintains competitive overall performance under limited annotation budgets
Why it matters
Provides a scalable, cost-effective solution for training robust autonomous navigation models in complex, unstructured off-road environments.
Abstract
Freespace detection in unstructured off-road environments is critical for safe autonomous navigation but remains highly challenging due to ambiguous boundaries, diverse terrains, and long-tail safety-critical cases. Constructing large annotated datasets in such environments is prohibitively costly, which makes active learning essential to maximize model robustness under limited annotation budgets. However, conventional uncertainty- or diversity-based strategies are unreliable in these complex settings, often failing to capture rare yet important scenarios. To address this, we propose FALCO, a foundation model guided active learning framework for cost-effective off-road freespace detection. FALCO integrates three complementary criteria: prediction deviation from a vision foundation model, model uncertainty, and semantic evaluation from a vision-language model to form a reliable sample criticality score. In addition, we introduce a semantic grid-based sampling strategy that balances coverage across scene conditions while prioritizing challenging cases. Extensive experiments show that FALCO substantially improves robustness on rare and difficult scenarios, achieving significant gains in low-percentile IoU compared to state-of-the-art baselines, while maintaining competitive overall performance.
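The two-stage selection described above (fuse three criteria into a criticality score, then sample across a semantic grid) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule or the grid construction, so the weighted sum, the equal default weights, the round-robin traversal, and every name below (`criticality`, `grid_sample`, the `cell` tuples) are assumptions made purely for illustration. Each input score is assumed to be pre-normalized to [0, 1].

```python
from collections import defaultdict


def criticality(deviation, uncertainty, vlm_score, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical fusion rule: weighted sum of three scores in [0, 1].

    deviation   -- disagreement between the task model and a vision foundation model
    uncertainty -- the task model's own predictive uncertainty
    vlm_score   -- semantic difficulty rating from a vision-language model
    """
    w_d, w_u, w_v = weights
    return w_d * deviation + w_u * uncertainty + w_v * vlm_score


def grid_sample(samples, budget):
    """Assumed grid strategy: round-robin over cells, most critical first.

    samples -- list of dicts with keys 'id', 'cell' (hashable, sortable), 'score'
    budget  -- maximum number of samples to send for annotation
    """
    cells = defaultdict(list)
    for s in samples:
        cells[s["cell"]].append(s)
    for bucket in cells.values():
        bucket.sort(key=lambda s: s["score"], reverse=True)

    selected = []
    rank = 0  # rank-th most critical sample within each cell
    while len(selected) < budget:
        added = False
        for cell in sorted(cells):  # sorted for a deterministic visiting order
            bucket = cells[cell]
            if rank < len(bucket) and len(selected) < budget:
                selected.append(bucket[rank]["id"])
                added = True
        if not added:  # every cell is exhausted
            break
        rank += 1
    return selected


# Toy unlabeled pool; the cell labels are made-up scene conditions.
pool = [
    {"id": "a", "cell": ("forest", "day"), "score": criticality(0.9, 0.8, 0.7)},
    {"id": "b", "cell": ("forest", "day"), "score": criticality(0.2, 0.1, 0.3)},
    {"id": "c", "cell": ("desert", "night"), "score": criticality(0.5, 0.5, 0.5)},
    {"id": "d", "cell": ("forest", "day"), "score": criticality(0.6, 0.6, 0.6)},
]
picked = grid_sample(pool, budget=3)
print(picked)  # ['c', 'a', 'd']
```

In this toy pool, the lone desert/night sample is selected even though two forest/day samples score higher, which is the coverage behavior the grid is meant to provide; within a cell, the most critical samples are still taken first.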