← Back ICRA 2026

Non-Submodular Visual Attention for Robot Navigation

Reza Vafaee, Kian Behzad, Milad Siami, Luca Carlone, Ali Jadbabaie

PDF

AI summary

Key figure (auto-extracted from paper)

A real-time visual attention framework that selects informative features via non-submodular MSE optimization, enabling accurate robot navigation under strict computational constraints.

Visual feature selection non-submodular optimization visual-inertial navigation greedy algorithms real-time robotics mean squared error

Problem

Robot navigation systems struggle with the computational cost of selecting visual features for pose estimation in dynamic environments. Existing methods either assume submodularity (which doesn't apply to MSE-based objectives) or rely on expensive convex relaxations that hinder real-time deployment.

Approach

The authors frame feature selection as maximizing a non-submodular objective function directly tied to mean squared estimation error. They introduce four polynomial-time approximation algorithms—greedy, low-rank greedy, randomized greedy, and linearization-based—backed by rigorous performance bounds to efficiently prioritize informative visual features.

Key results

Formulated visual feature selection as maximizing a monotone non-submodular MSE objective
Derived constant-factor approximation bounds for greedy selection using submodularity ratio and curvature
Developed low-rank and randomized greedy variants that drastically reduce computational complexity
Validated on EuRoC benchmarks and a QCar platform, achieving real-time deployment with strong approximation guarantees

Why it matters

Provides resource-constrained robots with a theoretically grounded, real-time feature selection method that maintains high pose estimation accuracy without prohibitive computational costs.

Abstract

This paper presents a task-oriented computational framework to enhance Visual-Inertial Navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a Mean Squared Error (MSE)-based, non-submodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial- time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector based on a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curva- ture, and element-wise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.

Index terms

Visual-Based Navigation Autonomous Vehicle Navigation Localization Optimization and Optimal Control