T2-Nav: Algebraic-Topology-Aware Temporal Graph Memory and Loop Detection for Zero-Shot Visual Navigation
Nguyen Duc Quang Anh, Pham Minh Duc, Minh Anh Nguyen, Duy Tung Doan, Tuan Dang
AI summary
Problem
Current training-free navigation methods lack temporal coherence and fail to detect complex navigation loops beyond simple geometric proximity, causing redundant exploration and inconsistent goal recognition in unseen environments.
Approach
T2-Nav combines a Temporal Graph Memory Network, which tracks object instances across time, with a topological loop detection module that uses persistent homology to identify and avoid cyclic exploration patterns. Neither component relies on learned parameters.
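To make the cross-temporal tracking idea concrete, here is a minimal sketch of a graph memory that merges repeated sightings of the same object instance and keeps visually similar but spatially distinct instances apart. It is an illustration under stated assumptions, not the authors' implementation: the class names, the cosine-similarity matching rule, and both thresholds are hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class InstanceNode:
    node_id: int
    feature: list      # hypothetical appearance embedding
    position: tuple    # hypothetical (x, y) map position
    last_seen: int     # time step of the latest sighting
    sightings: int = 1


@dataclass
class TemporalGraphMemory:
    sim_threshold: float = 0.9   # assumed value, not from the paper
    max_dist: float = 1.0        # assumed value, in map units
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)   # (earlier_id, later_id) links
    prev_id: int = None                       # node matched at the previous step

    def observe(self, feature, position, t):
        """Merge an observation into a matching node, or create a new node."""
        best, best_sim = None, self.sim_threshold
        for node in self.nodes:
            close = math.dist(node.position, position) <= self.max_dist
            sim = cosine(node.feature, feature)
            # Require both appearance and spatial agreement, so look-alike
            # objects in different places stay separate instances.
            if close and sim >= best_sim:
                best, best_sim = node, sim
        if best is None:
            best = InstanceNode(len(self.nodes), list(feature), tuple(position), t)
            self.nodes.append(best)
        else:
            best.last_seen = t
            best.sightings += 1
        # Temporal edge from the previously observed instance to this one.
        if self.prev_id is not None and self.prev_id != best.node_id:
            self.edges.add((self.prev_id, best.node_id))
        self.prev_id = best.node_id
        return best.node_id


mem = TemporalGraphMemory()
a = mem.observe([1.0, 0.0], (0.0, 0.0), t=0)    # first chair
b = mem.observe([0.0, 1.0], (3.0, 0.0), t=1)    # a different object
c = mem.observe([0.99, 0.05], (0.2, 0.1), t=2)  # the first chair again
d = mem.observe([1.0, 0.0], (5.0, 5.0), t=3)    # look-alike chair elsewhere
```

In this toy run the third observation is merged into the first node, while the fourth creates a new node despite its identical appearance, because it lies outside the assumed distance gate.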
Key results
- Temporal Graph Memory Network for cross-temporal instance tracking
- Persistent homology-based loop closure detection
- Training-free zero-shot navigation to specific visual instances
- Reduced redundant exploration and improved path planning in unseen environments
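The difference between a topological loop signal and a plain proximity check can be illustrated with a deliberately simplified stand-in. The sketch below does not compute persistent homology. It computes only the cycle rank (first Betti number, E - V + C) of a trajectory graph at a single distance scale. The function name, the `eps` radius, and the `min_gap` parameter are assumptions made for illustration.

```python
import math


def cycle_rank(positions, eps, min_gap=5):
    """Cycle rank (E - V + C) of a trajectory graph.

    Vertices are poses. Edges join consecutive poses, plus any two poses at
    least `min_gap` steps apart that lie within distance `eps` of each other.
    A value above zero means the path has closed on itself.
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Temporal edges between consecutive poses.
    edges = [(i, i + 1) for i in range(n - 1)]
    # Proximity edges between poses that are far apart in time.
    for i in range(n):
        for j in range(i + min_gap, n):
            if math.dist(positions[i], positions[j]) <= eps:
                edges.append((i, j))

    components = n
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return len(edges) - n + components


straight = [(float(i), 0.0) for i in range(10)]
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]

straight_loops = cycle_rank(straight, eps=0.5)
square_loops = cycle_rank(square, eps=0.5)
```

The straight corridor yields a cycle rank of zero and the closed square yields one. A full persistent-homology treatment would additionally track how long such cycles survive as the scale changes, which this single-scale sketch omits.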
Why it matters
It offers a training-free navigation framework that requires no retraining for new environments, helping to bridge the gap between laboratory benchmarks and real-world deployment for service robots and automation systems.
Abstract
Deploying autonomous agents in real-world environments is challenging, particularly for navigation, where systems must adapt to situations they have not encountered before. Traditional learning approaches require substantial amounts of data, constant tuning, and sometimes starting over for each new task, which makes them hard to scale and inflexible. Recent breakthroughs in foundation models, such as large language models and vision-language models, enable systems to attempt new navigation tasks without additional training. However, many of these methods work only with specific input types, employ relatively basic reasoning, and fail to fully exploit the details they observe or the structure of the spaces they explore. Here, we introduce T2-Nav, a zero-shot navigation system that integrates heterogeneous data and employs graph-based reasoning. By directly incorporating visual information into the graph and matching it to the environment, our approach balances exploration against goal attainment. This strategy allows robust obstacle avoidance, reliable loop closure detection, and efficient path planning while eliminating redundant exploration patterns. The system handles goals specified as reference images of target object instances, making it particularly suitable for scenarios in which agents must navigate to visually similar yet spatially distinct instances. Experiments demonstrate that our approach is efficient and adapts well to unknown environments, moving toward practical zero-shot instance-image navigation capabilities. Our source code is available at https://github.com/cogniboticslab/t2nav.