ICRA 2024

Pre-Trained Masked Image Model for Mobile Robot Navigation

Vishnu D. Sharma, Anukriti Singh, Pratap Tokekar

Abstract

2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially in the dearth of training data. We share more qualitative results at https://raaslab.org/projects/MIM4Robots.

Index terms

Representation Learning; Deep Learning for Visual Perception; Deep Learning Methods