ICRA 2026

Synthetic vs. Real Training Data for Visual Navigation

Lauri Aleksanteri Suomela, Sasanka Kuruppu Arachchige, German F. Torres, Harry Edelman, Joni-Kristian Kamarainen

AI summary

Policies trained entirely in simulation can outperform those trained on real-world data for visual navigation when leveraging diverse pretraining and on-policy learning.

visual navigation, sim-to-real transfer, synthetic training data, on-policy learning, transformer policies, robot learning

Problem

Because of the well-known sim-to-real performance gap, it remains unclear whether navigation policies trained entirely in simulation can match or exceed those trained on real-world data.

Approach

The authors introduce FAINT, a lightweight transformer-based navigation policy that uses frozen pretrained visual features and a binocular encoder to bridge appearance differences between simulation and reality, enabling direct comparison of simulation-only versus real-world training.
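
To make the split between a frozen feature extractor and a small trainable policy concrete, here is a minimal Python sketch. It is not FAINT: the encoder is a hypothetical fixed random projection standing in for a pretrained vision model, and the token layout (a few context frames plus one goal image), the single attention layer, and the two-dimensional action (assumed here to be a linear and angular velocity command) are illustrative assumptions. The binocular encoder is not modeled, and only the forward pass is shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained image encoder: a fixed random
    projection whose weights are set once and never updated (frozen)."""

    def __init__(self, in_dim: int, feat_dim: int, seed: int = 0):
        self.weights = random_matrix(random.Random(seed), feat_dim, in_dim)

    def __call__(self, image: Vector) -> Vector:
        return [math.tanh(z) for z in matvec(self.weights, image)]


class TinyNavPolicy:
    """Hypothetical lightweight goal-conditioned policy: one single-head
    self-attention layer with a residual connection over the feature tokens,
    mean pooling, and a linear head that outputs the action."""

    def __init__(self, feat_dim: int, action_dim: int = 2, seed: int = 1):
        rng = random.Random(seed)
        self.wq = random_matrix(rng, feat_dim, feat_dim)
        self.wk = random_matrix(rng, feat_dim, feat_dim)
        self.wv = random_matrix(rng, feat_dim, feat_dim)
        self.head = random_matrix(rng, action_dim, feat_dim)
        self.scale = 1.0 / math.sqrt(feat_dim)

    def __call__(self, tokens: List[Vector]) -> Vector:
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        mixed = []
        for qi, token in zip(q, tokens):
            # Scaled dot-product attention of this token over all tokens.
            attn = softmax([self.scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
            attended = [sum(w * vj[d] for w, vj in zip(attn, v)) for d in range(len(token))]
            mixed.append([a + t for a, t in zip(attended, token)])  # residual
        pooled = [sum(col) / len(mixed) for col in zip(*mixed)]
        return matvec(self.head, pooled)


rng = random.Random(42)
encoder = FrozenEncoder(in_dim=16, feat_dim=8)
policy = TinyNavPolicy(feat_dim=8)
context = [[rng.random() for _ in range(16)] for _ in range(3)]  # three recent frames (flattened)
goal = [rng.random() for _ in range(16)]                          # goal image (flattened)
tokens = [encoder(img) for img in context] + [encoder(goal)]
action = policy(tokens)  # assumed (linear, angular) velocity command
print(len(tokens), len(action))
```

In a setup like this only the `TinyNavPolicy` weights would be trained. The premise summarized above is that frozen features from diverse pretraining look similar for simulated and real images, so a small policy built on them can transfer across the appearance gap.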

Key results

Simulation-trained FAINT outperforms its real-world-trained counterpart by 31 points in success rate
Surpasses prior state-of-the-art methods by 50 points in navigation success rate
Generalizes to unseen environments and to a different robot platform (a drone)
Identifies on-policy learning as a critical advantage of simulation over real data
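
The on-policy advantage can be illustrated with a toy example. The sketch below assumes a DAgger-style dataset aggregation scheme, a standard on-policy imitation method but not necessarily the one used in the paper, and a hypothetical 1-D corridor with noisy actuation; all constants are made up. A simulator can query a privileged expert at any state the learner itself reaches, whereas a fixed dataset of clean demonstrations only covers the states the demonstrator visited.

```python
import random
from typing import Callable, Dict, List

N_CELLS = 11   # hypothetical 1-D corridor with cells 0..10
GOAL = 5       # hypothetical goal cell
SLIP = 0.3     # hypothetical chance that a move overshoots by one extra cell
HORIZON = 30   # hypothetical steps per episode

Policy = Callable[[int], int]


def expert(pos: int) -> int:
    """Privileged expert, cheap to query in simulation: step toward the goal."""
    return (GOAL > pos) - (GOAL < pos)


def step(pos: int, action: int, rng: random.Random) -> int:
    new = pos + action
    if action != 0 and rng.random() < SLIP:
        new += action  # actuation noise: overshoot by one cell
    return min(max(new, 0), N_CELLS - 1)


def make_policy(data: Dict[int, int]) -> Policy:
    # Tabular learner: reuse labels for states seen in training; otherwise
    # fall back to a hypothetical default of driving forward (+1).
    return lambda pos: data.get(pos, 1)


def rollout(policy: Policy, rng: random.Random) -> List[int]:
    pos, visited = 0, [0]
    for _ in range(HORIZON):
        pos = step(pos, policy(pos), rng)
        visited.append(pos)
    return visited


def success_rate(policy: Policy, episodes: int = 200, seed: int = 0) -> float:
    # One RNG per episode, so different policies see the same noise stream.
    wins = sum(rollout(policy, random.Random(seed + ep))[-1] == GOAL for ep in range(episodes))
    return wins / episodes


def dagger(initial: Dict[int, int], iterations: int = 10, rollouts: int = 5,
           seed: int = 100) -> Dict[int, int]:
    data = dict(initial)
    rng = random.Random(seed)
    for _ in range(iterations):
        policy = make_policy(dict(data))  # freeze this iteration's learner
        for _ in range(rollouts):
            # On-policy: visit states under the learner's own policy,
            # then label every visited state with the expert action.
            for pos in rollout(policy, rng):
                data[pos] = expert(pos)
    return data


# Offline dataset: clean expert demonstrations from cell 0, recorded without
# actuation noise, so only cells 0..GOAL are covered.
bc_data = {pos: expert(pos) for pos in range(GOAL + 1)}
dagger_data = dagger(bc_data)
bc_rate = success_rate(make_policy(bc_data))
dagger_rate = success_rate(make_policy(dagger_data))
print(f"behavior cloning: {bc_rate:.2f}, DAgger-style on-policy: {dagger_rate:.2f}")
```

Both policies give identical labels on cells 0 to 5 and share per-episode random seeds, so here the aggregated policy can only match or beat plain behavior cloning. Behavior cloning fails whenever a slip carries the robot past the goal into cells it never saw in training.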

Why it matters

It demonstrates that scalable simulation training, combined with robust visual representations and on-policy learning, can surpass real-world data collection for robot navigation, guiding future robot learning strategies.

Abstract

This paper investigates how the performance of visual navigation policies trained in simulation compares to policies trained with real-world data. Performance degradation of simulator-trained policies is often significant when they are evaluated in the real world. However, despite this well-known sim-to-real gap, we demonstrate that simulator-trained policies can match the performance of their real-world-trained counterparts. Central to our approach is a navigation policy architecture that bridges the sim-to-real appearance gap by leveraging pretrained visual representations and runs in real time on robot hardware. Evaluations on a wheeled mobile robot show that the proposed policy, when trained in simulation, outperforms its real-world-trained version by 31 points and the prior state-of-the-art methods by 50 points in navigation success rate. Policy generalization is verified by deploying the same model onboard a drone. Our results highlight the importance of diverse image encoder pretraining for sim-to-real generalization, and identify on-policy learning as a key advantage of simulated training over training with real data. Code, model checkpoints and multimedia materials are available at lasuomela.github.io/faint.

Index terms

Vision-Based Navigation; Imitation Learning; Data Sets for Robot Learning