Research Analyzer
← Back IROS 2024

Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction

Changan Chen Chen, Jordi Ramos Chen, Anshul Tomar, Kristen Grauman

PDF

Abstract

Sim2real transfer has received increasing attention lately due to its success in transferring robotic policies learned in simulation to the real world. While significant progress has been made in transferring vision-based navigation policies, the current sim2real strategy for audio-visual navigation remains limited to basic data augmentation. Sound differs from light in that it spans across much wider frequencies and thus requires a different solution for sim2real. To understand how the acoustic sim2real gap varies with frequencies, we first define a novel acoustic field prediction (AFP) task that predicts the local sound pressure field. We then train frequency-specific AFP models in simulation and measure the prediction errors on collected real data. We propose a frequency-adaptive strategy that intelligently selects the best frequency band for prediction based on both the measured prior and the energy distribution of the received audio, which improves the generalization on real data. Coupled with waypoint navigation, we show the navigation policy not only improves navigation performance in simulation but also transfers successfully to real robots. This work demonstrates the potential of building autonomous agents that can see, hear, and act entirely from simulation, and transferring them to the real world.

Index terms

Transfer Learning Robot Audition Audio-Visual SLAM