ICRA 2026

Contrastive Auditory Knowledge Transfer for Tool-Mediated Robot Interaction with Granular Objects

Si Liu, Jindan Huang, Zhengyan Huan, Michael Hughes, Jivko Sinapov

AI summary

Contrastive learning on tool-mediated audio enables robots to recognize objects across new tools and behaviors, even achieving zero-shot recognition of novel objects without costly data collection.
Auditory perception, Knowledge transfer, Contrastive learning, Tool-mediated interaction, Zero-shot recognition, Granular objects

Problem

Transferring audio-based object recognition knowledge across different robotic tools and interaction behaviors traditionally requires costly, extensive data collection for each new context.

Approach

The authors project tool-mediated audio into a shared latent space using two contrastive strategies: a supervised method leveraging shared objects and a zero-shot method aligning audio with natural language context descriptions.
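To make the supervised strategy concrete, here is a minimal NumPy sketch of a generic supervised contrastive (SupCon-style) loss. It is an illustration under stated assumptions, not the authors' implementation: the function name, the temperature value, and the toy 2-D embeddings are all hypothetical, and the paper's actual encoder and loss details are not given in this summary. The idea it shows is that embeddings sharing an object label are treated as positives regardless of which tool or behavior produced them.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (hypothetical helper, not the paper's code).

    Samples with the same object label are positives for each other,
    whichever tool or behavior produced the audio.
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows

    sim = (z @ z.T) / temperature
    self_mask = np.eye(len(y), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # never compare a sample with itself

    # Row-wise log-softmax over all other samples (numerically stabilized).
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors that have at least one positive
    if not valid.any():
        return 0.0

    # Mean log-probability of the positives, averaged over valid anchors.
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-sum_pos[valid] / pos_count[valid]).mean())

# Toy embeddings (made up): rows 0-1 point one way, rows 2-3 another.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_aligned = supervised_contrastive_loss(toy, [0, 0, 1, 1])   # labels match geometry
loss_shuffled = supervised_contrastive_loss(toy, [0, 1, 0, 1])  # labels contradict geometry
```

In this toy setup the loss is lower when same-label embeddings already sit close together, which is the pressure that would drive a shared latent space to cluster by object identity.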

Key results

  • Latent embeddings cluster by object identity independent of tool or behavior variations
  • Transfer models match or exceed supervised baselines despite limited target-context data
  • Zero-shot method successfully recognizes entirely novel objects via audio-text alignment
  • Effective cross-tool and cross-behavior knowledge transfer demonstrated on real-world granular object data
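The zero-shot result above relies on audio and text living in one embedding space. The sketch below shows the generic matching step such a setup typically allows: pick the candidate whose text embedding has the highest cosine similarity to the audio embedding. How the authors actually construct and use the text side is not specified in this summary, so everything here is an assumption for illustration: the function names, the placeholder labels `object_a` to `object_c`, and the hand-written 3-D vectors standing in for encoder outputs.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def zero_shot_predict(audio_embeddings, text_embeddings, candidate_names):
    """Assign each audio embedding to the most similar text embedding.

    Hypothetical helper: assumes one text embedding per candidate label,
    already produced by some text encoder aligned with the audio encoder.
    """
    sims = cosine_similarity_matrix(audio_embeddings, text_embeddings)
    return [candidate_names[i] for i in sims.argmax(axis=1)]

# Made-up embeddings standing in for real encoder outputs.
candidate_names = ["object_a", "object_b", "object_c"]
text_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
audio_embeddings = np.array([
    [0.9, 0.1, 0.0],   # closest to the first text embedding
    [0.1, 0.2, 0.95],  # closest to the third text embedding
])

predictions = zero_shot_predict(audio_embeddings, text_embeddings, candidate_names)
```

Because the prediction only needs a text embedding for each candidate, a label never seen during audio training can be added to `candidate_names` without collecting new audio for it, which is what makes this style of matching zero-shot.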

Why it matters

Offers a scalable, data-efficient perception framework for robots operating in dynamic, real-world environments where collecting context-specific data is impractical.

Abstract

Tool-mediated interactions enable robots to manipulate and explore granular objects, producing informative auditory signals. A central challenge is transferring this perceptual knowledge across different tools and behaviors without costly data collection for each new context. We address this problem in the domain of audio-based recognition of granular and liquid-like objects. In this work, we leverage audio signals from tool-mediated interactions and learn context-agnostic representations for object recognition. We propose two contrastive learning approaches: a shared-object transfer method that performs supervised contrastive learning using audio data, and a zero-shot transfer method that integrates both audio and natural language descriptions of interaction contexts. Experiments on real-world data show that both methods achieve strong object recognition performance in unseen contexts, sometimes matching or exceeding a supervised baseline despite limited target-context data. Furthermore, the learned latent spaces exhibit clearly separable clusters by object identity, and the zero-shot method successfully recognizes novel objects, offering a practical solution for robot perception in data-scarce scenarios. The code for this paper is available at: https://github.com/siliu6487/AuditoryKnowledgeTransfer.

Index terms

Transfer Learning, Robot Audition, Representation Learning
