SII 2026

Making Objects Speak: Spatial Audio Guidance for Object Grasping by Blind and Visually Impaired Users

Chenxin Qin, Yukiko Iwasaki, Chenyang Li, Hiroyasu Iwata

Abstract

This paper presents an assistive system that enables blind and visually impaired (BVI) users to localize and reach objects using spatialized audio cues, rendered as if the objects themselves emit sound. By integrating voice command recognition, RGB-D-based 3D localization, and head-tracked spatial audio via Apple AirPods Pro, the system transforms object positions into egocentric, directional prompts aligned with the user’s head orientation. We evaluated the system through tabletop localization-to-contact tasks with blindfolded sighted participants, comparing a spatial-audio (SA) condition against a speech-only (SO) baseline. While success rates were comparable between conditions, spatial audio significantly reduced task completion time and subjective workload and received substantially higher usability ratings. These findings suggest that spatialized object-originating sound can enhance task efficiency and user experience in near-field, non-visual interaction scenarios.
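The egocentric transformation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the coordinate convention (x forward, y left, z up, in metres), and the restriction to head yaw only are all assumptions made for illustration. It converts an object position and a head pose into the azimuth, elevation, and distance at which a sound source would be rendered.

```python
import math


def egocentric_direction(obj_xyz, head_xyz, head_yaw_deg):
    """Return (azimuth_deg, elevation_deg, distance_m) of an object relative to the head.

    Assumed convention (hypothetical, not from the paper): world frame with
    x forward, y left, z up; yaw is a counter-clockwise rotation about z.
    Head pitch and roll are ignored to keep the sketch short.
    """
    # Offset from the head to the object in the world frame.
    dx = obj_xyz[0] - head_xyz[0]
    dy = obj_xyz[1] - head_xyz[1]
    dz = obj_xyz[2] - head_xyz[2]

    # Rotate the offset into the head frame (inverse of the head's yaw rotation).
    yaw = math.radians(head_yaw_deg)
    fx = math.cos(yaw) * dx + math.sin(yaw) * dy
    fy = -math.sin(yaw) * dx + math.cos(yaw) * dy

    distance = math.sqrt(fx * fx + fy * fy + dz * dz)
    azimuth = math.degrees(math.atan2(fy, fx))  # positive means to the user's left
    elevation = math.degrees(math.atan2(dz, math.hypot(fx, fy)))
    return azimuth, elevation, distance


if __name__ == "__main__":
    # An object one metre ahead and one metre to the left of the head, at ear height.
    for yaw in (0.0, 45.0, 90.0):
        az, el, dist = egocentric_direction((1.0, 1.0, 0.0), (0.0, 0.0, 0.0), yaw)
        print(f"yaw={yaw:5.1f}  azimuth={az:6.1f}  elevation={el:5.1f}  distance={dist:.3f}")
```

As the head turns toward the object, the azimuth shrinks to zero, so the rendered sound would appear straight ahead. A complete system would use the full head orientation (for example a quaternion from the headphone's motion sensors) rather than yaw alone.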

Index terms

Human-robot Interaction / Collaboration; Assistive Robotics