ICRA 2026

Visual-Auditory Proprioception of Soft Finger Shape and Contact

Qinsong Guo, Ke Yang, Hanwen Zhao, Haohan Fang, Haoxuan Wang, Chen Feng

AI summary

DeepCoFi accurately reconstructs soft finger geometry and contact by fusing internal camera images with acoustic spectrograms, overcoming severe occlusion.

Soft robotics, Multimodal proprioception, Acoustic sensing, Shape reconstruction, Contact estimation, Deep learning

Problem

Vision-based proprioception fails under severe occlusion, and audio-only methods lack spatial detail, so neither alone provides reliable real-time sensing for highly deformable soft fingers.

Approach

The system fuses internal visual and acoustic signals through a two-stage FoldingNet decoder that first estimates global bending and then refines localized contact deformations.
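As a rough illustration of the two-stage idea, the following NumPy sketch fuses a visual and an acoustic feature vector into one codeword, folds a fixed 2D grid into a coarse 3D point cloud (global bending), and then adds a per-point residual (local contact refinement). It is not the authors' implementation: the feature sizes, the random weights standing in for trained parameters, the residual form of the second stage, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
VIS_DIM, AUD_DIM, CODE_DIM, N_SIDE = 64, 32, 48, 16


def init_mlp(sizes):
    """Random weights standing in for trained parameters."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, weights):
    """Small ReLU MLP; the last layer is linear."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x


def make_grid(n_side):
    """Fixed 2D grid that the folding steps deform into a 3D surface."""
    u = np.linspace(-1.0, 1.0, n_side)
    return np.stack(np.meshgrid(u, u), axis=-1).reshape(-1, 2)


def fold(codeword, points, weights):
    """One folding step: attach the codeword to every point, map to 3D."""
    tiled = np.broadcast_to(codeword, (points.shape[0], codeword.shape[0]))
    return mlp(np.concatenate([tiled, points], axis=1), weights)


fuse_w = init_mlp([VIS_DIM + AUD_DIM, 64, CODE_DIM])   # modality fusion
stage1_w = init_mlp([CODE_DIM + 2, 64, 3])             # grid -> coarse shape
stage2_w = init_mlp([CODE_DIM + 3, 64, 3])             # coarse -> residual


def decode(vis_feat, aud_feat):
    """Return (coarse, refined) point clouds of shape (N_SIDE**2, 3)."""
    code = mlp(np.concatenate([vis_feat, aud_feat]), fuse_w)
    coarse = fold(code, make_grid(N_SIDE), stage1_w)    # stage 1: global bending
    refined = coarse + fold(code, coarse, stage2_w)     # stage 2: local refinement
    return coarse, refined


coarse, refined = decode(rng.normal(size=VIS_DIM), rng.normal(size=AUD_DIM))
print(coarse.shape, refined.shape)
```

In a trained system the two feature vectors would come from image and spectrogram encoders, and the weights would be learned against ground-truth finger geometry; here they are random, so only the data flow is meaningful.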

Key results

Multimodal fusion overcomes visual occlusion and the low spatial resolution of audio
Two-stage FoldingNet enables joint bending and contact reconstruction
Novel co-molded hardware integrates camera, microphone, and exoskeleton
Robust generalization to unseen deformations and real-world grasping

Why it matters

Enables reliable, real-time proprioception critical for dexterous manipulation and safe human-robot interaction.

Abstract

Soft robotic fingers require precise proprioception of both global deformation and local contact to enable safe and dexterous manipulation. Vision-based methods can reconstruct overall shape but struggle under severe occlusion, while audio-only approaches provide complementary cues but lack spatial detail. We present DeepCoFi, a lightweight multimodal proprioception framework that fuses internal camera images with acoustic spectrograms to jointly recover finger geometry and contact. The framework leverages the complementary strengths of vision and acoustics and employs a FoldingNet-based two-stage decoder that first reconstructs global bending and then refines local contact deformations. To support this integration, we introduce a soft finger design that incorporates an exoskeleton-mounted camera and microphone in a single molding step, preserving compliance while enabling multimodal sensing. Experiments on a comprehensive dataset and real-world grasping tasks show that DeepCoFi achieves robust proprioception under occlusion and generalizes effectively to unseen deformations and contact conditions. Open-source resources and project updates are available at ai4ce.github.io/DeepCoFi.
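For readers unfamiliar with the acoustic input, the sketch below shows one common way to turn a microphone signal into a log-power spectrogram using a short-time Fourier transform. The frame length, hop size, sample rate, and the function name `log_spectrogram` are assumptions for illustration; the abstract does not specify the authors' preprocessing.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-8):
    """Log-power spectrogram of a 1D signal, shape (freq_bins, time_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps).T


# Usage: a 1 kHz test tone sampled at an assumed 16 kHz.
sample_rate = 16000
t = np.arange(4096) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(spec.shape)
```

The resulting 2D array can be treated like an image, which is what makes it straightforward to pair with camera frames in a shared encoder-decoder pipeline.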

Index terms

Modeling, Control, and Learning for Soft Robots; Soft Sensors and Actuators; Deep Learning for Visual Perception