ICRA 2026

Manifold Geometry-Based Feature Decoupling for Endoscopic Image Analysis

Yan Wen, Haodong Wang, Lingyu Chen, Wenbo She, Dingpei Han, Fang Chen, Tianqi Huang∗

AI summary

Decoupling endoscopic features into semantically independent subspaces via manifold geometry improves depth estimation and segmentation accuracy at a cost of only 5.10M additional parameters.

Endoscopic vision, Manifold geometry, Feature decoupling, Depth estimation, Semantic segmentation, Surgical robotics

Problem

Endoscopic images suffer from ambiguous semantic boundaries and weak feature discriminability due to complex anatomical structures and optical limitations, which traditional Euclidean-based models struggle to represent effectively.

Approach

The proposed Semantic Feature Decoupling-and-Processing (SFDP) framework extracts features with a backbone network, uses a Manifold Geometry-Based Feature Decoupling Module (MANDE) to project them into a manifold space and decompose them into semantically independent sub-features under geometric constraints, and then processes the sub-features in parallel and fuses the results in a Decouple-Aggregation Head.
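The data flow described above (backbone, manifold projection, decoupling, parallel heads, fusion) can be illustrated with a minimal, pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the unit hypersphere stands in for the manifold, a softmax channel gate stands in for MANDE's geometric constraints, fusion is a plain sum, and the names `project_to_sphere`, `decouple`, and `sfdp_forward` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def project_to_sphere(feature: Sequence[float]) -> Vector:
    """Assumed manifold projection: L2-normalize onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in feature))
    if norm == 0.0:
        return [0.0 for _ in feature]
    return [v / norm for v in feature]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decouple(feature: Sequence[float],
             gate_logits: Sequence[Sequence[float]]) -> List[Vector]:
    """Placeholder decoupling (not the paper's MANDE module).

    gate_logits[c] holds K scores for channel c. The softmax weights sum
    to 1, so the K sub-features add back up to the input feature.
    """
    weights = [softmax(row) for row in gate_logits]
    num_subs = len(gate_logits[0])
    return [
        [feature[c] * weights[c][k] for c in range(len(feature))]
        for k in range(num_subs)
    ]


def sfdp_forward(x: float,
                 backbone: Callable[[float], Vector],
                 gate_logits: Sequence[Sequence[float]],
                 heads: Sequence[Callable[[Vector], float]]) -> float:
    """Backbone -> projection -> decoupling -> one head per branch -> sum."""
    feature = project_to_sphere(backbone(x))
    sub_features = decouple(feature, gate_logits)
    return sum(head(sub) for head, sub in zip(heads, sub_features))


# Toy usage with made-up components: a 2-channel "backbone" and two heads.
toy_backbone = lambda x: [3.0 * x, 4.0 * x]
toy_gates = [[0.0, 0.0], [0.0, 0.0]]          # equal split across 2 branches
toy_heads = [sum, lambda v: 2.0 * sum(v)]
toy_output = sfdp_forward(1.0, toy_backbone, toy_gates, toy_heads)
```

The one property this sketch preserves is the divide-and-conquer shape: each branch sees only its own sub-feature, and the branches are combined only at the end. In a real model the gate scores and heads would be learned, and the decoupling would be driven by the manifold geometric constraints the paper proposes.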

Key results

Reduces depth estimation RMSE by 14.2% on average
Decreases segmentation MAE by 10.5%
Adds only 5.10M additional parameters
Demonstrates consistent performance gains across multiple backbone networks
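
For readers unfamiliar with the metrics, the sketch below shows the standard definitions of RMSE and MAE and how a percentage reduction against a baseline is computed. It assumes the reported figures are relative reductions; the helper names and the numbers in the usage lines are toy values chosen for illustration, not results from the paper.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy numbers only: an error that drops from 5.00 to 4.29 is a 14.2% reduction.
print(round(relative_reduction(5.0, 4.29), 1))  # 14.2
```

Because RMSE squares each error before averaging, it penalizes large outliers more heavily than MAE does, so a given percentage reduction means slightly different things for the two metrics.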

Why it matters

Provides a parameter-efficient, geometrically grounded paradigm to enhance visual perception for endoscopic surgery and surgical robotics.

Abstract

Endoscopic images suffer from ambiguous semantic boundaries and weak feature discriminability due to acquisition limitations and structural constraints of the anatomical lumen, severely limiting the performance of image analysis models. Mainstream models operate in Euclidean space, which is inherently limited in representing the non-linear geometric characteristics of endoscopic imagery. Manifold space provides a natural advantage in representing such complex structures. Inspired by the manifold hypothesis, this paper models endoscopic image features as separable semantic subspaces on a manifold and proposes a Manifold Geometry-Based Feature Decoupling Module (MANDE). Guided by novel manifold geometric constraints, MANDE adaptively decouples feature maps into multiple semantically independent sub-feature maps, effectively mitigating performance degradation caused by feature space coupling. Furthermore, this paper introduces the Semantic Feature Decoupling-and-Processing (SFDP) framework adopting a divide-and-conquer strategy: utilizing a backbone network for feature extraction, MANDE for decoupling, and a Decouple-Aggregation Head for parallel processing and fusion of sub-features. Extensive experiments demonstrate the framework’s adaptability and effectiveness. When integrated with various popular networks, SFDP significantly enhances performance on endoscopic tasks: it reduces RMSE by an average of 14.2% for depth estimation and decreases MAE by 10.5% for segmentation, with only 5.10M additional parameters. Unlike prior works, SFDP uniquely integrates manifold geometry with semantic hierarchical modeling for endoscopic images, providing a novel perspective for surgical robot scene understanding: from holistic features to semantic structures.

Index terms

Deep Learning for Visual Perception, Computer Vision for Medical Robotics, Visual Learning