ICRA 2026

TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within a Joint Learning Framework

Guanfeng Tang, Zhiyuan Wu, Jiahang Li, Ping Zhong, Xieyuanli Chen, Huimin Lu, Rui Fan

AI summary

TiCoSS tightly couples semantic segmentation with stereo matching in a unified joint learning framework, raising mean intersection over union (mIoU) by over 9% compared to prior art.

Semantic segmentation, stereo matching, joint learning, feature fusion, autonomous driving, computer vision

Problem

Existing joint learning frameworks for semantic segmentation and stereo matching suffer from loose feature coupling and independent loss computation, limiting their ability to effectively share contextual and geometric information.

Approach

The proposed TiCoSS framework employs a gated feature fusion strategy to selectively merge RGB and disparity features, hierarchical deep supervision for stable training, and a specialized loss function to enforce task complementarity.
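
To make the idea of gated fusion concrete, the following is a minimal sketch of a generic sigmoid gate that blends an RGB feature vector with a disparity feature vector. It is not the TiCoSS implementation: the summary above does not specify the gate, so the function name `gated_fusion`, the scalar weights `w_rgb` and `w_disp`, and the `bias` term are all hypothetical. A real network would likely compute the gate with learned convolutions over feature maps.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_feat, disp_feat, w_rgb, w_disp, bias):
    """Blend two equal-length feature vectors with a per-element gate.

    Assumption: the gate is a sigmoid of a weighted sum of both inputs.
    A gate near 1 keeps the RGB feature; a gate near 0 keeps the
    disparity feature.
    """
    if len(rgb_feat) != len(disp_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for r, d in zip(rgb_feat, disp_feat):
        g = sigmoid(w_rgb * r + w_disp * d + bias)  # hypothetical gate
        fused.append(g * r + (1.0 - g) * d)         # convex combination
    return fused


# Toy features and hand-picked (not learned) gate parameters.
rgb = [0.8, -0.2, 1.5]
disp = [0.1, 0.9, -0.4]
fused = gated_fusion(rgb, disp, w_rgb=1.0, w_disp=-1.0, bias=0.0)
```

Because each output is a convex combination of the two inputs, every fused value lies between the corresponding RGB and disparity values; the gate only decides how far it leans toward each source.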

Key results

Tightly-coupled gated feature fusion strategy
Hierarchical deep supervision strategy
Coupling tightening loss function
Mean IoU increase of over 9% compared to prior art (experiments on KITTI, vKITTI2, and Cityscapes)
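
For readers unfamiliar with the metric in the last item, here is a small sketch of how mean intersection over union (mIoU) is commonly computed from flat lists of predicted and ground-truth class labels. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions of this sketch; benchmark toolkits may instead accumulate counts over a whole dataset and apply their own ignore-label rules.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that appear in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2, so score = 0.5
```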

Why it matters

This approach advances autonomous driving perception by enabling more accurate and robust simultaneous scene understanding and depth estimation for robotics and computer vision applications.

Abstract

Semantic segmentation and stereo matching, respectively analogous to the ventral and dorsal streams in our human brain, are two key components of autonomous driving perception systems. Addressing these two tasks with separate networks is no longer the mainstream direction in developing computer vision algorithms, particularly with the recent advances in large vision models and embodied artificial intelligence. The trend is shifting towards combining them within a joint learning framework, especially emphasizing feature sharing between the two tasks. The major contributions of this study lie in comprehensively tightening the coupling between semantic segmentation and stereo matching. Specifically, this study makes three key contributions: (1) a tightly coupled, gated feature fusion strategy, (2) a hierarchical deep supervision strategy, and (3) a coupling tightening loss function. The combined use of these technical contributions results in TiCoSS, a state-of-the-art joint learning framework that simultaneously tackles semantic segmentation and stereo matching. Through extensive experiments on the KITTI, vKITTI2, and Cityscapes datasets, along with both qualitative and quantitative analyses, we validate the effectiveness of our developed strategies and loss function. Our approach demonstrates superior performance compared to prior art, with a notable increase in mean intersection over union by over 9%.
Received 10 February 2025; revised 11 May 2025; accepted 19 June 2025. Date of publication 7 July 2025; date of current version 25 July 2025. This article was recommended for publication by Associate Editor W. Zhang and Editor V. Villani upon evaluation of the reviewers’ comments. This work was supported in part by the National Natural Science Foundation of China under Grant 62473288, Grant 62233013, Grant 62403478, Grant 62176184, and Grant 62272489; in part by the Fundamental Research Funds for the Central Universities; in part by the NIO University Programme (NIO UP); in part by the Xiaomi Young Talents Program; and in part by the Young Elite Scientists Sponsorship Program by CAST under Grant 2023QNRC001. (Corresponding author: Rui Fan.)
Guanfeng Tang and Jiahang Li are with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: gftang@tongji.edu.cn; lijiahang617@tongji.edu.cn). Zhiyuan Wu is with the Department of Engineering, King’s College London, WC2R 2LS London, U.K. (e-mail: zhiyuan.1.wu@kcl.ac.uk). Ping Zhong is with the Department of Computer Science and Technology, Central South University, Changsha 410017, China (e-mail: ping.zhong@csu.edu.cn). Wei Ye is with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China, and also with Shanghai Innovation Institute, Shanghai 200231, China (e-mail: yew@tongji.edu.cn). Xieyuanli Chen and Huimin Lu are with the College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410082, China (e-mail: chenxieyuanli@hotmail.com; lhmnew@nudt.edu.cn). Rui Fan is with the College of Electronics and Information Engineering, Shanghai Institute of Intelligent Science and Technology, Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai Key Laboratory of Intelligent Autonomous Systems, State Key Laboratory of Autonomous Intelligent Unmanned Systems, and Frontiers Science Center for Intelligent Autonomous Systems of the Ministry of Education, Tongji University, Shanghai 201804, China (e-mail: rui.fan@ieee.org).
This article has supplementary downloadable material available at https://doi.org/10.1109/TASE.2025.3586286, provided by the authors.
Digital Object Identifier 10.1109/TASE.2025.3586286
Note to Practitioners—TiCoSS is a robust and effective joint learning framework that can simultaneously tackle semantic segmentation and stereo matching tasks. This work aims to improve semantic segmentation performance by exploring the potential complementarity and tightening the coupling between these two tasks. In the future, we plan to further improve the efficiency of the framework, so as to enable its real-time performance on resource-constrained hardware.

Index terms

Deep Learning for Visual Perception, RGB-D Perception