Probability-Driven Gating for Resilient Multi-Modal Tracking in Robotic Systems
Huan Wang, Haomin Chen, Pengcheng Du, Pengju Si, Baofeng Ji, Yongming Yang
AI summary
Problem
Current RGB-thermal tracking methods rely on rigid, non-adaptive fusion strategies that fail under dynamic scene changes or partial sensor degradation, compromising reliability for robotic applications.
Approach
The authors introduce DSTrack, which uses a probability-gated switch to dynamically select the most reliable fusion path in real time, paired with a multi-domain enhancement network that refines features across the channel, spatial, and frequency domains while establishing cross-modal compensatory links.
Key results
- State-of-the-art accuracy on RGBT234 (89.3% PR) and VTUAV (85.5% PR) benchmarks
- Real-time adaptive fusion via probability-driven dynamic gating
- Inherent sensor fault tolerance and consistent tracking under partial degradation
- Validated multi-domain enhancement modules (channel, spatial, frequency) and cross-modal attention bridges
Why it matters
Provides a robust, adaptive perception backbone essential for reliable robotic navigation and manipulation in unpredictable, real-world environments.
Abstract
The deployment of robots in unstructured environments demands perception systems that are both accurate and resilient. While RGB-Thermal (RGB-T) fusion is promising, current trackers often fail due to rigid, non-adaptive fusion strategies and underutilized cross-modal cues, compromising reliability for robotics. We introduce DSTrack, a novel tracking framework that embeds two core mechanisms for robotic robustness: a Probability-Gated Dynamic Switch and a Synergistic Multi-Domain Enhancement Network. The switch acts as an online decision-maker, allowing the robot to dynamically select the most reliable fusion path based on real-time confidence estimation, enabling crucial adaptation to scene changes. The enhancement network concurrently strengthens target representations within each modality through tri-domain (channel, spatial, frequency) refinement and establishes compensatory links between modalities via a cross-attention module, ensuring performance even during partial sensor degradation. Extensive evaluations on RGB-T benchmarks demonstrate state-of-the-art accuracy. More critically, DSTrack exhibits key properties for robotic integration: real-time environmental adaptability, inherent sensor fault tolerance, and consistent output for downstream planning.
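To make the idea of a probability-gated switch concrete, the following is a minimal sketch, not the authors' implementation. It assumes that some confidence estimator has already produced one score (logit) per candidate fusion path; the scores are normalized with a softmax and the most probable path is executed. The path names (`rgb_only`, `thermal_only`, `average`), the function `gated_fusion`, and the use of plain feature lists are all hypothetical choices made for illustration; the abstract does not specify the candidate paths or how confidence is computed.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate fusion paths (assumptions, not from the paper).
def rgb_only(rgb: Vector, thermal: Vector) -> Vector:
    return list(rgb)


def thermal_only(rgb: Vector, thermal: Vector) -> Vector:
    return list(thermal)


def average_fusion(rgb: Vector, thermal: Vector) -> Vector:
    return [(r + t) / 2 for r, t in zip(rgb, thermal)]


PATHS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "rgb_only": rgb_only,
    "thermal_only": thermal_only,
    "average": average_fusion,
}


def gated_fusion(
    rgb: Vector, thermal: Vector, gate_logits: Dict[str, float]
) -> Tuple[str, float, Vector]:
    """Pick the fusion path with the highest gate probability and run it.

    gate_logits stands in for the output of a confidence estimator,
    which is assumed here and not modeled.
    """
    names = list(gate_logits)
    probs = softmax([gate_logits[n] for n in names])
    best = max(range(len(names)), key=probs.__getitem__)
    chosen = names[best]
    return chosen, probs[best], PATHS[chosen](rgb, thermal)


if __name__ == "__main__":
    # Example: the thermal stream is judged unreliable, so its score is low.
    name, prob, fused = gated_fusion(
        [1.0, 2.0], [3.0, 4.0],
        {"rgb_only": 2.0, "thermal_only": -1.0, "average": 0.5},
    )
    print(name, round(prob, 3), fused)
```

In this sketch the gate is hard: exactly one path runs per frame, so a degraded modality can be bypassed entirely. A soft variant would instead weight the outputs of all paths by their probabilities; which of the two DSTrack uses is not stated in the abstract.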