Agile Collision Avoidance for Deformable-Tethered Multi-Robot Systems Via Zone-Aware Hierarchical Learning and VLM-Guided Control
Zeyu Zhou, Jingwei Zhang, Hui Zhi, Yun Hao, Wei Tang, David Navarro-Alarcon
AI summary
Problem
Navigating tethered multi-robot systems in dynamic environments is hindered by uncontrollable flexible hoses that create complex, varying collision footprints, which classical planners and flat learning methods fail to model effectively.
Approach
The H-SEPID framework integrates zone-aware hierarchical reinforcement learning with a Vision-Language Model to infer strategic intent and constrain actions, using a dual-attention value network for smooth policy switching and a safety shield to enable emergent gap-threading.
Key results
- 94% success rate and 4% collision rate in 8-robot, 5-pedestrian simulations
- Outperforms five classical and learning-based baselines by up to 28% in success rate
- Generates emergent gap-threading strategies (aggressive, thread-the-needle, conservative) without explicit programming
- Achieves 95% success in real-world e-puck2 deployments with <5% performance drop from simulation
Why it matters
Provides a scalable, safe navigation framework for tethered multi-robot teams, critical for applications like collaborative material transport and dynamic multi-agent coordination.
Abstract
Navigating Linked Multi-Component Robotic Systems (L-MCRS), robot pairs tethered by passive flexible hoses, through dynamic pedestrian environments is fundamentally harder than rigid multi-robot coordination, as the uncontrollable hose creates a variable-geometry collision footprint spanning 118 pairwise combinations. We propose H-SEPID, unifying zone-aware Hierarchical Reinforcement Learning grounded in Kinematic Flow Theory with VLM-guided cascaded optimization. A phase-aware dual attention value network performs C0-continuous topological policy switching, while a Vision-Language Model infers strategic intent and quantifies action-space constraints governing hose geometry. A seven-category safety shield with ORCA fallback and a threading reward band produce emergent gap-threading maneuvers. H-SEPID achieves 94% success and 4% collision rate in an 8-robot, 5-pedestrian, 4-hose scenario, outperforming five baselines, and is validated on real e-puck2 robots across 12 configurations.
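To make the idea of a safety shield with a fallback action concrete, the following is a minimal sketch and not the H-SEPID implementation. It assumes a single clearance check rather than the seven categories named above, approximates the hose as a straight segment between the two robots, and takes the fallback action as an argument instead of computing it with ORCA. The names `shield`, `min_distance_to_polyline`, `dt`, and `clearance` are hypothetical.

```python
import math


def min_distance_to_polyline(point, polyline):
    """Smallest Euclidean distance from a point to a piecewise-linear curve."""
    px, py = point
    best = math.inf
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0.0:
            t = 0.0  # degenerate segment: both endpoints coincide
        else:
            # Project the point onto the segment and clamp to its endpoints.
            t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
            t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        best = min(best, math.hypot(px - cx, py - cy))
    return best


def shield(action, fallback, robot_a, robot_b, pedestrians, dt=0.25, clearance=0.3):
    """Return `action` if the predicted hose keeps `clearance` from every
    pedestrian after one step of length `dt`; otherwise return `fallback`.

    `action` and `fallback` are ((vx_a, vy_a), (vx_b, vy_b)) velocity pairs.
    Assumption: the hose is modelled as the straight segment between robots.
    """
    (vax, vay), (vbx, vby) = action
    next_a = (robot_a[0] + vax * dt, robot_a[1] + vay * dt)
    next_b = (robot_b[0] + vbx * dt, robot_b[1] + vby * dt)
    hose = [next_a, next_b]
    for pedestrian in pedestrians:
        if min_distance_to_polyline(pedestrian, hose) < clearance:
            return fallback
    return action


# Usage: two robots 2 m apart, one pedestrian 0.5 m above the hose midpoint.
stop = ((0.0, 0.0), (0.0, 0.0))
advance = ((0.0, 1.0), (0.0, 1.0))
chosen = shield(advance, stop, (0.0, 0.0), (2.0, 0.0), [(1.0, 0.5)])
```

In this usage the advancing action would bring the straight-line hose within 0.25 m of the pedestrian, below the assumed 0.3 m clearance, so the shield returns the stop action. A real hose sags and sweeps, so a deployed check would need a polyline with more sample points than the two endpoints used here.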