ICRA 2026

Agile Collision Avoidance for Deformable-Tethered Multi-Robot Systems Via Zone-Aware Hierarchical Learning and VLM-Guided Control

Zeyu Zhou, Jingwei Zhang, Hui Zhi, Yun Hao, Wei Tang, David Navarro-Alarcon


AI summary


H-SEPID achieves a 94% success rate in navigating tethered multi-robot systems through dynamic crowds by combining hierarchical reinforcement learning with VLM-inferred strategic intent, enabling emergent gap-threading maneuvers.

Multi-robot systems, Deformable tethers, Hierarchical reinforcement learning, Vision-language models, Collision avoidance, Gap-threading

Problem

Navigating tethered multi-robot systems in dynamic environments is hindered by uncontrollable flexible hoses that create complex, varying collision footprints, which classical planners and flat learning methods fail to model effectively.
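To make the "varying collision footprint" concrete, the sketch below checks whether a disc-shaped pedestrian touches the hose between two robots. It assumes the hose is approximated as a straight segment of fixed radius; a real slack hose sags and bends, and the paper's own hose model is not given here. The helper names (`point_segment_distance`, `tether_collides`, `any_tether_collision`) and all radii are hypothetical. The point it illustrates: the same pedestrian can be safe or in collision depending only on where the two tethered robots are, and the number of hose-pedestrian checks grows with both counts.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment from a to b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate hose: both robots at the same spot.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def tether_collides(robot_a: Point, robot_b: Point, pedestrian: Point,
                    ped_radius: float, hose_radius: float) -> bool:
    """True if a disc-shaped pedestrian overlaps the (assumed straight) hose."""
    return point_segment_distance(pedestrian, robot_a, robot_b) < ped_radius + hose_radius


def any_tether_collision(hoses: Iterable[Tuple[Point, Point]],
                         pedestrians: Iterable[Point],
                         ped_radius: float, hose_radius: float) -> bool:
    """Check every (hose, pedestrian) pair; the work grows with both counts."""
    peds = list(pedestrians)
    return any(
        tether_collides(a, b, p, ped_radius, hose_radius)
        for a, b in hoses
        for p in peds
    )
```

For example, a pedestrian at (1.0, 0.2) overlaps a hose stretched from (0, 0) to (2, 0), but is about 1 m clear of the same hose once the second robot moves to (0, 2).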

Approach

The H-SEPID framework integrates zone-aware hierarchical reinforcement learning with a Vision-Language Model (VLM) that infers strategic intent and constrains the action space. A dual-attention value network provides smooth switching between policies, and a safety shield, combined with a threading reward, enables emergent gap-threading.
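Why "smooth policy switching" matters can be shown with a much simpler mechanism than the paper's dual-attention value network. The sketch below is a swapped-in technique, not the authors' method: it blends two hypothetical sub-policy outputs (a "far-zone" and a "near-zone" velocity command) with a sigmoid weight that depends continuously on distance to the nearest hazard, so the command has no jump at the switching distance, unlike a hard if/else switch. A speed cap stands in, as an assumption, for the kind of action-space constraint a higher-level planner such as the VLM might supply. All names and parameter values (`switch_distance`, `temperature`, `max_speed`) are illustrative.

```python
import math
from typing import Tuple

Action = Tuple[float, float]  # hypothetical 2-D velocity command (vx, vy)


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def blended_action(distance: float, far_action: Action, near_action: Action,
                   switch_distance: float = 1.0, temperature: float = 0.1) -> Action:
    """Convex blend of two sub-policy outputs, weighted by distance to a hazard.

    The weight is a continuous function of distance, so the blended command
    has no jump at switch_distance (C0 continuity).
    """
    w = stable_sigmoid((distance - switch_distance) / temperature)
    return (w * far_action[0] + (1.0 - w) * near_action[0],
            w * far_action[1] + (1.0 - w) * near_action[1])


def clip_speed(action: Action, max_speed: float) -> Action:
    """Scale the command down so its norm respects an externally supplied cap."""
    speed = math.hypot(action[0], action[1])
    if speed <= max_speed:
        return action
    scale = max_speed / speed
    return (action[0] * scale, action[1] * scale)
```

At exactly `switch_distance` the two commands are averaged, and a smaller `temperature` makes the transition sharper while keeping it continuous.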

Key results

94% success rate and 4% collision rate in 8-robot, 5-pedestrian simulations
Outperforms five classical and learning-based baselines by up to 28% in success rate
Generates emergent gap-threading strategies (aggressive, thread-the-needle, conservative) without explicit programming
Achieves 95% success in real-world e-puck2 deployments with <5% performance drop from simulation

Why it matters

Provides a scalable, safe navigation framework for tethered multi-robot teams, critical for applications like collaborative material transport and dynamic multi-agent coordination.

Abstract

Navigating Linked Multi-Component Robotic Systems (L-MCRS), i.e., robot pairs tethered by passive flexible hoses, through dynamic pedestrian environments is fundamentally harder than rigid multi-robot coordination, as the uncontrollable hose creates a variable-geometry collision footprint spanning 118 pairwise combinations. We propose H-SEPID, unifying zone-aware Hierarchical Reinforcement Learning grounded in Kinematic Flow Theory with VLM-guided cascaded optimization. A phase-aware dual-attention value network performs C⁰-continuous topological policy switching, while a Vision-Language Model infers strategic intent and quantifies action-space constraints governing hose geometry. A seven-category safety shield with ORCA fallback and a threading reward band produce emergent gap-threading maneuvers. H-SEPID achieves 94% success and a 4% collision rate in an 8-robot, 5-pedestrian, 4-hose scenario, outperforming five baselines, and is validated on real e-puck2 robots across 12 configurations.
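The two mechanisms credited with producing gap-threading, a threading reward band and a safety shield with a fallback, can be sketched in their simplest form. This is a hypothetical illustration only: the band limits, bonus, and penalty values are invented, just one safety check is shown (not the paper's seven categories), and ORCA itself is not implemented; `stop_fallback` is a placeholder where an ORCA velocity would be computed. The shield predicts the robot's position one step ahead and overrides the learned action if it would come within `safe_radius` of any pedestrian.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Action = Tuple[float, float]


def threading_reward(clearance: float, band_low: float = 0.1, band_high: float = 0.4,
                     bonus: float = 0.5, collision_penalty: float = -1.0) -> float:
    """Hypothetical shaping term: reward passing close to, but not into, an obstacle."""
    if clearance <= 0.0:
        return collision_penalty   # overlap counts as a collision
    if band_low <= clearance <= band_high:
        return bonus               # inside the "threading" band
    return 0.0                     # too tight (below band_low) or too far away


def stop_fallback(position: Point, action: Action) -> Action:
    """Placeholder fallback: stop. A real system would compute an ORCA velocity here."""
    return (0.0, 0.0)


def shielded_action(position: Point, action: Action, pedestrians: List[Point],
                    safe_radius: float, dt: float,
                    fallback: Callable[[Point, Action], Action] = stop_fallback
                    ) -> Tuple[Action, bool]:
    """Return (action, overridden); override when the one-step prediction is unsafe."""
    nx = position[0] + action[0] * dt
    ny = position[1] + action[1] * dt
    safe = all(math.hypot(nx - px, ny - py) >= safe_radius for px, py in pedestrians)
    if safe:
        return action, False
    return fallback(position, action), True
```

Keeping the reward band and the shield separate mirrors the abstract's framing: the reward encourages the policy to attempt narrow passages, while the shield vetoes the attempts that would end in contact.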

Index terms

Reinforcement Learning; Flexible Robotics; Hybrid Logical/Dynamical Planning and Verification