Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-Language Models
Mingen Li, Houjian Yu, Yixuan Huang, Youngjin Hong, Hantao Ye, Changhyun Choi
AI summary
Problem
Long-horizon routing of deformable linear objects like cables requires precise multi-step planning and reliable low-level control, but existing methods struggle with generalization to complex scenes and lack autonomous failure recovery.
Approach
The framework uses a vision-language model for in-context high-level reasoning and failure detection, while low-level reinforcement learning policies execute safe insertion, pulling, and flattening skills to navigate clips.
Key results
- 92% overall success rate across long-horizon routing scenarios
- Generalization from 3-clip to 4- and 5-clip settings
- VLM-triggered failure recovery reorients stuck cables to resume routing
- RL insertion policy raises success rate to 87%, up from 45% for heuristic baselines
Why it matters
Enables reliable, autonomous cable and wire management in cluttered industrial and domestic environments, advancing long-horizon deformable manipulation for real-world robotics.
Abstract
Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLOs with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to the nonlinear dynamics of DLOs, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models (VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness over long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, implicit language commands, and extended 5-clip settings. It achieves an overall success rate of 92% across long-horizon routing scenarios. Please refer to our project page: https://icra2026-dloroute.github.io/DLORoute/
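The hierarchical structure described in the abstract can be illustrated with a minimal control-loop sketch. This is not the authors' implementation: the `Scene` class, the `stub_planner` function (a rule-based stand-in for the VLM), and the toy `insert` and `reorient` skills (stand-ins for the RL policies) are all hypothetical names introduced here for illustration. The pulling and flattening skills are omitted for brevity. The sketch only shows how a high-level planner might select skills step by step and trigger a recovery skill when the cable is not in an insertion-feasible state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Scene:
    """Toy stand-in for the robot's observation (hypothetical, not from the paper)."""
    clips: List[str]                                    # clips in routing order
    fail_once: Set[str] = field(default_factory=set)    # clips whose first insertion fails
    routed: List[str] = field(default_factory=list)     # clips already routed
    stuck: bool = False                                 # True when insertion is infeasible


def insert(scene: Scene) -> None:
    """Toy low-level skill: try to route the cable through the next clip."""
    target = scene.clips[len(scene.routed)]
    if target in scene.fail_once:
        # Simulate a failed attempt that leaves the cable stuck.
        scene.fail_once.discard(target)
        scene.stuck = True
        return
    scene.routed.append(target)


def reorient(scene: Scene) -> None:
    """Toy recovery skill: return the cable to an insertion-feasible state."""
    scene.stuck = False


def stub_planner(scene: Scene) -> str:
    """Rule-based placeholder for the VLM: pick the next skill from the scene."""
    return "reorient" if scene.stuck else "insert"


def run_routing(
    scene: Scene,
    planner: Callable[[Scene], str],
    skills: Dict[str, Callable[[Scene], None]],
    max_steps: int = 20,
) -> Tuple[bool, List[str]]:
    """Alternate high-level skill selection and low-level execution."""
    log: List[str] = []
    for _ in range(max_steps):
        if len(scene.routed) == len(scene.clips):
            return True, log
        skill = planner(scene)      # high-level decision
        skills[skill](scene)        # low-level execution
        log.append(skill)
    return len(scene.routed) == len(scene.clips), log


if __name__ == "__main__":
    scene = Scene(clips=["A", "B", "C"], fail_once={"B"})
    success, log = run_routing(scene, stub_planner, {"insert": insert, "reorient": reorient})
    print(success, log)
```

In this toy run, the first attempt at clip B fails, the planner selects the recovery skill once, and routing then resumes. In a real system, the planner would query a VLM with images and the language goal, and each skill would run a learned policy on the robot.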