ICRA 2026

Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-Language Models

Mingen Li, Houjian Yu, Yixuan Huang, Youngjin Hong, Hantao Ye, Changhyun Choi

AI summary

A hierarchical framework combining in-context vision-language planning with reinforcement learning achieves 92% success in long-horizon deformable cable routing with robust failure recovery.

Deformable object manipulation, long-horizon routing, vision-language models, reinforcement learning, failure recovery, hierarchical planning

Problem

Long-horizon routing of deformable linear objects like cables requires precise multi-step planning and reliable low-level control, but existing methods struggle with generalization to complex scenes and lack autonomous failure recovery.

Approach

The framework uses a vision-language model for in-context high-level reasoning and failure detection, while low-level reinforcement learning policies execute safe insertion, pulling, and flattening skills to navigate clips.
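
The division of labor described above can be illustrated with a minimal sketch of a hierarchical control loop. Everything in it is an assumption made for illustration and is not taken from the paper's implementation: the names `Step`, `run_routing`, `stub_planner`, and `FailOnce` are hypothetical, the planner and failure detector are plain callables standing in for the VLM, the skills are no-op functions standing in for the RL policies, and retrying the same step after a `reorient` recovery is only one plausible reading of the recovery mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    skill: str  # hypothetical skill name: "insert", "pull", or "flatten"
    clip: str   # hypothetical identifier of the target clip


def run_routing(
    goal: str,
    planner: Callable[[str], List[Step]],
    failure_check: Callable[[Step], bool],
    skills: Dict[str, Callable[[str], None]],
    max_steps: int = 20,
) -> Tuple[bool, List[str]]:
    """Run the plan step by step; on a detected failure, recover and retry."""
    plan = planner(goal)           # high level: stand-in for VLM planning
    log: List[str] = []
    i = 0
    while i < len(plan) and len(log) < max_steps:
        step = plan[i]
        skills[step.skill](step.clip)          # low level: stand-in for an RL skill
        log.append(f"{step.skill}:{step.clip}")
        if failure_check(step):                # stand-in for VLM failure detection
            skills["reorient"](step.clip)      # assumed recovery skill
            log.append(f"reorient:{step.clip}")
            continue                           # retry the same step
        i += 1
    return i == len(plan), log


def stub_planner(goal: str) -> List[Step]:
    # A fixed two-clip plan; a real planner would derive this from the scene and goal.
    return [
        Step("insert", "clip_A"), Step("pull", "clip_A"),
        Step("insert", "clip_B"), Step("pull", "clip_B"),
    ]


class FailOnce:
    """Reports a failure the first time the given (skill, clip) pair is executed."""

    def __init__(self, skill: str, clip: str) -> None:
        self.target = (skill, clip)
        self.fired = False

    def __call__(self, step: Step) -> bool:
        if not self.fired and (step.skill, step.clip) == self.target:
            self.fired = True
            return True
        return False


noop_skills = {name: (lambda clip: None)
               for name in ("insert", "pull", "flatten", "reorient")}

success, log = run_routing(
    "route the cable through clip A, then clip B",
    stub_planner,
    FailOnce("insert", "clip_B"),
    noop_skills,
)
print(success, log)
```

In this toy run the injected failure at `clip_B` triggers one `reorient` step before the insertion is retried. The `max_steps` bound is a design choice of the sketch, added so that a persistently failing step cannot loop forever.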

Key results

92% overall success rate across long-horizon routing scenarios
Generalization from 3-clip to 4- and 5-clip settings
VLM-triggered failure recovery reorients stuck cables to resume routing
RL insertion policy raises success rate to 87%, up from 45% for heuristic baselines

Why it matters

Enables reliable, autonomous cable and wire management in cluttered industrial and domestic environments, advancing long-horizon deformable manipulation for real-world robotics.

Abstract

Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLOs with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to their nonlinear dynamics, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models (VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness over long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, implicit language commands, and extended 5-clip settings. It achieves an overall success rate of 92% across long-horizon routing scenarios. Please refer to our project page: https://icra2026-dloroute.github.io/DLORoute/

Index terms

Deep Learning in Grasping and Manipulation; Reinforcement Learning