RoboMatch: A Unified Mobile-Manipulation Teleoperation Platform with Auto-Matching Network Architecture for Long-Horizon Tasks
Hanyu Liu, Yunsheng Ma, Jiaxin Huang, Keqiang Ren, Jiayi Wen, Yilin Zheng, Haoru Luan, Baishu Wan, Pan Li, Jiejun Hou, Zhihua Wang, Zhigong Song
AI summary
Problem
Current teleoperation platforms lack synchronized mobile-manipulation control and sufficient sensory feedback, while end-to-end models struggle with error accumulation and limited reasoning in long-horizon tasks.
Approach
The authors introduce a unified cockpit-style teleoperation platform enhanced with a Proprioceptive-Visual Enhanced Diffusion Policy (PVE-DP) for precise manipulation and an Auto-Matching Network (AMN) that decomposes long-horizon tasks into subtasks routed to specialized lightweight models.
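The routing idea behind AMN can be illustrated with a small sketch. It is not the authors' implementation: the summary does not say how AMN decomposes a task or matches models. Here a hand-written lookup table (`LIBRARY`) stands in for the decomposition, and each subtask label is dispatched to one registered lightweight policy until a completion check passes. All names (`decompose`, `run_long_horizon`, `toy_step`, `toy_done`) and the toy environment are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: an observation is a dict of named scalars, and each
# lightweight policy maps an observation to a discrete action label.
Observation = Dict[str, float]
Policy = Callable[[Observation], str]


def decompose(task: str, library: Dict[str, List[str]]) -> List[str]:
    """Return the ordered subtask sequence for a long-horizon task.

    Assumption: a hand-written lookup table stands in for AMN's decomposition.
    """
    if task not in library:
        raise KeyError(f"no decomposition known for task {task!r}")
    return list(library[task])


def run_long_horizon(
    task: str,
    library: Dict[str, List[str]],
    policies: Dict[str, Policy],
    done: Callable[[str, Observation], bool],
    step: Callable[[Observation, str], Observation],
    obs: Observation,
    max_steps: int = 50,
) -> List[str]:
    """Run each subtask with its matched specialist policy until it reports done."""
    trace: List[str] = []
    for subtask in decompose(task, library):
        policy = policies[subtask]  # route this subtask to its lightweight model
        steps = 0
        while not done(subtask, obs):
            if steps >= max_steps:
                raise RuntimeError(f"subtask {subtask!r} exceeded {max_steps} steps")
            action = policy(obs)
            obs = step(obs, action)  # advance the (toy) environment
            trace.append(action)
            steps += 1
    return trace


# Toy mobile-manipulation environment used only to exercise the router.
LIBRARY = {"tidy_table": ["navigate", "grasp", "place"]}
POLICIES: Dict[str, Policy] = {
    "navigate": lambda o: "drive_forward",
    "grasp": lambda o: "close_gripper",
    "place": lambda o: "open_gripper",
}


def toy_step(obs: Observation, action: str) -> Observation:
    nxt = dict(obs)  # copy so the caller's observation is never mutated
    if action == "drive_forward":
        nxt["distance"] -= 1.0
    elif action == "close_gripper":
        nxt["holding"] = 1.0
    elif action == "open_gripper":
        nxt["holding"] = 0.0
        nxt["placed"] = 1.0
    return nxt


def toy_done(subtask: str, obs: Observation) -> bool:
    checks = {
        "navigate": obs["distance"] <= 0.0,
        "grasp": obs["holding"] == 1.0,
        "place": obs["placed"] == 1.0,
    }
    return checks[subtask]


if __name__ == "__main__":
    start = {"distance": 2.0, "holding": 0.0, "placed": 0.0}
    print(run_long_horizon("tidy_table", LIBRARY, POLICIES, toy_done, toy_step, start))
    # prints ['drive_forward', 'drive_forward', 'close_gripper', 'open_gripper']
```

Keeping one small policy per subtask means a failure can be localized to, and retrained for, a single stage rather than a monolithic end-to-end model. The sketch has no replanning or recovery: a subtask that never completes simply raises after `max_steps`.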
Key results
- Over 20% increase in data collection efficiency via the unified cockpit interface
- PVE-DP improves task success rates by 20–30% through spatio-frequency visual fusion and IMU-enhanced proprioception
- AMN boosts long-horizon inference performance by ~40% via dynamic subtask routing to specialized policies
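The summary does not specify how end-effector IMU data enters the policy's proprioceptive state. A minimal sketch, assuming each IMU reports an orientation quaternion, angular velocity, and linear acceleration, and that these are simply concatenated with the dual-arm joint positions; `ImuReading`, `canonical_quat`, and `proprio_state` are hypothetical names, and the paper's actual state layout and normalization may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ImuReading:
    """Hypothetical end-effector IMU sample."""
    quat: Sequence[float]   # orientation quaternion (w, x, y, z)
    gyro: Sequence[float]   # angular velocity, rad/s
    accel: Sequence[float]  # linear acceleration, m/s^2


def canonical_quat(q: Sequence[float]) -> List[float]:
    """Normalize a quaternion and flip its sign so w >= 0 (q and -q encode the same rotation)."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    unit = [c / norm for c in q]
    return [-c for c in unit] if unit[0] < 0 else unit


def proprio_state(
    left_joints: Sequence[float],
    right_joints: Sequence[float],
    left_imu: ImuReading,
    right_imu: ImuReading,
) -> List[float]:
    """Concatenate dual-arm joint positions with both end-effector IMU readings."""
    state = list(left_joints) + list(right_joints)
    for imu in (left_imu, right_imu):
        state += canonical_quat(imu.quat) + list(imu.gyro) + list(imu.accel)
    return state
```

Canonicalizing the quaternion sign keeps a single rotation from appearing as two distant points in the state vector, which would otherwise make the signal harder for a learned policy to use.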
Why it matters
Offers a scalable, high-precision framework for complex mobile manipulation and long-horizon task execution, advancing real-world deployment of imitation learning and teleoperation systems.
Abstract
This paper presents RoboMatch, a novel unified teleoperation platform for mobile manipulation with an auto-matching network architecture, designed to tackle long-horizon tasks in dynamic environments. Our system enhances teleoperation performance, data collection efficiency, task accuracy, and operational stability. The core of RoboMatch is a cockpit-style control interface that enables synchronous operation of the mobile base and dual arms, significantly improving control precision and data collection efficiency. Moreover, we introduce the Proprioceptive-Visual Enhanced Diffusion Policy (PVE-DP), which leverages the Discrete Wavelet Transform (DWT) for multi-scale visual feature extraction and integrates high-precision IMUs at the end-effector to enrich proprioceptive feedback, substantially boosting fine manipulation performance. Furthermore, we propose an Auto-Matching Network (AMN) architecture that decomposes long-horizon tasks into logical sequences and dynamically assigns lightweight pre-trained models for distributed inference. Experimental results demonstrate that our approach improves data collection efficiency by over 20%, increases task success rates by 20–30% with PVE-DP, and enhances long-horizon inference performance by approximately 40% with AMN, offering a robust solution for complex manipulation tasks. Project website: https://robomatch.github.io
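The abstract names the DWT but not the wavelet family, the decomposition depth, or how the subbands are fused into the diffusion policy's visual encoder. A minimal sketch, assuming a single-channel image and the orthonormal Haar wavelet applied recursively to the low-pass band; `haar_dwt2` and `multiscale_features` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def haar_dwt2(img: np.ndarray):
    """One level of the orthonormal 2-D Haar DWT on an (H, W) array with even H and W.

    Returns the low-pass approximation LL and a tuple of three detail bands.
    """
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2-D array with even height and width")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0        # local average: coarse content
    row_diff = (a + b - c - d) / 2.0  # top minus bottom: responds to horizontal edges
    col_diff = (a - b + c - d) / 2.0  # left minus right: responds to vertical edges
    diag = (a - b - c + d) / 2.0      # diagonal detail
    return ll, (row_diff, col_diff, diag)


def multiscale_features(img: np.ndarray, levels: int):
    """Recursively decompose the LL band, collecting the detail bands at each scale."""
    ll = np.asarray(img, dtype=np.float64)
    details = []
    for _ in range(levels):
        ll, bands = haar_dwt2(ll)
        details.append(np.stack(bands))  # shape (3, H / 2**k, W / 2**k) at level k
    return ll, details
```

Because the Haar basis is orthonormal, the total energy of the image equals the summed energy of its subbands, so no information is lost; the detail bands simply make edges and fine texture explicit at each scale. How PVE-DP combines these bands with RGB features is not shown here.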