LeGO-MM: Learning Navigation for Goal-Oriented Mobile Manipulation Via Hierarchical Policy Distillation
Bolei Chen, Liangbai Liu, Shengsheng Yan, Haonan Yang, Ping Zhong, Jianxin Wang
AI summary
Problem
Mobile manipulation requires learning multi-stage, heterogeneous behaviors, but existing RL agents suffer from sample inefficiency, progress reversal, and poor generalization to new tasks due to information asymmetry and rigid training setups.
Approach
The authors propose Hierarchical Policy Distillation (HPD), which uses Sub-Skill Distillation (SSD) to learn main tasks and relevant sub-skills concurrently in a single loop, and Self-boosting Policy Distillation (SPD) to transfer experience from prior tasks to new ones without performance degradation.
Key results
- Mitigates progress reversal via concurrent sub-skill distillation
- Resolves information asymmetry through prior-to-new task transfer
- Outperforms strong RL baselines across simulation platforms
- Validates successful sim-to-real deployment on physical robots
Why it matters
Enables sample-efficient, generalizable learning for complex mobile manipulation tasks, advancing autonomous robots toward practical everyday assistance.
Abstract
Benefiting from mobility and dexterity, Mobile Manipulation (MM) systems are expected to assist humans with diverse tasks in everyday life. However, since MM tasks (e.g., tidying up a room) require learning multi-stage heterogeneous behaviors (e.g., picking, placing, and opening), existing Reinforcement Learning (RL) agents often face sample inefficiency and progress reversal issues. In addition, such MM agents are limited to learning customized tasks and therefore cannot extrapolate to new tasks and real-world scenes. In this work, we propose a Hierarchical Policy Distillation (HPD)-based RL framework to explicitly address these issues, which outperforms existing curriculum learning-based and hierarchical RL-based methods. Specifically, Sub-Skill Distillation (SSD) allows learning both the main MM task and easier sub-skills in a single training loop, facilitating exploration and mitigating progress reversal by distilling the relevant sub-skills’ experience into the main task. Self-boosting Policy Distillation (SPD) is designed to enhance generalization and address the information asymmetry between MM tasks in a principled way, i.e., distilling the experience of a prior task to a new one. Comparative and ablation studies on different robotic platforms demonstrate that our method significantly outperforms existing methods. Finally, real-world experiments validate the practicality of our method.
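The abstract does not state the loss used for distillation, so the following is only a minimal sketch of a generic policy-distillation term, not the authors' SSD or SPD formulation. It assumes discrete actions, a KL divergence between a teacher policy (e.g., a sub-skill or prior-task policy) and the student policy (the main or new task), and a fixed mixing weight. The names `distilled_loss`, `task_loss`, and `weight` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distilled_loss(task_loss, teacher_logits, student_logits, weight=0.5):
    """Add a distillation penalty to the student's own RL task loss.

    Assumption: the teacher is a policy for a sub-skill or prior task,
    and `weight` trades off imitation against the task objective.
    """
    teacher = softmax(teacher_logits)
    student = softmax(student_logits)
    return task_loss + weight * kl_divergence(teacher, student)
```

When the student already matches the teacher, the penalty vanishes and the loss reduces to `task_loss`; the further the two action distributions diverge, the more the student is pulled toward the teacher's behavior.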