ICRA 2026

LeGO-MM: Learning Navigation for Goal-Oriented Mobile Manipulation Via Hierarchical Policy Distillation

Bolei Chen, Liangbai Liu, Shengsheng Yan, Haonan Yang, Ping Zhong, Jianxin Wang

AI summary

A hierarchical policy distillation framework that simultaneously learns main tasks and sub-skills to solve sample inefficiency, progress reversal, and poor generalization in mobile manipulation.

Mobile Manipulation, Reinforcement Learning, Policy Distillation, Hierarchical RL, Sim-to-Real Transfer, Goal-Oriented Navigation

Problem

Mobile manipulation requires learning multi-stage, heterogeneous behaviors, but existing RL agents suffer from sample inefficiency, progress reversal, and poor generalization to new tasks due to information asymmetry and rigid training setups.

Approach

The authors propose Hierarchical Policy Distillation (HPD), which uses Sub-Skill Distillation (SSD) to learn main tasks and relevant sub-skills concurrently in a single loop, and Self-boosting Policy Distillation (SPD) to transfer experience from prior tasks to new ones without performance degradation.
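The summary does not state the loss used by SSD or SPD. As a rough illustration of what distilling one policy into another usually involves, the sketch below shows a generic policy-distillation term: a KL divergence between a teacher's action distribution (for example, a sub-skill policy) and a student's (for example, the main-task policy), added to the student's RL loss. The function names, the discrete action space, and the `distill_weight` value are assumptions made for illustration, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, eps=1e-12):
    """KL(teacher || student) over a discrete action distribution.

    Generic policy-distillation term; not necessarily the loss used in HPD.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    # Skip zero-probability teacher actions; clamp q to avoid division by zero.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def combined_loss(rl_loss, teacher_logits, student_logits, distill_weight=0.5):
    """Student RL loss plus a weighted distillation term.

    distill_weight is a hypothetical hyperparameter chosen for this sketch.
    """
    return rl_loss + distill_weight * distillation_loss(teacher_logits, student_logits)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # e.g. logits from a sub-skill policy
    student = [0.1, 0.2, 0.3]   # e.g. logits from the main-task policy
    print(distillation_loss(teacher, student))
    print(combined_loss(1.0, teacher, student))
```

The distillation term is zero when the student already matches the teacher and grows as their action distributions diverge, which is what pulls the student toward the teacher's behavior during training.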

Key results

Mitigates progress reversal via concurrent sub-skill distillation
Resolves information asymmetry through prior-to-new task transfer
Outperforms strong RL baselines across simulation platforms
Validates successful sim-to-real deployment on physical robots

Why it matters

Enables sample-efficient, generalizable learning for complex mobile manipulation tasks, advancing autonomous robots toward practical everyday assistance.

Abstract

Benefiting from mobility and dexterity, Mobile Manipulation (MM) systems are expected to assist humans with diverse tasks in everyday life. However, since MM tasks (e.g., tidying up a room) require learning multi-stage heterogeneous behaviors (e.g., picking, placing, and opening), existing Reinforcement Learning (RL) agents often face sample inefficiency and progress reversal issues. In addition, such MM agents are limited to learning customized tasks, thus not allowing for the extrapolation to new tasks and real-world scenes. In this work, we propose a Hierarchical Policy Distillation (HPD)-based RL framework to explicitly address these issues, which outperforms existing curriculum learning-based and hierarchical RL-based methods. Specifically, Sub-Skill Distillation (SSD) allows learning both the main MM task and easier sub-skills in a single training loop, facilitating exploration and mitigating process reversal by distilling the relevant sub-skills’ experience into the main task. Self-boosting Policy Distillation (SPD) is designed to enhance generalization and address the information asymmetry between MM tasks in a principled way, i.e., distilling the experience of a prior task to a new one. Comparative and ablation studies on different robotic platforms demonstrate that our method significantly outperforms existing methods. Finally, real-world experiments validate the practicality of our method.

Index terms

Embodied Cognitive Science, Mobile Manipulation, Reinforcement Learning