ICRA 2026

Rapid Robot Manipulation Policy Learning Via Hierarchical Foundation-Model Prior Distillation

Qingwei Dong, Jiyuan Zhang, Guangxi Wan, Ruikai Liu, Peng Zeng

AI summary

Combining OpenVLA trajectory priors with reinforcement learning accelerates robotic policy learning and outperforms baselines in sample efficiency and performance.

robot manipulation, foundation models, prior distillation, reinforcement learning, policy learning, LIBERO

Problem

Rapid policy learning in robotics struggles with inefficient early exploration and high computational costs when adapting foundation models or collecting expert demonstrations.

Approach

The method uses LLM-generated prompts to extract diverse trajectory priors from OpenVLA, which initialize and constrain a lightweight local controller refined through reinforcement learning.
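One simple way to picture "initialize and constrain" is sketched below. It is an illustration only, not the authors' implementation: the function names, the 7-dimensional action, the band width, and the use of a per-timestep mean and standard deviation are all assumptions, and random arrays stand in for trajectories that would come from OpenVLA. The mean of the prior actions serves as the controller's starting action sequence, and exploratory actions are clipped to a band around it.

```python
import numpy as np


def prior_action_band(prior_actions, width=2.0, min_std=1e-3):
    """Summarize prior trajectories as a nominal action sequence plus a band.

    prior_actions: array of shape (N, T, du) holding N prior trajectories.
    Returns (nominal, lower, upper), each of shape (T, du).
    The band width and the std floor are hypothetical tuning choices.
    """
    nominal = prior_actions.mean(axis=0)
    std = np.maximum(prior_actions.std(axis=0), min_std)
    return nominal, nominal - width * std, nominal + width * std


def constrained_exploration(action, noise, lower, upper):
    """Add exploration noise, then keep the result inside the prior band."""
    return np.clip(action + noise, lower, upper)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for OpenVLA rollouts: 6 trajectories, 10 steps, 7-D actions.
    priors = rng.normal(size=(6, 10, 7))
    nominal, lower, upper = prior_action_band(priors)

    # The controller starts from the nominal sequence and explores around it.
    step = 0
    noisy = constrained_exploration(
        nominal[step], rng.normal(scale=0.5, size=7), lower[step], upper[step]
    )
    print("explored action stays in band:",
          bool(np.all((noisy >= lower[step]) & (noisy <= upper[step]))))
```

A hard clip is the bluntest form of constraint; a penalty on the distance from the prior would serve the same purpose with a softer boundary.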

Key results

Prompt variations successfully elicit task-relevant trajectories from OpenVLA
Hierarchical prior distillation reduces adaptation costs and improves exploration efficiency
Faster convergence and higher final performance than LGC/GPS baselines across LIBERO tasks
Validates effective integration of foundation model priors with local dynamics fitting

Why it matters

Enables faster, sample-efficient robotic skill acquisition for new tasks without expensive full model fine-tuning or expert data collection.

Abstract

In robotic skill acquisition, rapid policy learning remains challenging due to high-dimensional state-action spaces and inefficient exploration in the early stage of training [1]. Although the pre-trained OpenVLA model exhibits cross-task generalization and can generate goal-directed actions for unseen tasks under suitable prompts, its direct application to novel manipulation tasks remains limited, while full fine-tuning is computationally expensive. To address this issue, we propose a hierarchical framework that combines OpenVLA with reinforcement learning for efficient skill acquisition. Specifically, OpenVLA is used to generate diverse task-related prior trajectories through prompt engineering, and reinforcement learning leverages these priors to fit local dynamics and constrain policy exploration. In this way, the proposed method improves adaptation efficiency and accelerates policy learning on new tasks. We evaluate the framework on multiple manipulation tasks in the LIBERO environment.
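As a rough illustration of what "fitting local dynamics" from prior trajectories can look like, the sketch below fits a linear model x[t+1] ≈ A x[t] + B u[t] + c by least squares. The abstract does not specify the model class, so the linear form, the function names, and the synthetic two-dimensional system used in place of OpenVLA rollouts are all assumptions made for this example.

```python
import numpy as np


def fit_local_linear_dynamics(states, actions):
    """Least-squares fit of x[t+1] ~ A x[t] + B u[t] + c.

    states:  array (N, T + 1, dx) from N prior trajectories
    actions: array (N, T, du)
    Returns A (dx, dx), B (dx, du), c (dx,).
    """
    dx = states.shape[2]
    du = actions.shape[2]
    x = states[:, :-1].reshape(-1, dx)
    u = actions.reshape(-1, du)
    x_next = states[:, 1:].reshape(-1, dx)
    # Regress next states on [x, u, 1]; the column of ones yields the offset c.
    features = np.hstack([x, u, np.ones((x.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(features, x_next, rcond=None)
    return theta[:dx].T, theta[dx:dx + du].T, theta[-1]


def make_synthetic_priors(num_traj=8, horizon=20, seed=0):
    """Hypothetical stand-in for prior trajectories: a noise-free linear system."""
    rng = np.random.default_rng(seed)
    A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
    B_true = np.array([[0.0], [0.1]])
    states = np.zeros((num_traj, horizon + 1, 2))
    actions = rng.normal(size=(num_traj, horizon, 1))
    states[:, 0] = rng.normal(size=(num_traj, 2))
    for t in range(horizon):
        states[:, t + 1] = states[:, t] @ A_true.T + actions[:, t] @ B_true.T
    return states, actions, A_true, B_true


if __name__ == "__main__":
    states, actions, A_true, B_true = make_synthetic_priors()
    A, B, c = fit_local_linear_dynamics(states, actions)
    print("A recovered:", np.allclose(A, A_true, atol=1e-6))
    print("B recovered:", np.allclose(B, B_true, atol=1e-6))
```

With real rollouts the fit would be noisy and valid only near the prior trajectories, which is one reason a local model pairs naturally with a constraint that keeps exploration close to those trajectories.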

Index terms

Reinforcement Learning; Deep Learning Methods; Learning from Experience