Rapid Robot Manipulation Policy Learning Via Hierarchical Foundation-Model Prior Distillation
Qingwei Dong, Jiyuan Zhang, Guangxi Wan, Ruikai Liu, Peng Zeng
AI summary
Problem
Rapid policy learning in robotics struggles with inefficient early exploration and high computational costs when adapting foundation models or collecting expert demonstrations.
Approach
The method uses LLM-generated prompts to extract diverse trajectory priors from OpenVLA, which initialize and constrain a lightweight local controller refined through reinforcement learning.
Key results
- Prompt variations successfully elicit task-relevant trajectories from OpenVLA
- Hierarchical prior distillation reduces adaptation costs and improves exploration efficiency
- Faster convergence and higher final performance than LGC/GPS baselines across LIBERO tasks
- Validates effective integration of foundation model priors with local dynamics fitting
Why it matters
Enables faster, sample-efficient robotic skill acquisition for new tasks without expensive full model fine-tuning or expert data collection.
Abstract
In robotic skill acquisition, rapid policy learning remains challenging due to high-dimensional state-action spaces and inefficient exploration in the early stage of training [1]. Although the pre-trained OpenVLA model exhibits cross-task generalization and can generate goal-directed actions for unseen tasks under suitable prompts, its direct application to novel manipulation tasks remains limited, while full fine-tuning is computationally expensive. To address this issue, we propose a hierarchical framework that combines OpenVLA with reinforcement learning for efficient skill acquisition. Specifically, OpenVLA is used to generate diverse task-related prior trajectories through prompt engineering, and reinforcement learning leverages these priors to fit local dynamics and constrain policy exploration. In this way, the proposed method improves adaptation efficiency and accelerates policy learning on new tasks. We evaluate the framework on multiple manipulation tasks in the LIBERO environment.
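To make the phrase "fit local dynamics and constrain policy exploration" more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the dynamics model or of the constraint, so the sketch assumes a time-invariant linear model x_{t+1} ≈ A x_t + B u_t + c fitted by ridge-regularized least squares, and uses a simple L2-ball projection around a prior action as a stand-in for the exploration constraint. The function names, array shapes, and the `ridge` and `radius` parameters are all hypothetical.

```python
import numpy as np


def fit_local_linear_dynamics(states, actions, ridge=1e-6):
    """Fit x_{t+1} ~ A x_t + B u_t + c from prior trajectories (assumed model form).

    states:  array of shape (N, T + 1, dx), N prior trajectories of length T
    actions: array of shape (N, T, du)
    Returns (A, B, c) with shapes (dx, dx), (dx, du), (dx,).
    """
    dx = states.shape[-1]
    du = actions.shape[-1]

    # Stack all (x_t, u_t) -> x_{t+1} transitions from every trajectory.
    x = states[:, :-1].reshape(-1, dx)
    u = actions.reshape(-1, du)
    x_next = states[:, 1:].reshape(-1, dx)

    # Regressors [x_t, u_t, 1]; the constant column captures the offset c.
    feats = np.hstack([x, u, np.ones((x.shape[0], 1))])

    # Ridge-regularized normal equations keep the solve well conditioned
    # when only a few prior trajectories are available.
    gram = feats.T @ feats + ridge * np.eye(feats.shape[1])
    weights = np.linalg.solve(gram, feats.T @ x_next)  # (dx + du + 1, dx)

    A = weights[:dx].T
    B = weights[dx:dx + du].T
    c = weights[-1]
    return A, B, c


def constrain_to_prior(action, prior_action, radius):
    """Project an action into an L2 ball around the prior action.

    Illustrative stand-in for constraining exploration with the priors.
    """
    delta = action - prior_action
    norm = np.linalg.norm(delta)
    if norm > radius:
        delta = delta * (radius / norm)
    return prior_action + delta
```

Under these assumptions, the trajectories elicited from OpenVLA would be stacked into the `states` and `actions` arrays, and `constrain_to_prior` would be applied to each action proposed by the local controller before it is executed. A time-varying model or a KL-divergence constraint, as used in guided policy search, would be a natural alternative to both choices.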