ICRA 2026

Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation

Senwei Xie, Yuntian Zhang, Ruiping Wang, Xilin Chen

AI summary

Uni-Skill enables robots to autonomously expand their skill libraries from unstructured videos, achieving state-of-the-art zero-shot generalization on novel manipulation tasks.
Robotic manipulation, Skill-centric planning, Self-evolving skill library, Unstructured video retrieval, Zero-shot generalization, Vision-language models

Problem

Existing skill-centric robotic approaches rely on fixed, manually curated skill libraries, limiting adaptability to new tasks and requiring heavy human supervision for skill acquisition.

Approach

The framework uses a vision-language model to detect missing skills during planning and automatically retrieves demonstrations from SkillFolder, a hierarchically structured repository of unstructured robotic videos, to implement them via few-shot inference.
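The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `SkillLibrary`, `plan_steps`, `retrieve_demos`, and `implement_from_demos` are hypothetical names, the planner is a canned stub standing in for the vision-language model, and a plain dictionary stands in for SkillFolder. The video IDs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Hypothetical skill library that can grow at planning time."""
    skills: dict = field(default_factory=dict)  # skill name -> callable

    def has(self, name: str) -> bool:
        return name in self.skills

    def add(self, name: str, impl) -> None:
        self.skills[name] = impl


def plan_steps(task: str) -> list:
    """Stub planner: returns the skill names a task needs.

    Assumption: a real system would query a vision-language model here.
    """
    canned = {"wipe the table": ["pick", "wipe", "place"]}
    return canned.get(task, [])


def retrieve_demos(skill: str, repository: dict, k: int = 2) -> list:
    """Stub retrieval: return up to k stored demonstrations for a skill."""
    return repository.get(skill, [])[:k]


def implement_from_demos(skill: str, demos: list):
    """Stub few-shot implementation: wraps the demos in a callable."""
    def run() -> str:
        return f"{skill} (inferred from {len(demos)} demos)"
    return run


def plan_with_evolution(task: str, library: SkillLibrary, repository: dict) -> list:
    """Plan a task and add any missing skill to the library on the fly.

    Returns the names of the skills that were newly added.
    """
    added = []
    for skill in plan_steps(task):
        if library.has(skill):
            continue  # existing skill is sufficient
        demos = retrieve_demos(skill, repository)
        if not demos:
            raise LookupError(f"no demonstrations found for skill {skill!r}")
        library.add(skill, implement_from_demos(skill, demos))
        added.append(skill)
    return added


library = SkillLibrary({"pick": lambda: "pick", "place": lambda: "place"})
repository = {"wipe": ["video_017", "video_342", "video_901"]}  # made-up IDs
new_skills = plan_with_evolution("wipe the table", library, repository)
print(new_skills)                  # ['wipe']
print(library.skills["wipe"]())    # wipe (inferred from 2 demos)
```

In this sketch the library starts with only `pick` and `place`; the planner asks for `wipe`, which is missing, so it is retrieved and added before execution would continue.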

Key results

  • Hierarchical SkillFolder repository with 10k+ auto-annotated video traces
  • Dynamic skill-aware planning for missing skill detection and generation
  • Automatic skill evolution via semantic retrieval and few-shot implementation
  • State-of-the-art zero-shot success, outperforming MOKA by 31.0% on novel tasks

Why it matters

Enables scalable, low-cost robotic skill acquisition and generalization to novel tasks without manual data collection or deployment-time supervision.

Abstract

While skill-centric approaches leverage foundation models to enhance generalization in compositional tasks, they often rely on fixed skill libraries, limiting adaptability to new tasks without manual intervention. To address this, we propose Uni-Skill, a Unified Skill-centric framework that supports skill-aware planning and facilitates automatic skill evolution. Unlike prior methods that restrict planning to predefined skills, Uni-Skill requests new skill implementations when existing ones are insufficient, ensuring adaptable planning with a self-augmented skill library. To support automatic implementation of the diverse skills requested by the planning module, we construct SkillFolder, a VerbNet-inspired repository derived from large-scale unstructured robotic videos. SkillFolder introduces a hierarchical skill taxonomy that captures diverse skill descriptions at multiple levels of abstraction. By populating this taxonomy with large-scale, automatically annotated demonstrations, Uni-Skill shifts the paradigm of skill acquisition from inefficient manual annotation to efficient offline structural retrieval. Retrieved examples provide semantic supervision over behavior patterns and fine-grained references for spatial trajectories, enabling few-shot skill inference without deployment-time demonstrations. Comprehensive experiments in both simulation and real-world settings verify the state-of-the-art performance of Uni-Skill over existing VLM-based skill-centric approaches, highlighting its advanced reasoning capabilities and strong zero-shot generalization across a wide range of novel tasks.
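One way to picture a skill taxonomy with multiple levels of abstraction is a tree in which a query follows a path from a coarse class to a specific skill and backs off to the parent class when nothing is stored at the specific level. The sketch below is only an assumption about how such a structure could behave; the source does not describe how SkillFolder resolves queries, and `SkillNode`, `lookup`, the class names, and the video IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SkillNode:
    """One node in a hypothetical skill taxonomy."""
    name: str
    # Excluded from repr/compare to avoid parent <-> child recursion.
    parent: Optional["SkillNode"] = field(default=None, repr=False, compare=False)
    demos: list = field(default_factory=list)
    children: dict = field(default_factory=dict)

    def child(self, name: str) -> "SkillNode":
        """Return the named child, creating it if needed."""
        if name not in self.children:
            self.children[name] = SkillNode(name, parent=self)
        return self.children[name]

    def all_demos(self) -> list:
        """Demos stored at this node plus those of all descendants."""
        found = list(self.demos)
        for node in self.children.values():
            found.extend(node.all_demos())
        return found


def lookup(root: SkillNode, path: list) -> list:
    """Follow `path` as far as it exists, then back off toward the root
    until some node (with its descendants) has demonstrations."""
    node = root
    for name in path:
        if name not in node.children:
            break
        node = node.children[name]
    while node is not None:
        demos = node.all_demos()
        if demos:
            return demos
        node = node.parent
    return []


root = SkillNode("root")
root.child("remove").child("wipe").demos.append("video_017")   # made-up IDs
root.child("remove").child("sweep").demos.append("video_342")
root.child("put").child("place").demos.append("video_901")

exact = lookup(root, ["remove", "wipe"])
backoff = lookup(root, ["remove", "scrub"])
print(exact)    # ['video_017']
print(backoff)  # ['video_017', 'video_342']
```

Here the unseen skill `scrub` has no demonstrations of its own, so the lookup falls back to everything stored under its parent class `remove`.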

Index terms

Learning from Demonstration; Integrated Planning and Learning; Manipulation Planning
